ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy, color = class))
To facet your plot by a single variable, use facet_wrap(). The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable name (here “formula” is the name of a data structure in R, not a synonym for “equation”). The variable that you pass to facet_wrap() should be discrete:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
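To make the "formula" point concrete, the sketch below builds the formula on its own and inspects it with base R before handing it to facet_wrap(). The names f, p and n_panels are illustrative and are not part of the original notes.

```r
library(ggplot2)

# A one-sided formula is an ordinary R object that can be stored in a variable.
f <- ~ class
class(f)     # "formula"
all.vars(f)  # the variable names the formula refers to

# Passing the stored formula should behave like writing facet_wrap(~ class) inline.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(f, nrow = 2)

# One panel is drawn per distinct value of class.
n_panels <- length(unique(layer_data(p)$PANEL))
```

layer_data() returns the data that ggplot2 computed for a layer, which makes it a convenient way to check what a plot contains without looking at the picture.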
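To facet your plot on the combination of two variables, use facet_grid(). Its first argument is also a formula, this time with two variable names separated by ~ (rows ~ columns):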
# mpg data, with drv in rows and cyl in columns
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color=class)) +
facet_grid(drv ~ cyl)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ cty, nrow = 2)
2. What do the empty cells in a plot with facet_grid(drv ~ cyl) mean? How do they relate to this plot?
ggplot(data = mpg) +
geom_point(mapping = aes(x = drv, y = cyl))
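One way to explore this question is to tabulate how many rows fall into each drv/cyl combination; a count of zero corresponds to an empty facet cell. This is only a sketch using base R's table(), and the names counts and empty are illustrative.

```r
library(ggplot2)  # provides the mpg dataset

# Cross-tabulate drive train against number of cylinders.
counts <- table(drv = mpg$drv, cyl = mpg$cyl)
counts

# Combinations with no observations, i.e. the empty cells in facet_grid(drv ~ cyl).
empty <- which(counts == 0, arr.ind = TRUE)
empty
```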
# left
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
# right
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
geom_smooth() will draw a different line, with a different linetype, for each unique value of the variable that you map to linetype:
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
In practice, ggplot2 will automatically group the data for these geoms whenever you map an aesthetic to a discrete variable (as in the linetype example). It is convenient to rely on this feature because the group aesthetic by itself does not add a legend or distinguishing features to the geoms:
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))
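The grouping behaviour can be checked with layer_data(), which returns the data ggplot2 computed for a layer. The sketch below compares a plot without grouping against one with group = drv; the plot and variable names are illustrative.

```r
library(ggplot2)

p_single <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

p_grouped <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))

# Number of separate smooth lines fitted in each plot.
n_single  <- length(unique(layer_data(p_single)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))
```

If the grouping works as described, n_single is 1 and n_grouped matches the number of distinct drv values.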
ggplot(data = mpg) +
geom_smooth(
mapping = aes(x = displ, y = hwy, color = drv),
show.legend = TRUE
)
To display multiple geoms in the same plot, add multiple geom functions to ggplot():
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
geom_smooth(mapping = aes(x = displ, y = hwy))
You can pass a set of mappings to ggplot() to avoid repeating them in every geom. ggplot2 treats these as global mappings and applies them to each layer:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
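As a quick check that global mappings really are equivalent to repeating the mappings in each geom, the following sketch compares the computed point layer of both versions. The names p_local, p_global and same_points are illustrative.

```r
library(ggplot2)

# Mappings repeated in every layer.
p_local <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))

# The same mappings declared once, globally.
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

# Compare the computed point layer (layer 1) of both plots.
same_points <- isTRUE(all.equal(layer_data(p_local, 1), layer_data(p_global, 1)))
```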
You can pass additional mappings to a geom function. This makes it possible to display different aesthetics in different layers.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth()
Here, our smooth line displays just a subset of the mpg dataset, the subcompact cars.
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(color = class)) +
geom_smooth(
data = filter(mpg, class == "subcompact"),
se = FALSE
)
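The same idea can be sketched without dplyr by using base R's subset() in place of filter(). The example below also checks that only the smooth layer is restricted to the subset; subcompact, n_points and smooth_range are illustrative names.

```r
library(ggplot2)

# subset() is base R; the notes use dplyr's filter() for the same purpose.
subcompact <- subset(mpg, class == "subcompact")

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = subcompact, se = FALSE)

# Layer 1 (points) uses all of mpg; layer 2 (smooth) uses only the subset.
n_points     <- nrow(layer_data(p, 1))
smooth_range <- range(layer_data(p, 2)$x)
```

A data argument given to a geom overrides the global data for that layer only, so the points still cover every car while the smooth line spans only the displ range of the subcompact cars.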
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_smooth()
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_boxplot(aes(group=cyl))
ggplot(data = mpg, mapping = aes(x = hwy)) +
geom_histogram(bins=10)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_area()
ggplot(
data = mpg,
mapping = aes(x = displ, y = hwy, color = drv)
) +
geom_point() +
geom_smooth(se = FALSE)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
ggplot() +
geom_point(
data = mpg,
mapping = aes(x = displ, y = hwy)
) +
geom_smooth(
data = mpg,
mapping = aes(x = displ, y = hwy)
)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth()
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
geom_smooth(mapping = aes(x = displ, y = hwy, group = drv), se = FALSE)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
geom_smooth(mapping = aes(x = displ, y = hwy, group = drv), se = FALSE)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
geom_smooth(se = FALSE)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(x = displ, y = hwy, color = drv)) +
geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv), se = FALSE)
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(mapping = aes(x = displ, y = hwy, color = drv))
The algorithm used to calculate new values for a graph is called a stat, short for statistical transformation.
- Bar charts, histograms, and frequency polygons bin your data and then plot bin counts, the number of points that fall in each bin.
- Smoothers fit a model to your data and then plot predictions from the model.
- Boxplots compute a robust summary of the distribution and display a specially formatted box.
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
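You can generally use geoms and stats interchangeably. This works because every geom has a default stat, and every stat has a default geom. For example, the bar chart above can be recreated with stat_count() instead of geom_bar():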
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
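To see what the stat actually computes, the sketch below pulls the counts out of both plots with layer_data() and compares them with a manual tabulation from base R. The variable names are illustrative.

```r
library(ggplot2)

p_geom <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
p_stat <- ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))

# Counts computed by the stat, one row per bar.
geom_counts <- layer_data(p_geom)$count
stat_counts <- layer_data(p_stat)$count

# The same counts computed by hand with base R.
manual_counts <- as.vector(table(diamonds$cut))
```

All three vectors should agree, which is the sense in which geom_bar() and stat_count() are interchangeable here.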
You can display proportions rather than counts:
ggplot(data = diamonds) +
geom_bar(
# after_stat(prop) replaces the older ..prop.. notation (ggplot2 3.3.0+)
mapping = aes(x = cut, y = after_stat(prop), group = 1)
)
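The role of group = 1 in the proportion chart can be checked numerically. This sketch assumes ggplot2 3.3.0 or later, where after_stat() is available (older code writes ..prop.. instead); the variable names are illustrative.

```r
library(ggplot2)

# With group = 1, all rows form one group, so the proportions sum to 1.
p_one_group <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# Without it, each cut is its own group, so every bar has proportion 1.
p_default <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop)))

prop_one_group <- layer_data(p_one_group)$prop
prop_default   <- layer_data(p_default)$prop
```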
stat_summary() summarizes the y values for each unique x value:
ggplot(data = diamonds) +
stat_summary(
mapping = aes(x = cut, y = depth),
# fun.min, fun.max and fun replace the deprecated fun.ymin, fun.ymax and fun.y (ggplot2 3.3.0+)
fun.min = min,
fun.max = max,
fun = median
)
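What stat_summary() computes here can be reproduced by hand with base R's tapply(). The sketch below assumes ggplot2 3.3.0 or later for the fun, fun.min and fun.max argument names; the variable names are illustrative.

```r
library(ggplot2)

# Median, minimum and maximum depth for each cut, computed by hand.
manual_median <- tapply(diamonds$depth, diamonds$cut, median)
manual_min    <- tapply(diamonds$depth, diamonds$cut, min)
manual_max    <- tapply(diamonds$depth, diamonds$cut, max)

# The same summary computed by stat_summary().
p <- ggplot(data = diamonds) +
  stat_summary(
    mapping = aes(x = cut, y = depth),
    fun.min = min,
    fun.max = max,
    fun = median
  )

# One row per cut, with columns y, ymin and ymax.
summary_data <- layer_data(p)
```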
ggplot(data = diamonds) +
geom_pointrange(
mapping = aes(x = cut, y = depth),
fun.min = min,
fun.max = max,
fun = median,
stat = "summary"
)
demo <- tribble(
~a, ~b,
"bar_1", 20,
"bar_2", 30,
"bar_3", 40
)
ggplot(data=demo) +
geom_col(mapping = aes(x=a, y=b))
geom | stat |
---|---|
geom_bar() | stat_count() |
geom_bin2d() | stat_bin_2d() |
geom_boxplot() | stat_boxplot() |
geom_contour_filled() | stat_contour_filled() |
geom_contour() | stat_contour() |
geom_count() | stat_sum() |
geom_density_2d() | stat_density_2d() |
geom_density() | stat_density() |
geom_dotplot() | stat_bindot() |
geom_function() | stat_function() |
geom_sf() | stat_sf() |
geom_smooth() | stat_smooth() |
geom_violin() | stat_ydensity() |
geom_hex() | stat_bin_hex() |
geom_qq_line() | stat_qq_line() |
geom_qq() | stat_qq() |
geom_quantile() | stat_quantile() |
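group = 1 is required in the proportion chart because geom_bar() groups the data by the x values by default, and the stat computes proportions within each group. Without it, every bar would have a proportion of 1. Setting group = 1 places all rows in a single group, so each proportion is computed relative to the whole dataset.
You can color a bar chart using the color aesthetic (which colors the outline) or, more usefully, fill. Mapping fill to a second categorical variable, such as clarity, shows a stacked bar for each value of x.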
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, color = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity))
Stacking is performed automatically by the position adjustment specified by the position argument. If you don’t want a stacked bar chart, you can use one of three other options: "identity", "dodge" or "fill":
1. position = "identity" places each object exactly where it falls in the context of the graph. This is not very useful for bars because they overlap; to see the overlapping bars you need to make them slightly transparent by setting alpha. It is more useful for scatter plots.
ggplot(
data = diamonds,
mapping = aes(x = cut, fill = clarity)
) +
geom_bar(alpha = 1/5, position = "identity")
2. position = "fill" works like stacking, but makes each set of stacked bars the same height. This makes it easier to compare proportions across groups.
ggplot(data=diamonds) +
geom_bar(mapping=aes(x=cut, fill=clarity), position="fill")
3. position = "dodge" places overlapping objects directly beside one another. This makes it easier to compare individual values.
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity), position="dodge")
4. position = "jitter" is useful for scatterplots. It adds a small amount of random noise to each point, so that points that would otherwise overlap can be seen separately.
ggplot(data = mpg) +
geom_point(mapping = aes(x=displ, y=hwy, color=class), position="jitter")
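The differences between the bar position adjustments above can also be seen in the numbers ggplot2 computes. The following sketch inspects the bar coordinates with layer_data(); all variable names are illustrative.

```r
library(ggplot2)

base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))

stacked  <- layer_data(base + geom_bar(position = "stack"))
filled   <- layer_data(base + geom_bar(position = "fill"))
overlaid <- layer_data(base + geom_bar(position = "identity"))
dodged   <- layer_data(base + geom_bar(position = "dodge"))

# "stack": the top of each stack equals the total count for that cut.
stack_tops <- tapply(stacked$ymax, as.numeric(stacked$x), max)

# "fill": every stack is rescaled to a total height of 1.
fill_tops <- tapply(filled$ymax, as.numeric(filled$x), max)

# "identity": every bar starts at zero, so the bars overlap.
overlaid_bottoms <- unique(overlaid$ymin)

# "dodge": bars are narrower than stacked bars so they fit side by side.
dodge_width <- max(dodged$xmax - dodged$xmin)
stack_width <- max(stacked$xmax - stacked$xmin)
```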
#### Exercises
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point()
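- ANSWER: The cty and hwy values are rounded to whole numbers, so the points fall on a grid and many of them sit on top of one another. Jittering reveals the hidden points: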
ggplot(data = mpg, mapping = aes(x = cty, y = hwy, color=class)) +
geom_point(position="jitter")
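The amount of jitter can be controlled explicitly with position_jitter(). In the sketch below, the width, height and seed values are arbitrary choices for illustration, not values from the original notes.

```r
library(ggplot2)

# width and height give the maximum shift in each direction, in data units;
# seed makes the random noise reproducible.
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter(width = 0.3, height = 0.3, seed = 42))

jittered <- layer_data(p)

# Number of distinct point positions before and after jittering.
n_before <- nrow(unique(mpg[, c("cty", "hwy")]))
n_after  <- nrow(unique(jittered[, c("x", "y")]))
```

Comparing n_before with n_after gives a rough sense of how much overplotting the original grid was hiding.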
### Coordinate Systems
The default coordinate system is Cartesian, but there are others.
1. coord_flip(): Switches the x and y axes.
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_flip()
2. coord_quickmap(): Sets the correct aspect ratio for maps.
nz <- map_data("nz")
ggplot(nz, aes(long, lat, group = group)) +
geom_polygon(fill = "white", color = "black")
ggplot(nz, aes(long, lat, group = group)) +
geom_polygon(fill = "white", color = "black") +
coord_quickmap() +
labs(title="Map of New Zealand", x="x coord", y="y coord")
3. coord_polar(): Uses polar coordinates. The labs() function adds axis titles, plot titles, and a caption to the plot.
bar <- ggplot(data = diamonds) +
geom_bar(
mapping = aes(x = cut, fill = cut),
show.legend = FALSE,
width = 1
) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL)
bar + coord_flip()
bar + coord_polar()
ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
geom_point() +
geom_abline() +
coord_fixed()
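- ANSWER: coord_fixed() draws one unit on the x axis at the same length as one unit on the y axis, which ensures that the line produced by geom_abline() (by default intercept 0 and slope 1) is at a 45-degree angle. This makes it easy to compare each car's hwy against the case where hwy equals cty. Studies have shown that humans perceive differences in angles best relative to 45 degrees.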
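The defaults used in this answer can be written out explicitly. The sketch below spells out the intercept, slope and ratio that geom_abline() and coord_fixed() use when called without arguments; the names p and abline_data are illustrative.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1) +  # the line hwy = cty
  coord_fixed(ratio = 1)  # one unit on x is drawn as long as one unit on y

# The reference line stored in layer 2.
abline_data <- layer_data(p, 2)
```

Changing ratio to another value would stretch one axis relative to the other, and the reference line would no longer appear at 45 degrees.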
The following template brings together all of the plotting elements covered above. Its seven parameters compose the grammar of graphics, a formal system for building plots.
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(
mapping = aes(<MAPPINGS>),
stat = <STAT>,
position = <POSITION>
) +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION>
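As an illustration, here is one possible way to fill in every placeholder of the template. The particular choices (diamonds, geom_bar(), position = "fill", coord_flip(), faceting by color) are arbitrary examples and not a recommendation from the original notes.

```r
library(ggplot2)

# Each placeholder in the template replaced with a concrete value.
p <- ggplot(data = diamonds) +               # <DATA>
  geom_bar(                                  # <GEOM_FUNCTION>
    mapping = aes(x = cut, fill = clarity),  # <MAPPINGS>
    stat = "count",                          # <STAT>
    position = "fill"                        # <POSITION>
  ) +
  coord_flip() +                             # <COORDINATE_FUNCTION>
  facet_wrap(~ color)                        # <FACET_FUNCTION>

# One panel per level of the faceting variable.
n_panels <- length(unique(layer_data(p)$PANEL))
```

In practice most of these parameters can be left out, because ggplot2 supplies defaults for everything except the data, the mappings and the geom function.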
tinytex::install_tinytex(repository = "http://mirrors.tuna.tsinghua.edu.cn/CTAN/", version = "latest")