This document is an introduction to plotting with ggplot. ggplot is a powerful tool for plotting almost anything you can think of, and it lets you modify every aspect of your plot, which makes it much more flexible than the plot function in base R. I hope this document helps you understand how ggplots are built. That should in turn help you understand why problems occur when plotting and how to solve them.

This document has two parts. The first part is a short introduction to the layers of a ggplot and what they do. The second part suggests additional packages that can help you improve your graphics. Examples with code and graphs are provided throughout the document.

1 The layers of ggplot

A ggplot consists of different layers that are combined into a graph. Each layer adds an extra layer of information to the plot. You can (in most cases) add several layers of the same type, e.g. two different geometries can be added to create one plot with two types of graphs inside. The different layers and their content are described briefly below, and examples follow afterwards.

  • Data: The data input for your plot can be in different formats, either a full dataset where each row is an observation, or summarized data, e.g. counts or percentages for different groups.
  • Mapping of data: The mapping of data is defined using the function aes(). It describes which part of your data input is used for what, e.g. which variables go on the x and y axes.
  • Geometries: The geometry defines how the data should be drawn, e.g. as points, lines or bars.
  • Statistics: Statistics (stats) can be used to transform the original data into other measures, e.g. from counts to percentages. Stats are commonly used when the data input is not already in the form you wish to plot, e.g. if you plot directly from a dataset where group percentages are not pre-calculated.
  • Scales: Scales imply a specific interpretation of values, e.g. as continuous or discrete. They can be used for many purposes, including translating between categories and colors and defining axes and limits.
  • Coordinates: Coordinates define the physical mapping of the data. They can be used to modify which parts of the plot are shown and are very useful for geographic plots.
  • Facets: This feature can be used to split a plot by other variables, e.g. for analysing subgroups.
  • Theme: Themes can be used to modify anything that is unrelated to your data including labels, fonts, text size, background colors etc.
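
To make the list above concrete, here is a minimal sketch that uses one element of each layer type in a single plot. The data frame dat_demo and its columns (age, bmi, sex) are simulated stand-ins invented for this illustration, not the dataset used in the rest of this document, and the colors, limits and theme are arbitrary choices.

```r
library(ggplot2)

# Simulated stand-in data (an assumption, not the author's dataset)
set.seed(1)
dat_demo <- data.frame(
  age = round(runif(200, 20, 80)),
  sex = sample(c("Female", "Male"), 200, replace = TRUE)
)
dat_demo$bmi <- 22 + 0.05 * dat_demo$age + rnorm(200, sd = 3)

p <- ggplot(dat_demo,                                    # data
            aes(x = age, y = bmi, colour = sex)) +       # mapping of data
  geom_point(alpha = 0.4) +                              # geometry
  stat_smooth(method = "lm", formula = y ~ x,
              se = FALSE) +                              # statistic (fitted line)
  scale_colour_manual(values = c(Female = "darkorange",
                                 Male = "steelblue")) +  # scale
  coord_cartesian(ylim = c(15, 40)) +                    # coordinates (zoom on y)
  facet_wrap(~ sex) +                                    # facets
  theme_minimal() +                                      # theme
  labs(x = "Age (years)", y = "BMI")

print(p)
```

Removing any line except the first two still gives a valid plot, which is a quick way to see what each layer contributes.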

1.1 Data and mapping of data

Data is most often defined in ggplot(), but the mapping of data using aes() can be written in ggplot() as well as in the geometries. If all geometries use the same mapping, it does not matter whether aes() is defined in ggplot() or in each geometry. If they use different mappings, aes() has to be defined in each geometry.
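
As a small sketch of the last case, the plot below gives each geometry its own mapping: the points are colored by group, while the fitted line is not. The data frame dat_demo and the column sex are simulated for this illustration only and are assumptions, not part of the dataset used elsewhere in this document. Here x is shared and kept in ggplot(), while the parts that differ are written in each geometry.

```r
library(ggplot2)

# Simulated stand-in data (an assumption, not the author's dataset)
set.seed(1)
dat_demo <- data.frame(
  age = round(runif(200, 20, 80)),
  sex = sample(c("Female", "Male"), 200, replace = TRUE)
)
dat_demo$bmi <- 22 + 0.05 * dat_demo$age + rnorm(200, sd = 3)

p_mixed <- ggplot(dat_demo, aes(x = age)) +
  # points: y plus a colour mapping that only this geometry uses
  geom_point(aes(y = bmi, colour = sex), alpha = 0.4) +
  # line: y only, so one overall line is fitted instead of one per group
  geom_smooth(aes(y = bmi), method = "lm", formula = y ~ x, se = FALSE)

print(p_mixed)
```

Had colour = sex been placed in ggplot() instead, both geometries would inherit it and geom_smooth() would draw one line per group.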

Below, a plot with two geometries is written in two different ways (A and B); both result in the same plot. If the geometries relied on different data sources, the data would have to be defined under each geometry; you can see an example here: Maps

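library(ggplot2)

# dat is assumed to be a data frame with numeric columns age and bmi

# A: the mapping is defined once in ggplot() and inherited by both geometries
cloud1 <- ggplot(dat, aes(x = age, y = bmi)) +
  geom_point(alpha = 0.3) +
  geom_density_2d()

# B: the same mapping is repeated under each geometry
cloud2 <- ggplot(dat) +
  geom_point(aes(x = age, y = bmi), alpha = 0.3) +
  geom_density_2d(aes(x = age, y = bmi))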
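
For the case where geometries rely on different data sources, a minimal sketch could look like the following. It is not the Maps example referred to above. The data frame dat_demo is simulated, and the summary table means (mean age and BMI per 10-year age band) is a hypothetical second data source created only for this illustration.

```r
library(ggplot2)

# Simulated stand-in data (an assumption, not the author's dataset)
set.seed(1)
dat_demo <- data.frame(age = round(runif(200, 20, 80)))
dat_demo$bmi <- 22 + 0.05 * dat_demo$age + rnorm(200, sd = 3)

# Second data source: mean age and mean BMI per 10-year age band
dat_demo$age_band <- cut(dat_demo$age, breaks = seq(20, 80, by = 10),
                         include.lowest = TRUE)
means <- aggregate(cbind(age, bmi) ~ age_band, data = dat_demo, FUN = mean)

# ggplot() is left empty; each geometry gets its own data and mapping
p_two <- ggplot() +
  geom_point(data = dat_demo, aes(x = age, y = bmi), alpha = 0.3) +
  geom_line(data = means, aes(x = age, y = bmi), colour = "red")

print(p_two)
```

The individual observations and the band means end up in the same plot even though they come from two data frames with different numbers of rows.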