Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.

Basic criteria for data presentation

If you’re at all interested in this topic, the talk by John Rauser (2016), “How Humans See Data”, is strongly recommended.

Visual perception of quantitative information: Cleveland hierarchy (Cleveland and McGill 1984, 1987; Cleveland 1993)

[Figure: Cleveland hierarchy]
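As a rough illustration of the hierarchy, the sketch below encodes the same five values three ways: by position along a common scale (near the top of the ranking in Cleveland and McGill 1984), by area, and by colour shading (both lower in the ranking). The data frame `d` and its values are invented purely for illustration.

```r
library(ggplot2)

## made-up values that differ only modestly, so the encoding matters
d <- data.frame(group = LETTERS[1:5],
                value = c(20, 22, 25, 27, 30))

## position along a common scale: differences are easy to read off
p_position <- ggplot(d, aes(value, group)) + geom_point()

## area: the same values mapped to point area
p_area <- (ggplot(d, aes(group, 1, size = value))
    + geom_point()
    + scale_size_area()
)

## colour shading: the same values mapped to fill
p_shade <- ggplot(d, aes(group, 1, fill = value)) + geom_tile()
```

Printing the three plots side by side gives a feel for how much harder it is to rank the groups from area or shading than from position.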

Techniques for multilevel data

ggplot2 makes it fairly easy to do a simple two-stage analysis on the fly using geom_smooth, e.g. with the CBPP data discussed below (see “two-stage analysis”).

ggplot

Rules of thumb

ggplot intro

load("../../data/gopherdat2.RData")
library("ggplot2"); theme_set(theme_bw())
(ggplot(Gdat,aes(x=year,y=shells/Area,colour=Site))
    + geom_point()
)

See Karthik Ram’s ggplot intro or my intro for disease ecologists, among many others.

Multilevel data examples

library("ggalt")
source("../../R/geom_cstar.R")

time series: cbpp data set

Contagious bovine pleuropneumonia (CBPP): from Lesnoff et al. (2004), via the lme4 package. See ?lme4::cbpp for details.

data("cbpp",package="lme4")
## make period *numeric* so lines will be connected/grouping won't happen
cbpp2 <- transform(cbpp,period=as.numeric(as.character(period)))
g0 <- ggplot(cbpp2,aes(period,incidence/size)) ## plot template (no geom)

spaghetti plot

g1 <- (g0
    +geom_line(aes(colour=herd))
    +geom_point(aes(size=size,colour=herd))
)

Do we need the colours?

g2 <- (g0
    +geom_line(aes(group=herd))
    +geom_point(aes(size=size,group=herd))
)

Facet instead:

g4 <- g1+facet_wrap(~herd)

Order herds by average proportional incidence, using the %+% trick (which swaps a new data set into an existing plot):

cbpp2R <- transform(cbpp2,herd=reorder(herd,incidence/size))
g4 %+% cbpp2R
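The same reorder-then-swap pattern can be tried on a built-in data set. This is a minimal sketch, assuming only ggplot2 and the `InsectSprays` data that ship with R; the object names `p`, `sprays_ordered` and `p_ordered` are made up for the example. `reorder()` uses the mean of its second argument within each level unless told otherwise, which is why no summary function is given.

```r
library(ggplot2)

## faceted plot with the default (alphabetical) panel order
p <- (ggplot(InsectSprays, aes(count))
    + geom_histogram(bins = 10)
    + facet_wrap(~spray)
)

## re-level the grouping factor by its mean count (reorder's default summary)
sprays_ordered <- transform(InsectSprays, spray = reorder(spray, count))

## swap the re-levelled data into the existing plot;
## the facets now appear in order of increasing mean count
p_ordered <- p %+% sprays_ordered
```

Because only the factor levels change, every other part of the plot specification is reused as-is.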

two-stage analysis:

(g0
    + geom_point(aes(size=size,group=herd))
    + geom_smooth(aes(group=herd,weight=size),
                  method="glm",
                  method.args=list(family=binomial),
                  se=FALSE))
## `geom_smooth()` using formula = 'y ~ x'

(ignore glm.fit warnings if you try this)
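For readers who want the numbers behind the picture, here is a rough sketch of the same two-stage idea done by hand: stage 1 fits a separate binomial GLM to each herd, and stage 2 summarizes the per-herd slopes. It assumes the lme4 package is installed (for the data only); `fit_herd` and `slopes` are names invented for this example, and summarizing by mean and median is just one possible choice for the second stage.

```r
data("cbpp", package = "lme4")
cbpp2 <- transform(cbpp, period = as.numeric(as.character(period)))

## stage 1: per-herd binomial GLM of incidence on period;
## returns the slope (NA if a herd has too few periods to estimate it)
fit_herd <- function(d) {
    fit <- glm(incidence/size ~ period, weights = size,
               family = binomial, data = d)
    coef(fit)[["period"]]
}
## glm.fit may warn for some herds, as with the geom_smooth version
slopes <- suppressWarnings(
    vapply(split(cbpp2, cbpp2$herd), fit_herd, numeric(1))
)

## stage 2: summarize the herd-level slopes
mean(slopes, na.rm = TRUE)
median(slopes, na.rm = TRUE)
```

Herds where the fit is poorly determined can produce extreme slopes, so it is worth looking at the whole vector of `slopes` rather than trusting a single summary.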

scatterplots: gopher tortoise mycoplasma data

Gopher tortoise data (from Ozgul et al. 2009; see ecostats chapter)

Plot density of shells from freshly dead tortoises (shells/Area) as a function of mycoplasmal prevalence (%, prev): you may want to consider site, year of collection, or population density as well.

load("../../data/gopherdat2.RData")
g5 <- ggplot(Gdat,aes(prev,shells/Area))+geom_point()
g5+geom_encircle(aes(group=Site))
g5+geom_encircle(aes(group=Site),s_shape=1,expand=0) ## convex hulls
## connect points to center
g5+stat_centseg(aes(group=Site),cfun=mean)
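If ggalt or the local geom_cstar.R file is not available, similar group markings can be approximated with base R and standard ggplot2 geoms. The sketch below is an assumption-laden stand-in rather than a reimplementation of geom_encircle or stat_centseg: it uses the built-in `iris` data in place of the gopher tortoise data, and `hull_rows`, `hulls`, `cent` and `stars` are names made up for the example.

```r
library(ggplot2)

## convex hull per group: chull() returns the row indices of the hull vertices
hull_rows <- function(d) d[chull(d$Sepal.Length, d$Sepal.Width), ]
hulls <- do.call(rbind, lapply(split(iris, iris$Species), hull_rows))

## group centroids, merged back so every point knows its group's center
cent <- aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species,
                  data = iris, FUN = mean)
names(cent)[2:3] <- c("cx", "cy")
stars <- merge(iris, cent, by = "Species")

base_plot <- (ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species))
    + geom_point()
)
## outline each group with its convex hull
base_plot + geom_polygon(data = hulls, fill = NA)
## connect each point to its group centroid
base_plot + geom_segment(data = stars, aes(xend = cx, yend = cy))
```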

treatment comparisons: clipping data

Data from Banta, Stevens, and Pigliucci (2010):

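Easier if there is one data point per group (connect with lines), but here there are several observations per group, so the lines connect group means computed by stat_summary: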

load("../../data/Banta.RData")
## dat.tf$ltf1 <- log(dat.tf$total.fruits+1)
g6 <- ggplot(dat.tf,aes(nutrient,total.fruits,colour=gen))+
    geom_point()+
    scale_y_continuous(trans="log1p")+
    facet_wrap(~amd)+
    ## `fun` replaces the `fun.y` argument, deprecated as of ggplot2 3.3.0
    stat_summary(fun=mean,aes(group=interaction(popu,gen)),
                 geom="line")

If stat_summary is used with fun.data=, it can also compute confidence intervals. Try "mean_cl_boot" or "mean_cl_normal" (see ?mean_cl_boot; these wrap functions from the Hmisc package, which must be installed).
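To show what fun.data= expects, here is a minimal sketch with a hand-written summary function, so that it runs without Hmisc. The function `mean_ci` is hypothetical (not part of ggplot2); it returns a one-row data frame with columns y, ymin and ymax, which is the format stat_summary needs, and it uses a t-based 95% interval on the assumption that this is close to what "mean_cl_normal" reports. The built-in `ToothGrowth` data stand in for the clipping data.

```r
library(ggplot2)

## hypothetical helper: mean with a t-based 95% confidence interval
mean_ci <- function(x) {
    m <- mean(x)
    se <- sd(x)/sqrt(length(x))
    tq <- qt(0.975, df = length(x) - 1)
    data.frame(y = m, ymin = m - tq*se, ymax = m + tq*se)
}

p_ci <- (ggplot(ToothGrowth, aes(factor(dose), len, colour = supp))
    ## point = group mean, range = confidence interval
    + stat_summary(fun.data = mean_ci, geom = "pointrange")
    ## connect the group means, as in the clipping example
    + stat_summary(fun = mean, geom = "line", aes(group = supp))
)
```

With Hmisc installed, replacing `mean_ci` by "mean_cl_boot" should give bootstrap intervals instead.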

Dynamic graphics:

library(plotly)
ggplotly(g6)

exercise

Pick a data set from the list available on the web page (or use your own) and create two plots that indicate the grouping in different ways.

References

Banta, Joshua A., Martin H. H. Stevens, and Massimo Pigliucci. 2010. “A Comprehensive Test of the ‘Limiting Resources’ Framework Applied to Plant Tolerance to Apical Meristem Damage.” Oikos 119 (2): 359–69. https://doi.org/10.1111/j.1600-0706.2009.17726.x.

Cleveland, William. 1993. Visualizing Data. Summit, NJ: Hobart Press.

Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54. https://doi.org/10.2307/2288400.

———. 1987. “Graphical Perception: The Visual Decoding of Quantitative Information on Graphical Displays of Data.” Journal of the Royal Statistical Society. Series A (General) 150 (3): 192–229. https://doi.org/10.2307/2981473.

Lesnoff, Matthieu, Géraud Laval, Pascal Bonnet, Sintayehu Abdicho, Asseguid Workalemahu, Daniel Kifle, Armelle Peyraud, Renaud Lancelot, and François Thiaucourt. 2004. “Within-Herd Spread of Contagious Bovine Pleuropneumonia in Ethiopian Highlands.” Preventive Veterinary Medicine 64 (1): 27–40. https://doi.org/10.1016/j.prevetmed.2004.03.005.

Ozgul, Arpat, Madan K Oli, Benjamin M Bolker, and Carolina Perez-Heydrich. 2009. “Upper Respiratory Tract Disease, Force of Infection, and Effects on Survival of Gopher Tortoises.” Ecological Applications 19 (3): 786–98. http://www.ncbi.nlm.nih.gov/pubmed/19425439.

Rauser, John. 2016. “How Humans See Data.” https://www.youtube.com/watch?v=fSgEeI2Xpdc.

Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. 2nd Printing. Springer.

Wilkinson, L. 1999. The Grammar of Graphics. New York: Springer.