Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.

Goals/contexts of data visualization

Exploration

Diagnostics

Presentation

Basic criteria for data presentation

Much of what I have to say here is also said very nicely by John Rauser (2016).

Visual perception of quantitative information: Cleveland hierarchy (Cleveland and McGill 1984, 1987; Cleveland 1993)

[Figures: Cleveland hierarchy]

Data presentation scales with data size

Rules of thumb

Techniques for multilevel data

library(ggplot2)
data("cbpp",package="lme4")
## make period *numeric* so that lines connect points across periods
## (with a factor x-variable, ggplot would put each point in its own group)
cbpp2 <- transform(cbpp,period=as.numeric(as.character(period)))
g0 <- ggplot(cbpp2,aes(period,incidence/size))
## spaghetti plot: one line per herd, point size reflects herd size
g1 <- g0+geom_line(aes(colour=herd))+geom_point(aes(size=size,colour=herd))
## equivalent plot, with the colour mapping set once in the ggplot() call
g2 <- ggplot(cbpp2,aes(period,incidence/size,colour=herd))
(g3 <- g2 + geom_line()+geom_point(aes(size=size)))
## facet instead
(g4 <- g1+facet_wrap(~herd))
## order herds by average prop. incidence (reorder() uses the mean by default)
g1 %+% transform(cbpp2,herd=reorder(herd,incidence/size))
g4 %+% transform(cbpp2,herd=reorder(herd,incidence/size))
## also consider colouring by incidence/order ...
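
One way to follow up on the "colouring by incidence" idea is sketched below. This is only an illustration, not part of the original notes: the column name herd_mean, the object name g5, and the grey-to-red colour scale are my own choices, and the herd mean is the unweighted mean of incidence/size across periods.

```r
library(ggplot2)
data("cbpp", package = "lme4")
cbpp2 <- transform(cbpp, period = as.numeric(as.character(period)))

## herd_mean (hypothetical name): unweighted mean proportion for each herd,
## repeated on every row belonging to that herd
cbpp2$herd_mean <- with(cbpp2, ave(incidence / size, herd))

## colour is now continuous, so the grouping by herd must be given explicitly
g5 <- ggplot(cbpp2,
             aes(period, incidence / size,
                 group = herd, colour = herd_mean)) +
    geom_line() +
    geom_point(aes(size = size)) +
    scale_colour_gradient(low = "grey70", high = "red")
g5
```

Because colour no longer identifies individual herds, this version trades herd identity for an overall impression of which trajectories belong to high-incidence herds.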

ggplot makes it fairly easy to do a simple two-stage analysis on the fly (here, a separate binomial GLM for each herd):

g0+geom_point(aes(size=size,colour=herd))+
    geom_smooth(aes(colour=herd,weight=size),
                method="glm",
                method.args=list(family=binomial),
                se=FALSE)

(ignore glm.fit warnings if you try this)
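
The same two-stage idea can be written out by hand, which makes the two stages explicit. The sketch below is an assumption about what such an analysis might look like, not the author's code: the names fit_one, fits, and stage1 are made up for illustration, and the second stage is just a numerical summary of the herd-level slopes. The cbind(successes, failures) response is equivalent to modelling incidence/size with weights=size.

```r
data("cbpp", package = "lme4")
cbpp2 <- transform(cbpp, period = as.numeric(as.character(period)))

## stage 1: fit a separate binomial GLM (logit link) to each herd
## (fit_one is a hypothetical helper name)
fit_one <- function(d) {
    glm(cbind(incidence, size - incidence) ~ period,
        family = binomial, data = d)
}
fits <- lapply(split(cbpp2, cbpp2$herd), fit_one)

## collect per-herd coefficients; a herd with too few periods to
## estimate a slope gets NA for that coefficient
stage1 <- data.frame(
    herd = names(fits),
    intercept = sapply(fits, function(f) coef(f)[["(Intercept)"]]),
    slope = sapply(fits, function(f) coef(f)[["period"]])
)

## stage 2: summarize the herd-level slopes
summary(stage1$slope)
```

As with the geom_smooth version, expect glm.fit warnings for some herds; with so few observations per herd the individual fits are rough, which is part of the motivation for fitting a multilevel model instead.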

Challenges

high-dimensional data (especially continuous)

Possible solutions:

large data sets

discrete data

spatial data

compositional data

multilevel data

next generation tools

Graphics culture

Data visualization in R

Base graphics

ggplot

ggplot intro

mappings + geoms

Data

Specified explicitly as part of a ggplot call:

library(mlmRev)
head(Oxboys)
##   Subject     age height Occasion
## 1       1 -1.0000  140.5        1
## 2       1 -0.7479  143.4        2
## 3       1 -0.4630  144.8        3
## 4       1 -0.1643  147.1        4
## 5       1 -0.0027  147.7        5
## 6       1  0.2466  150.2        6
library(ggplot2)
ggplot(Oxboys)

But that isn’t quite enough: we need to specify a mapping between variables (columns in the data set) and aesthetics (elements of the graphical display: x-location, y-location, colour, size, shape, …).

ggplot(Oxboys,aes(x=age,y=height))

But (as you can see) that’s still not quite enough: the plot has axes but no data. We need to specify some geometric objects (called geoms), such as points or lines, that will embody these aesthetics. The weirdest thing about ggplot syntax is that these geoms get added, with +, to the existing ggplot object that specifies the data and aesthetics. Unless you explicitly specify other aesthetics in a geom, it inherits them from the initial ggplot call.

ggplot(Oxboys,aes(x=age,y=height))+geom_point()
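
To make the inheritance rule concrete, here is a small sketch (my own illustration, with the object name p and the grey line colour chosen arbitrarily). The geom_line layer inherits x and y from the initial ggplot call and adds one aesthetic of its own, group, so that each boy's measurements are joined into a separate line.

```r
library(mlmRev)   ## provides the Oxboys data
library(ggplot2)

p <- ggplot(Oxboys, aes(x = age, y = height)) +
    ## layer-specific aesthetic: one line per Subject;
    ## x and y are inherited from ggplot()
    geom_line(aes(group = Subject), colour = "grey") +
    ## no aes() here, so this layer uses only the inherited x and y
    geom_point()
p
```

Note that colour="grey" sits outside aes(): it sets a fixed colour for the layer rather than mapping a variable to colour.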

See Karthik Ram’s ggplot intro or my intro for disease ecologists, among many others.

sessionInfo()
## R Under development (unstable) (2018-04-16 r74611)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.4 LTS
## 
## Matrix products: default
## BLAS: /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_CA.UTF8       LC_NUMERIC=C             
##  [3] LC_TIME=en_CA.UTF8        LC_COLLATE=en_CA.UTF8    
##  [5] LC_MONETARY=en_CA.UTF8    LC_MESSAGES=en_CA.UTF8   
##  [7] LC_PAPER=en_CA.UTF8       LC_NAME=C                
##  [9] LC_ADDRESS=C              LC_TELEPHONE=C           
## [11] LC_MEASUREMENT=en_CA.UTF8 LC_IDENTIFICATION=C      
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] mlmRev_1.0-6       lme4_1.1-17        Matrix_1.2-14     
## [4] ggplot2_2.2.1.9000 knitr_1.20        
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16      magrittr_1.5      MASS_7.3-49      
##  [4] splines_3.6.0     munsell_0.4.3     lattice_0.20-35  
##  [7] colorspace_1.3-2  rlang_0.2.0.9001  minqa_1.2.4      
## [10] stringr_1.3.0     plyr_1.8.4        tools_3.6.0      
## [13] grid_3.6.0        nlme_3.1-137      gtable_0.2.0     
## [16] withr_2.1.2       htmltools_0.3.6   yaml_2.1.18      
## [19] lazyeval_0.2.1    rprojroot_1.3-2   digest_0.6.15    
## [22] tibble_1.4.2      nloptr_1.0.4      evaluate_0.10.1  
## [25] rmarkdown_1.9     labeling_0.3      stringi_1.1.7    
## [28] compiler_3.6.0    pillar_1.2.1      scales_0.5.0.9000
## [31] backports_1.1.2

References

Cleveland, William. 1993. Visualizing Data. Summit, NJ: Hobart Press.

Cleveland, William S., and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54. doi:10.2307/2288400.

———. 1987. “Graphical Perception: The Visual Decoding of Quantitative Information on Graphical Displays of Data.” Journal of the Royal Statistical Society. Series A (General) 150 (3): 192–229. doi:10.2307/2981473.

Gelman, Andrew, and Antony Unwin. 2013. “Infovis and Statistical Graphics: Different Goals, Different Looks.” Journal of Computational and Graphical Statistics 22 (1): 2–28. doi:10.1080/10618600.2012.761137.

Gelman, Andrew, Cristian Pasarica, and Rahul Dodhia. 2002. “Let’s Practice What We Preach: Turning Tables into Graphs.” The American Statistician 56 (2): 121–30. http://www.tandfonline.com/doi/abs/10.1198/000313002317572790.

Rauser, John. 2016. “How Humans See Data.” https://www.youtube.com/watch?v=fSgEeI2Xpdc.

Tufte, Edward. 2001. The Visual Display of Quantitative Information. 2nd ed. Graphics Press.

Tufte, Edward R. 1995. Envisioning Information. Cheshire, Conn.: Graphics Press.

———. 1997. Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, Conn.: Graphics Press.

———. 2006. Beautiful Evidence. Cheshire, Conn.: Graphics Press.

Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. 2nd Printing. Springer.

Wilkinson, L. 1999. The Grammar of Graphics. New York: Springer.