intro material

Course structure

Course goals

General introduction to data viz principles and tools

Course structures

lectures from professors on basic ideas (first half)
in-class work
homework
lectures from students on topics/advanced ideas (second half)

Tools

Version control

Git: distributed version control system
GitHub: hosting service for Git repositories (acts as the central server)
- alternatives: BitBucket, GitLab, …
Git clients: software for working with Git on your computer
- command-line (e.g. git add foo.rmd)
- RStudio
- others (GitHub desktop etc.)

Basic Git workflow with RStudio

create repository on GitHub
copy repository to local machine
- git clone
- RStudio: File > New Project > Version Control > Git > fill in name from "Clone" button on GH

repeat:
- pull (fetch and integrate changes from GH) [git pull]
  - RStudio: Git panel > click blue down-arrow
- do stuff (create, edit files, etc.)
- stage [git add]
  - RStudio: Git panel > check the box in the "Staged" column
- commit [git commit]
  - RStudio: Git panel > click "Commit" icon >
    enter commit message > click "Commit" button (ignore "amend previous commit" button!)
- push [git push]
  - RStudio: Git panel > click green up-arrow

tidyverse

set of R packages: https://www.tidyverse.org/
advantages
- expressiveness
- speed
- new hotness
disadvantages
- minor incompatibilities with base R
- rapid evolution
- non-standard evaluation

tidyverse: big ideas

new verbs
piping
tibbles

tidyverse: new verbs

filter(x,condition): choose rows
- equivalent to subset(x,condition) or x[condition,] (with non-standard evaluation)
select(x,condition): choose columns
- equivalent to subset(x,select=condition) or x[,condition]
- helper functions such as starts_with(), matches()
mutate(x,var=...): change or add variables
- equivalent to x$var = ... or transform(x,var=...)
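
A minimal sketch of the three verbs next to their base-R equivalents. The data frame `d` and its columns are invented for illustration; only the `dplyr` package is assumed to be installed.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
d <- data.frame(
  student = c("a", "b", "c", "d"),
  course  = c("stats", "stats", "bio", "bio"),
  score   = c(70, 85, 90, 60)
)

# filter(): keep rows where the condition holds
high_dplyr <- filter(d, score > 65)
high_base  <- d[d$score > 65, ]

# select(): keep columns, here via a helper function
s_dplyr <- select(d, starts_with("s"))   # columns student and score
s_base  <- d[, c("student", "score")]

# mutate(): add a new variable computed from existing ones
m_dplyr <- mutate(d, prop = score / 100)
m_base  <- transform(d, prop = score / 100)
```

Note that the dplyr calls refer to columns by bare name (`score`), which is the non-standard evaluation mentioned above; the bracket form needs `d$score`.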

tidyverse: split-apply-combine

group_by(): adds grouping information
summarise(): collapses variables to a single value per group
e.g.

x <- group_by(x,course)
summarise(x,mean_score=mean(score),sd_score=sd(score))

equivalent to plyr::ddply() or

d_split <- split(d,d$var)       ## split
d_proc <- lapply(d_split, ...)  ## apply
d_res <- do.call(rbind,d_proc)  ## combine
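
A runnable sketch of both versions on the same data, to show that they give the same numbers. The data frame and the anonymous function passed to `lapply()` are invented for illustration; they stand in for the `...` in the snippet above and are not the only way to fill it in.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
d <- data.frame(
  course = c("stats", "stats", "bio", "bio"),
  score  = c(70, 90, 90, 60)
)

# tidyverse version: group, then summarise each group
res_tidy <- summarise(group_by(d, course),
                      mean_score = mean(score),
                      sd_score   = sd(score))

# base-R version
d_split <- split(d, d$course)                 ## split
d_proc  <- lapply(d_split, function(g) {      ## apply
  data.frame(course     = g$course[1],
             mean_score = mean(g$score),
             sd_score   = sd(g$score))
})
res_base <- do.call(rbind, d_proc)            ## combine
```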

tidyverse: piping

new %>% operator (orig. from magrittr package)
directs result of previous operation to next function, as first argument
e.g.

(d_input
    %>% select(col1,col2)
    %>% filter(cond1,cond2)
    %>% mutate(...)
) -> d_output
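
A concrete version of the pipeline above, as a sketch: the input data, column names, and conditions are invented for illustration. The nested call at the end does the same computation without the pipe, which is what `%>%` saves you from writing.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
d_input <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  value = c(1, 5, 10, 20),
  extra = c("w", "x", "y", "z")
)

# The outer parentheses let each line start with %>%
(d_input
    %>% select(group, value)
    %>% filter(value > 2, group == "b")
    %>% mutate(log_value = log10(value))
) -> d_output

# Same computation as nested calls: read from the inside out
d_nested <- mutate(
  filter(select(d_input, group, value), value > 2, group == "b"),
  log_value = log10(value)
)
```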

tidyverse: tibbles

extension of data frames (sort of)
differences
- printing
  - only prints first few rows/columns
  - labels columns by type
- no rownames
- never drops dimensions (tib[,"column1"] is still a tibble)
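
A small sketch of the dimension-dropping difference, assuming the `tibble` package is installed; the data are invented for illustration.

```r
library(tibble)

# Hypothetical toy data, invented for illustration
df  <- data.frame(column1 = 1:3, column2 = c("a", "b", "c"))
tib <- as_tibble(df)

col_df  <- df[, "column1"]    # data frame: drops to a plain integer vector
col_tib <- tib[, "column1"]   # tibble: stays a one-column tibble

is.data.frame(col_df)    # FALSE
is_tibble(col_tib)       # TRUE
```

To get the vector out of a tibble, use `tib$column1` or `tib[["column1"]]`.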

tidyverse: reshaping (`tidyr` package)

gather(data,key,value,<include/exclude>)
- wide to long
- equivalent to reshape2::melt()
spread(data,key,value)
- long to wide
- equivalent to reshape2::dcast()
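
A round-trip sketch with made-up data, assuming `tidyr` is installed. Recent tidyr versions mark `gather()` and `spread()` as superseded by `pivot_longer()` and `pivot_wider()`, but both still work as shown here.

```r
library(tidyr)

# Hypothetical toy data: one row per id, one column per time point
wide <- data.frame(
  id = c(1, 2),
  t1 = c(10, 20),
  t2 = c(11, 21)
)

# wide to long: gather every column except id into key/value pairs
long <- gather(wide, key = "time", value = "score", -id)

# long to wide: spread the key column back out into separate columns
wide2 <- spread(long, key = time, value = score)
```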

types of data visualization

exploratory

find patterns in data, explore hypotheses
emphasize robust approaches
minimize (parametric) assumptions
Tukey, Cleveland

diagnostic

evaluate assumptions of a model
- normality
- homoscedasticity
- lack of bias/goodness of fit
easily spot deviations
identify outliers and influential points
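
One possible illustration, using only base R: the standard diagnostic plots for a linear model. The model and the choice of the built-in `cars` data set are assumptions made for this sketch, not part of the course material.

```r
# Hypothetical example: simple linear regression on the built-in cars data
fit <- lm(dist ~ speed, data = cars)

op <- par(mfrow = c(2, 2))   # 2 x 2 grid of panels
plot(fit, which = 1)  # residuals vs fitted: bias / lack of fit
plot(fit, which = 2)  # normal Q-Q plot: normality of residuals
plot(fit, which = 3)  # scale-location: homoscedasticity
plot(fit, which = 5)  # residuals vs leverage: influential points
par(op)               # restore previous graphics settings
```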

inferential

coefficient plots
replacement for tables
also: tests of inference (Wickham et al. 2010)
Gelman

expository: data-viz

tell an accurate story
high information density
Tufte, Cleveland

presentation: info-viz

grab attention/engage/sell/entertain
"puzzle" graphics

dashboards

present a quick overview of a data set
user control

dynamic

engage
allow reader to drill down
Cook

References

Wickham, H., et al. 2010. "Graphical Inference for Infovis." IEEE Transactions on Visualization and Computer Graphics 16 (6) (November): 973–979. doi:10.1109/TVCG.2010.161.
