Philosophy

What do we mean by statistical inference?

answering scientific questions

  • clear, well-posed questions (theory) > good experimental design > statistical practice

… all are necessary, all connected!

  • statistics is for:
    • quantifying best guesses (point estimates)
    • quantifying uncertainty (confidence intervals)
    • statements about clarity (statistical significance testing)

reproducibility crisis

many scientific results are unreproducible

  • lack of openness (data/methods)
  • questionable research practices (QRPs)
  • p-hacking; fishing; HARKing; snooping; “Texas sharpshooter fallacy”; “researcher degrees of freedom” (Simmons, Nelson, and Simonsohn 2011)
  • “garden of forking paths” (Gelman)

analytic decisions must be made independently of the data

pre-registration (formal or informal);
at least recognize the line between confirmatory and exploratory analyses

scientific hell

power analysis

  • experimental design: before you observe/experiment
  • think about biological effect sizes: what is the smallest effect that would be biologically interesting?
  • need to specify effects and variances (standard deviations)
  • simple designs (t-test, ANOVA, etc.)
  • most power analyses are crude/order-of-magnitude
  • simulation-based power analysis (Bolker (2008) ch. 5)
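
A minimal sketch of a simulation-based power calculation for a two-sample comparison; the effect size, standard deviation, per-group sample sizes, and alpha below are placeholder values, not numbers from the course:

# minimal sketch, not course code: simulation-based power for a two-sample t test
set.seed(101)
sim_power <- function(n, effect = 0.5, sd = 1, alpha = 0.05, nsim = 1000) {
    pvals <- replicate(nsim, {
        x <- rnorm(n, mean = 0,      sd = sd)
        y <- rnorm(n, mean = effect, sd = sd)
        t.test(x, y)$p.value
    })
    mean(pvals < alpha)  # proportion of clear results = estimated power
}
sapply(c(10, 20, 40, 80), sim_power)  # power across a range of per-group sample sizes

The payoff of the simulation approach is that the t test above can be swapped for whatever model you actually plan to fit.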

goals of analysis (Harrell 2001)

Harrell ch. 4 on SpringerLink (McMaster network)

  • exploration
  • prediction
  • inference

exploration

  • looking for patterns only
  • no p-values at all
  • confidence intervals (perhaps),
    but taken with an inferential grain of salt

prediction

  • want quantitative answers about specific cases
  • consider algorithmic approaches (esp. for big data)
  • penalized approaches: automatically reduce model complexity (see the sketch after this list)
  • confidence intervals are hard
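
A minimal sketch of a penalized fit; the glmnet package, the lasso penalty, and the simulated data are illustrative choices, not part of the notes:

# minimal sketch; glmnet and the simulated data are illustrative, not course material
library(glmnet)
set.seed(101)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- c(1, -1, 0.5, rep(0, p - 3))   # only 3 of 20 predictors really matter
y <- drop(X %*% beta) + rnorm(n)
cvfit <- cv.glmnet(X, y, alpha = 1)    # alpha = 1: lasso penalty; lambda chosen by CV
coef(cvfit, s = "lambda.1se")          # most coefficients shrunk exactly to zero

Shrinking most coefficients to zero is what “automatically reduce model complexity” means here; getting honest confidence intervals after this kind of selection is the hard part noted above.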

inference

most typical scientific goal

qualitative statements about clarity and importance of effects:

  • effects that are distinguishable from null hypothesis of noise
  • test among discrete hypotheses

quantitative statements:

  • relative strength/magnitude of effects
  • importance (e.g. fraction variance explained)
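
A sketch of the quantitative side, using the built-in mtcars data purely for illustration: interval estimates for effect magnitudes plus a crude variance partition:

# sketch only; mtcars is a built-in data set, not course data
m <- lm(mpg ~ wt + hp, data = mtcars)
confint(m)                            # uncertainty: confidence intervals for each effect
summary(m)$r.squared                  # overall fraction of variance explained
af <- anova(m)
af[["Sum Sq"]] / sum(af[["Sum Sq"]])  # per-term variance partition (sequential, order-dependent)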

what do p-values really mean?

  • something about “strength of evidence”
  • not “evidence for no effect” or “no difference”
  • null hypotheses in ecology are never (?) true
  • “the difference between significant and non-significant is not significant” (Gelman and Stern 2006)
  • try talking about statistical clarity instead

p-value example

##   p value   
## A  0.0011 **
## B  0.1913   
## C  0.0011 **
## D  0.1913   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
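
The code behind this table isn't shown here. As a hedged illustration of the Gelman and Stern point above (not a reconstruction of this example): replicate experiments with the same true effect often land on opposite sides of the 0.05 cutoff, while the difference between the experiments is rarely clear.

# hedged sketch, not the code behind the table above
set.seed(101)
grp <- factor(rep(c("ctrl", "trt"), each = 30))
y1 <- rnorm(60, mean = 0.5 * (grp == "trt"))  # experiment 1
y2 <- rnorm(60, mean = 0.5 * (grp == "trt"))  # experiment 2, same true effect
c(expt1 = t.test(y1 ~ grp)$p.value,           # the two p-values often straddle 0.05 ...
  expt2 = t.test(y2 ~ grp)$p.value)
dat <- data.frame(y = c(y1, y2), grp = rep(grp, 2),
                  expt = rep(c("expt1", "expt2"), each = 60))
coef(summary(lm(y ~ grp * expt, data = dat)))  # ... but the interaction (difference
                                               # between the two effects) is rarely clear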

Real example (Dushoff et al. 2006)

From a study of influenza mortality, estimating fraction of mortality attributable to influenza A, influenza B, or weather alone …

Why does weather not seem to have an effect???

the explanation

realism in data analysis

how much data do you need for a given model?

  • rule of thumb: 10-20 data points per estimated parameter
  • different rules for continuous, count, and binomial data
  • how do we count data points/“degrees of freedom” for clustered data?
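
A sketch of what the counting looks like in practice: count the coefficients a model formula implies, then apply the 10-20-per-parameter rule (the formula and the built-in mtcars data are placeholders):

# sketch: how many parameters does a formula imply, and how much data would the
# rule of thumb ask for? (formula and data are placeholders)
f <- mpg ~ wt * hp + factor(cyl)
n_par <- ncol(model.matrix(f, data = mtcars))  # number of estimated coefficients (6 here)
n_par * c(10, 20)                              # rough range of observations needed
nrow(mtcars)                                   # versus what we actually have (32)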

dimension reduction

  • must be a priori
  • discard interactions
  • simplify questions
  • collapse variables, e.g. by PCA
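
A sketch of the last point: collapse several correlated predictors into a couple of principal components, using only the predictors (not the response), so the reduction stays a priori; the variables chosen here are placeholders:

# sketch: a priori dimension reduction by PCA (uses predictors only, never the response)
X <- scale(mtcars[, c("disp", "hp", "wt", "drat")])  # placeholder predictor set
pc <- prcomp(X)
summary(pc)                       # how much variance do the first 1-2 components capture?
dat <- data.frame(mpg = mtcars$mpg, pc$x[, 1:2])
lm(mpg ~ PC1 + PC2, data = dat)   # model the response with 2 scores instead of 4 variables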

a data analysis road map

  1. figure out the (subject-area) question
  2. design experiment/data collection (power analysis; simulation)

  3. collect data

  4. understand the data
  5. specify the model; write it down!

  6. inspect data (Q/A) (return to 5?)
  7. fit model
  8. graphical diagnostics (return to 5?)
  9. interpret parameters; inference; plot results

References

Bolker, Benjamin M. 2008. Ecological Models and Data in R. Princeton, NJ: Princeton University Press.

Dushoff, Jonathan, Joshua B. Plotkin, Cecile Viboud, David J. D. Earn, and Lone Simonsen. 2006. “Mortality Due to Influenza in the United States—An Annualized Regression Approach Using Multiple-Cause Mortality Data.” American Journal of Epidemiology 163 (2): 181–87. doi:10.1093/aje/kwj024.

Gelman, Andrew, and Hal Stern. 2006. “The Difference Between ‘Significant’ and ‘Not Significant’ Is Not Itself Statistically Significant.” The American Statistician 60 (4): 328–31. doi:10.1198/000313006X152649.

Harrell, Frank. 2001. Regression Modeling Strategies. Springer.

Simmons, Joseph P., Leif D. Nelson, and Uri Simonsohn. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22 (11): 1359–66. doi:10.1177/0956797611417632.