(Generalized) linear mixed models

a statistical modeling framework incorporating:

a small corner of the universe

Coral protection from seastars (Culcita) by symbionts1

Environmental stress: Glycera cell survival (D. Julian unpubl.)

simulated herbivory of Arabidopsis2

Coral demography (J.-S. White unpubl.)

Intercept random effects

\[ \begin{split} y_{ij} & = \beta_0 + \beta_1 x_{ij} + \epsilon_{0,ij} + \epsilon_{1,j} \\ & = (\beta_0 + \epsilon_{1,j}) + \beta_1 x_{ij} + \epsilon_{1,j} \\ \epsilon_{0,ij} & \sim \textrm{Normal}(0,\sigma_0^2) \\ \epsilon_{1,j} & \sim \textrm{Normal}(0,\sigma_1^2) \end{split} \]

Random-slopes model

\[ \begin{split} y_{ij} & = \beta_0 + \beta_1 x_{ij} + \epsilon_{0,ij} + \epsilon_{1,j} + \epsilon_{2,j} x_{ij} \\ & = (\beta_0 + \epsilon_{1,j}) + (\beta_1 + \epsilon_{2,j}) x_{ij} + \epsilon_{0,ij} \\ \epsilon_{0,ij} & \sim \textrm{Normal}(0,\sigma_0^2) \\ \{\epsilon_{1,j}, \epsilon_{2,j}\} & \sim \textrm{MVN}(0,\Sigma) \end{split} \]

General definition

\[ \begin{split} \underbrace{Y_i}_{\text{response}} & \sim \overbrace{\text{Distr}}^{\substack{\text{conditional} \\ \text{distribution}}}(\underbrace{g^{-1}(\eta_i)}_{\substack{\text{inverse} \\ \text{link} \\ \text{function}}},\underbrace{\phi}_{\substack{\text{scale} \\ \text{parameter}}}) \\ \underbrace{\boldsymbol \eta}_{\substack{\text{linear} \\ \text{predictor}}} & = \underbrace{\boldsymbol X \boldsymbol \beta}_{\substack{\text{fixed} \\ \text{effects}}} + \underbrace{\boldsymbol Z \boldsymbol b}_{\substack{\text{random} \\ \text{effects}}} \\ \underbrace{\boldsymbol b}_{\substack{\text{conditional} \\ \text{modes}}} & \sim \text{MVN}(\boldsymbol 0, \underbrace{\Sigma(\boldsymbol \theta)}_{\substack{\text{covariance} \\ \text{matrix}}}) \end{split} \]

What are random effects?

A method for …

Random-effect myths

Why use random effects? (inferential/philosophical)

When you:

Why use random effects? (practical)3,4

Avoiding MM

Estimation

Maximum likelihood estimation

  • Best fit is a compromise between two components
    (consistency of data with fixed effects and conditional modes; consistency of random effect with RE distribution)
  • \(\underbrace{{\cal L}(\boldsymbol \beta,\boldsymbol \theta)}_{\substack{\text{marginal} \\ \text{likelihood}}} = \int \underbrace{{\cal L}(\boldsymbol y|\beta,b)}_{\substack{\text{conditional} \\ \text{likelihood}}} \cdot {\cal L}(\boldsymbol b|\Sigma(\theta)) \, d\boldsymbol b\)

Shrinkage: Arabidopsis example

Shrinkage in a random-slopes model

From Christophe Lalanne, see here:

A tiny bit about GLMs

GLMs

  • relax assumption of Gaussian conditional distribution
  • classic GLMs from the exponential (dispersion) family (binomial, Poisson, Gamma, …)
  • link function and variance function
  • some packages (based on MLE) relax the exponential-family requirement (Beta, Student-\(t\), Tweedie, skew-normal, …)
  • still allow/use a link function

beyond GLMs

  • many other variations are possible
  • zero-inflated (altered) and hurdle models
  • censored data
  • scale-location models (dispersion parameter gets its own model)
  • predictor variable → linear predictor still governed by a linear model (for convenience)

Methods

Estimation

  • we need to compute an integral
  • in linear mixed models the integral goes away (replaced by fancy linear algebra)
  • deterministic
    • various approximate integrals6:
      penalized quasi-likelihood, Laplace, Gauss-Hermite quadrature, …7;
    • more care needed for large variance, small clusters (e.g. binary data)
    • flexibility and speed vs. accuracy
  • stochastic (Monte Carlo): frequentist and Bayesian810. MCMC, importance sampling
    • (much) slower but flexible and accurate

Estimation: Culcita1

What is the maximal model?

  • Which effects vary within which groups?
  • If effects don’t vary within groups, then we can’t estimate among-group variation in the effect
    • convenient
    • maybe less powerful
  • e.g. female rats exposed to different doses of radiation, multiple pups per mother, multiple measurements per pup (labeled by time). Maximal model … ?
  • Maximal model is often impractical/unfeasible
    • Culcita (coral-reef) example: randomized-block design, so each treatment (none/crabs/shrimp/both) is repeated in every block; thus (treat|block) is maximal
    • CBPP data: each herd is measured in every period, so in principle we could use (period|herd), not just (1|herd)

Inference

Wald tests/CIs

  • typical results of summary()
  • symmetric CIs
  • assumes log-likelihood surface is quadratic
  • exact for linear models: approximate for (G)LMMs
  • fast
  • approximation sometimes awful (complete separation)
  • “denominator degrees of freedom” issues
    • e.g. Kenward-Roger correction11

Likelihood ratio tests/profile CIs

  • better than Wald, but still asymptotic
  • for hypothesis tests, fit & compare full and nested models (anova(), drop1())
  • for CIs, compute profile confidence intervals (slow)

Nonparametric bootstrap

  • Bootstrapping: slow, but gold standard for frequentist models
  • Need to respect structure when resampling
    • Residual bootstrapping for LMMs
    • Nested resampling where possible
  • lmeresampler package

Parametric bootstrap

  • works for any model (including crossed-RE GLMMs)
  • fit null model to data
  • simulate “data” from null model
  • fit null and working model, compute likelihood difference
  • repeat to estimate null distribution
  • assumes model correctly specified
  • bootMer(), pbkrtest package

Bayesian inference

  • If we have a good sample from the posterior distribution, we get everything we want for free
  • Bayesian methods are more challenging to make work, and slow
  • Gibbs sampling \(\to\) Hamiltonian Monte Carlo
  • MCMCglmm (Gibbs), brms (HMC), rstanarm (HMC)

Challenges & open questions

Challenges

  • GLMMs with few clusters: need to go beyond Laplace approximation (importance sampling, Gausss-Hermite)
  • LMMs with few clusters: need finite-size corrections (KR/PB/MCMC)
  • Choosing the complexity of RE components (singular fits):13. buildmer package ?
  • Big data: speed!
  • Model diagnosis
  • Complex correlation structures: spatial, temporal, phylogenetic, …1416

Landscape

  • frequentist: nlme, lme4, glmmTMB, MixedModels.jl (Julia)
  • Bayesian: MCMCglmm, rstanarm, brms, bamlss, INLA
  • build-your-own: greta, RTMB, NIMBLE, Stan (rethinking package)
  • spatial: glmmTMB, sdmTMB, spaMM

references

1.
McKeon, C. S., Stier, A., McIlroy, S. & Bolker, B. Multiple defender effects: Synergistic coral defense by mutualist crustaceans. Oecologia 169, 1095–1103 (2012).
2.
Banta, J. A., Stevens, M. H. H. & Pigliucci, M. A comprehensive test of the ’limiting resources’ framework applied to plant tolerance to apical meristem damage. Oikos 119, 359–369 (2010).
3.
Crawley, M. J. Statistical Computing: An Introduction to Data Analysis Using S-PLUS. (John Wiley & Sons, 2002).
4.
Gelman, A. Analysis of variance: Why it is more important than ever. Annals of Statistics 33, 1–53 (2005).
5.
Murtaugh, P. A. Simplicity and complexity in ecological data analysis. Ecology 88, 56–62 (2007).
6.
Breslow, N. E. Whither PQL? in Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data (eds. Lin, D. Y. & Heagerty, P. J.) 1–22 (Springer, 2004).
7.
8.
Booth, J. G. & Hobert, J. P. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society. Series B 61, 265–285 (1999).
9.
Sung, Y. J. & Geyer, C. J. Monte Carlo likelihood inference for missing data models. The Annals of Statistics 35, 990–1011 (2007).
10.
Ponciano, J. M., Taper, M. L., Dennis, B. & Lele, S. R. Hierarchical models in ecology: Confidence intervals, hypothesis testing, and model selection using data cloning. Ecology 90, 356–362 (2009).
11.
Stroup, W. W. Rethinking the analysis of non-normal data in plant and soil science. Agronomy Journal 106, 1–17 (2014).
12.
Barr, D. J., Levy, R., Scheepers, C. & Tily, H. J. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68, 255–278 (2013).
13.
Bates, D., Kliegl, R., Vasishth, S. & Baayen, H. Parsimonious Mixed Models. arXiv:1506.04967 [stat] (2015).
14.
Ives, A. R. & Zhu, J. Statistics for correlated data: Phylogenies, space, and time. Ecological Applications 16, 20–32 (2006).
15.
Rue, H., Martino, S. & Chopin, N. Gaussian models using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society, Series B 71, 319–392 (2009).
16.
Rousset, F. & Ferdy, J.-B. Testing environmental and genetic effects in the presence of spatial autocorrelation. Ecography no–no (2014) doi:10.1111/ecog.00566.
17.
Bolker, B. M. Linear and generalized linear mixed models. in Ecological statistics: Contemporary theory and application (eds. Fox, G. A., Negrete-Yankelevich, S. & Sosa, V. J.) (Oxford University Press, 2015).