(Generalized) linear mixed models

a statistical modeling framework incorporating:

combinations of categorical and continuous predictors,
and interactions
(some) non-Normal responses
(e.g. binomial, Poisson, and extensions)
(some) nonlinearity
(e.g. logistic, exponential, hyperbolic)
non-independent (grouped) data
(can also be used to implement correlations and smooth terms)

a small corner of the universe

Coral protection from seastars (Culcita) by symbionts¹

Environmental stress: Glycera cell survival (D. Julian unpubl.)

simulated herbivory of Arabidopsis²

Coral demography (J.-S. White unpubl.)

Intercept random effects

\[ \begin{split} y_{ij} & = \beta_0 + \beta_1 x_{ij} + \epsilon_{0,ij} + \epsilon_{1,j} \\ & = (\beta_0 + \epsilon_{1,j}) + \beta_1 x_{ij} + \epsilon_{1,j} \\ \epsilon_{0,ij} & \sim \textrm{Normal}(0,\sigma_0^2) \\ \epsilon_{1,j} & \sim \textrm{Normal}(0,\sigma_1^2) \end{split} \]

Could have multiple, nested levels of random effects
(genotype within population within region …), or crossed REs
formula: y ~ 1 + x + (1 | g)

Random-slopes model

\[ \begin{split} y_{ij} & = \beta_0 + \beta_1 x_{ij} + \epsilon_{0,ij} + \epsilon_{1,j} + \epsilon_{2,j} x_{ij} \\ & = (\beta_0 + \epsilon_{1,j}) + (\beta_1 + \epsilon_{2,j}) x_{ij} + \epsilon_{0,ij} \\ \epsilon_{0,ij} & \sim \textrm{Normal}(0,\sigma_0^2) \\ \{\epsilon_{1,j}, \epsilon_{2,j}\} & \sim \textrm{MVN}(0,\Sigma) \end{split} \]

variation in the effect of a treatment or covariate across groups
estimate the correlation between the intercept and slope
formula: y ~ 1 + x + (1 + x | g)

General definition

\[ \begin{split} \underbrace{Y_i}_{\text{response}} & \sim \overbrace{\text{Distr}}^{\substack{\text{conditional} \\ \text{distribution}}}(\underbrace{g^{-1}(\eta_i)}_{\substack{\text{inverse} \\ \text{link} \\ \text{function}}},\underbrace{\phi}_{\substack{\text{scale} \\ \text{parameter}}}) \\ \underbrace{\boldsymbol \eta}_{\substack{\text{linear} \\ \text{predictor}}} & = \underbrace{\boldsymbol X \boldsymbol \beta}_{\substack{\text{fixed} \\ \text{effects}}} + \underbrace{\boldsymbol Z \boldsymbol b}_{\substack{\text{random} \\ \text{effects}}} \\ \underbrace{\boldsymbol b}_{\substack{\text{conditional} \\ \text{modes}}} & \sim \text{MVN}(\boldsymbol 0, \underbrace{\Sigma(\boldsymbol \theta)}_{\substack{\text{covariance} \\ \text{matrix}}}) \end{split} \]

the structure of \(Z\) and \(\Sigma\) reflect one or more underlying categorical grouping variables (clusters, blocks, subjects, etc. etc.) or combinations thereof

What are random effects?

A method for …

accounting for correlations among observations within clusters
compromising between
complete pooling (no among-cluster variance)
and fixed effects (large among-cluster variance)
handling levels selected at random from a larger population
sharing information among levels (shrinkage estimation)
estimating variability among clusters
allowing predictions for unmeasured clusters

Random-effect myths

clusters must always be sampled at random
a complete sample cannot be treated as a random effect
random effects are always a nuisance variable
nothing can be said about the predictions of a random effect
you should always use a random effect no matter how few levels you have

Why use random effects? (inferential/philosophical)

When you:

do want to
- quantify variation among groups
- make predictions about unobserved groups
have (randomly) sampled clusters from a larger population
have clusters that are exchangeable
don’t want to
- test hypotheses about differences between particular clusters

Why use random effects? (practical)^3,4

want to combine information across groups
have variation in information per cluster (number of samples or noisiness);
have a categorical predictor that is a nuisance variable (i.e., it is not of direct interest, but should be controlled for).
have more than 5-6 groups, or regularizing/using priors (otherwise, use fixed)

Avoiding MM

for nested designs: compute cluster means⁵
use fixed effects (or two-stage models) when there are
- many samples per cluster
- few clusters

Estimation

Maximum likelihood estimation

Best fit is a compromise between two components
(consistency of data with fixed effects and conditional modes; consistency of random effect with RE distribution)
\(\underbrace{{\cal L}(\boldsymbol \beta,\boldsymbol \theta)}_{\substack{\text{marginal} \\ \text{likelihood}}} = \int \underbrace{{\cal L}(\boldsymbol y|\beta,b)}_{\substack{\text{conditional} \\ \text{likelihood}}} \cdot {\cal L}(\boldsymbol b|\Sigma(\theta)) \, d\boldsymbol b\)

…

Shrinkage: Arabidopsis example

Shrinkage in a random-slopes model

From Christophe Lalanne, see here:

A tiny bit about GLMs

GLMs

relax assumption of Gaussian conditional distribution
classic GLMs from the exponential (dispersion) family (binomial, Poisson, Gamma, …)
link function and variance function
some packages (based on MLE) relax the exponential-family requirement (Beta, Student-\(t\), Tweedie, skew-normal, …)
still allow/use a link function

beyond GLMs

many other variations are possible
zero-inflated (altered) and hurdle models
censored data
scale-location models (dispersion parameter gets its own model)
predictor variable → linear predictor still governed by a linear model (for convenience)

Methods

Estimation

we need to compute an integral
in linear mixed models the integral goes away (replaced by fancy linear algebra)
deterministic
- various approximate integrals⁶:
  penalized quasi-likelihood, Laplace, Gauss-Hermite quadrature, …⁷;
- more care needed for large variance, small clusters (e.g. binary data)
- flexibility and speed vs. accuracy
stochastic (Monte Carlo): frequentist and Bayesian^8–10. MCMC, importance sampling
- (much) slower but flexible and accurate

Estimation: Culcita¹

What is the maximal model?

Which effects vary within which groups?
If effects don’t vary within groups, then we can’t estimate among-group variation in the effect
- convenient
- maybe less powerful
e.g. female rats exposed to different doses of radiation, multiple pups per mother, multiple measurements per pup (labeled by time). Maximal model … ?
Maximal model is often impractical/unfeasible
- Culcita (coral-reef) example: randomized-block design, so each treatment (none/crabs/shrimp/both) is repeated in every block; thus (treat|block) is maximal
- CBPP data: each herd is measured in every period, so in principle we could use (period|herd), not just (1|herd)

Inference

Wald tests/CIs

typical results of summary()
symmetric CIs
assumes log-likelihood surface is quadratic
exact for linear models: approximate for (G)LMMs
fast
approximation sometimes awful (complete separation)
“denominator degrees of freedom” issues
- e.g. Kenward-Roger correction¹¹

Likelihood ratio tests/profile CIs

better than Wald, but still asymptotic
for hypothesis tests, fit & compare full and nested models (anova(), drop1())
for CIs, compute profile confidence intervals (slow)

Nonparametric bootstrap

Bootstrapping: slow, but gold standard for frequentist models
Need to respect structure when resampling
- Residual bootstrapping for LMMs
- Nested resampling where possible
lmeresampler package

Parametric bootstrap

works for any model (including crossed-RE GLMMs)
fit null model to data
simulate “data” from null model
fit null and working model, compute likelihood difference
repeat to estimate null distribution
assumes model correctly specified
bootMer(), pbkrtest package

Bayesian inference

If we have a good sample from the posterior distribution, we get everything we want for free
Bayesian methods are more challenging to make work, and slow
Gibbs sampling \(\to\) Hamiltonian Monte Carlo
MCMCglmm (Gibbs), brms (HMC), rstanarm (HMC)

Challenges & open questions

Challenges

GLMMs with few clusters: need to go beyond Laplace approximation (importance sampling, Gausss-Hermite)
LMMs with few clusters: need finite-size corrections (KR/PB/MCMC)
Choosing the complexity of RE components (singular fits):¹³. buildmer package ?
Big data: speed!
Model diagnosis
Complex correlation structures: spatial, temporal, phylogenetic, …^14–16

Landscape

frequentist: nlme, lme4, glmmTMB, MixedModels.jl (Julia)
Bayesian: MCMCglmm, rstanarm, brms, bamlss, INLA
build-your-own: greta, RTMB, NIMBLE, Stan (rethinking package)
spatial: glmmTMB, sdmTMB, spaMM

resources

(code ASPROMP8)

references

McKeon, C. S., Stier, A., McIlroy, S. & Bolker, B. Multiple defender effects: Synergistic coral defense by mutualist crustaceans. Oecologia 169, 1095–1103 (2012).

Banta, J. A., Stevens, M. H. H. & Pigliucci, M. A comprehensive test of the ’limiting resources’ framework applied to plant tolerance to apical meristem damage. Oikos 119, 359–369 (2010).

Crawley, M. J. Statistical Computing: An Introduction to Data Analysis Using S-PLUS. (John Wiley & Sons, 2002).

Gelman, A. Analysis of variance: Why it is more important than ever. Annals of Statistics 33, 1–53 (2005).

Murtaugh, P. A. Simplicity and complexity in ecological data analysis. Ecology 88, 56–62 (2007).

Breslow, N. E. Whither PQL? in Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data (eds. Lin, D. Y. & Heagerty, P. J.) 1–22 (Springer, 2004).

Biswas, K. Performances of different estimation methods for generalized linear mixed models. (McMaster University, 2015).

Booth, J. G. & Hobert, J. P. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society. Series B 61, 265–285 (1999).

Sung, Y. J. & Geyer, C. J. Monte Carlo likelihood inference for missing data models. The Annals of Statistics 35, 990–1011 (2007).

10.

Ponciano, J. M., Taper, M. L., Dennis, B. & Lele, S. R. Hierarchical models in ecology: Confidence intervals, hypothesis testing, and model selection using data cloning. Ecology 90, 356–362 (2009).

11.

Stroup, W. W. Rethinking the analysis of non-normal data in plant and soil science. Agronomy Journal 106, 1–17 (2014).

12.

Barr, D. J., Levy, R., Scheepers, C. & Tily, H. J. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68, 255–278 (2013).

13.

Bates, D., Kliegl, R., Vasishth, S. & Baayen, H. Parsimonious Mixed Models. arXiv:1506.04967 [stat] (2015).

14.

Ives, A. R. & Zhu, J. Statistics for correlated data: Phylogenies, space, and time. Ecological Applications 16, 20–32 (2006).

15.

Rue, H., Martino, S. & Chopin, N. Gaussian models using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society, Series B 71, 319–392 (2009).

16.

Rousset, F. & Ferdy, J.-B. Testing environmental and genetic effects in the presence of spatial autocorrelation. Ecography no–no (2014) doi:10.1111/ecog.00566.

17.

Bolker, B. M. Linear and generalized linear mixed models. in Ecological statistics: Contemporary theory and application (eds. Fox, G. A., Negrete-Yankelevich, S. & Sosa, V. J.) (Oxford University Press, 2015).

generalized linear mixed models: overview

Ben Bolker

(Generalized) linear mixed models

a small corner of the universe

Coral protection from seastars (Culcita) by symbionts¹

Environmental stress: Glycera cell survival (D. Julian unpubl.)

simulated herbivory of Arabidopsis²

Coral demography (J.-S. White unpubl.)

Intercept random effects

Random-slopes model

General definition

What are random effects?

Random-effect myths

Why use random effects? (inferential/philosophical)

Why use random effects? (practical)^3,4

Avoiding MM

Estimation

Maximum likelihood estimation

…

Shrinkage: Arabidopsis example

Shrinkage in a random-slopes model

A tiny bit about GLMs

GLMs

beyond GLMs

Methods

Estimation

Estimation: Culcita¹

What is the maximal model?

Inference

Wald tests/CIs

Likelihood ratio tests/profile CIs

Nonparametric bootstrap

Parametric bootstrap

Bayesian inference

Challenges & open questions

Challenges

Landscape

resources

references

generalized linear mixed models: overview

Ben Bolker

(Generalized) linear mixed models

a small corner of the universe

Coral protection from seastars (Culcita) by symbionts1

Environmental stress: Glycera cell survival (D. Julian unpubl.)

simulated herbivory of Arabidopsis2

Coral demography (J.-S. White unpubl.)

Intercept random effects

Random-slopes model

General definition

What are random effects?

Random-effect myths

Why use random effects? (inferential/philosophical)

Why use random effects? (practical)3,4

Avoiding MM

Estimation

Maximum likelihood estimation

…

Shrinkage: Arabidopsis example

Shrinkage in a random-slopes model

A tiny bit about GLMs

GLMs

beyond GLMs

Methods

Estimation

Estimation: Culcita1

What is the maximal model?

Inference

Wald tests/CIs

Likelihood ratio tests/profile CIs

Nonparametric bootstrap

Parametric bootstrap

Bayesian inference

Challenges & open questions

Challenges

Landscape

resources

references

Coral protection from seastars (Culcita) by symbionts¹

simulated herbivory of Arabidopsis²

Why use random effects? (practical)^3,4

Estimation: Culcita¹