(Generalized) linear mixed models
a statistical modeling framework incorporating:
- combinations of categorical and continuous predictors,
and interactions
- (some) non-Normal responses
(e.g. binomial, Poisson, and extensions)
- (some) nonlinearity
(e.g. logistic, exponential, hyperbolic)
- non-independent (grouped) data
(can also be used to implement correlations and smooth terms)
a small corner of the universe
Coral protection from seastars (Culcita) by symbionts1
Environmental stress: Glycera cell survival (D. Julian
unpubl.)
simulated herbivory of Arabidopsis2
Coral demography (J.-S. White unpubl.)
Intercept random effects
\[
\begin{split}
y_{ij} & = \beta_0 + \beta_1 x_{ij} + \epsilon_{0,ij} +
\epsilon_{1,j} \\
& = (\beta_0 + \epsilon_{1,j}) + \beta_1 x_{ij} + \epsilon_{1,j} \\
\epsilon_{0,ij} & \sim \textrm{Normal}(0,\sigma_0^2) \\
\epsilon_{1,j} & \sim \textrm{Normal}(0,\sigma_1^2)
\end{split}
\]
- Could have multiple, nested levels of random effects
(genotype within population within region …), or crossed
REs
- formula:
y ~ 1 + x + (1 | g)
Random-slopes model
\[
\begin{split}
y_{ij} & = \beta_0 + \beta_1 x_{ij} + \epsilon_{0,ij} +
\epsilon_{1,j} +
\epsilon_{2,j} x_{ij} \\
& = (\beta_0 + \epsilon_{1,j}) + (\beta_1 + \epsilon_{2,j}) x_{ij} +
\epsilon_{0,ij} \\
\epsilon_{0,ij} & \sim \textrm{Normal}(0,\sigma_0^2) \\
\{\epsilon_{1,j}, \epsilon_{2,j}\} & \sim \textrm{MVN}(0,\Sigma)
\end{split}
\]
- variation in the effect of a treatment or covariate across
groups
- estimate the correlation between the intercept and slope
- formula:
y ~ 1 + x + (1 + x | g)
General definition
\[
\begin{split}
\underbrace{Y_i}_{\text{response}} & \sim
\overbrace{\text{Distr}}^{\substack{\text{conditional} \\
\text{distribution}}}(\underbrace{g^{-1}(\eta_i)}_{\substack{\text{inverse}
\\ \text{link} \\
\text{function}}},\underbrace{\phi}_{\substack{\text{scale} \\
\text{parameter}}}) \\
\underbrace{\boldsymbol \eta}_{\substack{\text{linear} \\
\text{predictor}}} &
=
\underbrace{\boldsymbol X \boldsymbol \beta}_{\substack{\text{fixed} \\
\text{effects}}} +
\underbrace{\boldsymbol Z \boldsymbol b}_{\substack{\text{random} \\
\text{effects}}}
\\
\underbrace{\boldsymbol b}_{\substack{\text{conditional} \\
\text{modes}}} &
\sim \text{MVN}(\boldsymbol 0, \underbrace{\Sigma(\boldsymbol
\theta)}_{\substack{\text{covariance} \\ \text{matrix}}})
\end{split}
\]
- the structure of \(Z\) and \(\Sigma\) reflect one or more underlying
categorical grouping variables (clusters,
blocks, subjects, etc. etc.) or combinations thereof
What are random effects?
A method for …
- accounting for correlations among observations within clusters
- compromising between
complete pooling (no among-cluster variance)
and fixed effects (large among-cluster variance)
- handling levels selected at random from a larger population
- sharing information among levels (shrinkage
estimation)
- estimating variability among clusters
- allowing predictions for unmeasured clusters
Random-effect myths
- clusters must always be sampled at random
- a complete sample cannot be treated as a random effect
- random effects are always a nuisance variable
- nothing can be said about the predictions of a random effect
- you should always use a random effect no matter how few levels you
have
Why use random effects? (inferential/philosophical)
When you:
- do want to
- quantify variation among groups
- make predictions about unobserved groups
- have (randomly) sampled clusters from a larger population
- have clusters that are exchangeable
- don’t want to
- test hypotheses about differences between particular clusters
Why use random effects? (practical)3,4
- want to combine information across groups
- have variation in information per cluster (number of samples or
noisiness);
- have a categorical predictor that is a nuisance variable (i.e., it
is not of direct interest, but should be controlled for).
- have more than 5-6 groups, or regularizing/using priors (otherwise,
use fixed)
Avoiding MM
- for nested designs: compute cluster means5
- use fixed effects (or two-stage models) when there are
- many samples per cluster
- few clusters
Estimation
Maximum likelihood estimation
- Best fit is a compromise between two components
(consistency of data with fixed effects and conditional modes;
consistency of random effect with RE distribution)
- \(\underbrace{{\cal L}(\boldsymbol
\beta,\boldsymbol \theta)}_{\substack{\text{marginal} \\
\text{likelihood}}} = \int \underbrace{{\cal L}(\boldsymbol
y|\beta,b)}_{\substack{\text{conditional} \\ \text{likelihood}}} \cdot
{\cal L}(\boldsymbol b|\Sigma(\theta)) \, d\boldsymbol b\)
…
Shrinkage: Arabidopsis example
Shrinkage in a random-slopes model
From Christophe Lalanne, see here:
A tiny bit about GLMs
GLMs
- relax assumption of Gaussian conditional distribution
- classic GLMs from the exponential (dispersion) family
(binomial, Poisson, Gamma, …)
- link function and variance function
- some packages (based on MLE) relax the exponential-family
requirement (Beta, Student-\(t\),
Tweedie, skew-normal, …)
- still allow/use a link function
beyond GLMs
- many other variations are possible
- zero-inflated (altered) and hurdle models
- censored data
- scale-location models (dispersion parameter gets its own model)
- predictor variable → linear predictor still governed by a
linear model (for convenience)
Methods
Estimation
- we need to compute an integral
- in linear mixed models the integral goes away (replaced by
fancy linear algebra)
- deterministic
- various approximate integrals6:
penalized quasi-likelihood, Laplace, Gauss-Hermite quadrature, …7;
- more care needed for large variance, small clusters (e.g. binary
data)
- flexibility and speed vs. accuracy
- stochastic (Monte Carlo): frequentist and Bayesian8–10. MCMC,
importance sampling
- (much) slower but flexible and accurate
Estimation: Culcita1
What is the maximal model?
- Which effects vary within which groups?
- If effects don’t vary within groups, then we can’t estimate
among-group variation in the effect
- convenient
- maybe less powerful
- e.g. female rats exposed to different doses of radiation, multiple
pups per mother, multiple measurements per pup (labeled by time).
Maximal model … ?
- Maximal model is often impractical/unfeasible
- Culcita (coral-reef) example: randomized-block design, so
each treatment (none/crabs/shrimp/both) is repeated in every block; thus
(treat|block)
is maximal
- CBPP data: each herd is measured in every period, so in principle we
could use
(period|herd)
, not just
(1|herd)
Inference
Wald tests/CIs
- typical results of
summary()
- symmetric CIs
- assumes log-likelihood surface is quadratic
- exact for linear models: approximate for (G)LMMs
- fast
- approximation sometimes awful (complete separation)
- “denominator degrees of freedom” issues
- e.g. Kenward-Roger correction11
Likelihood ratio tests/profile CIs
- better than Wald, but still asymptotic
- for hypothesis tests, fit & compare full and nested models
(
anova()
, drop1()
)
- for CIs, compute profile confidence intervals (slow)
Nonparametric bootstrap
- Bootstrapping: slow, but gold standard for frequentist models
- Need to respect structure when resampling
- Residual bootstrapping for LMMs
- Nested resampling where possible
lmeresampler
package
Parametric bootstrap
- works for any model (including crossed-RE GLMMs)
- fit null model to data
- simulate “data” from null model
- fit null and working model, compute likelihood difference
- repeat to estimate null distribution
- assumes model correctly specified
bootMer()
, pbkrtest
package
Bayesian inference
- If we have a good sample from the posterior distribution, we get
everything we want for free
- Bayesian methods are more challenging to make work, and slow
- Gibbs sampling \(\to\)
Hamiltonian Monte Carlo
MCMCglmm
(Gibbs), brms
(HMC),
rstanarm
(HMC)
Challenges & open questions
Challenges
- GLMMs with few clusters: need to go beyond Laplace approximation
(importance sampling, Gausss-Hermite)
- LMMs with few clusters: need finite-size corrections
(KR/PB/MCMC)
- Choosing the complexity of RE components (singular
fits):13. buildmer package ?
- Big data: speed!
- Model diagnosis
- Complex correlation structures: spatial, temporal, phylogenetic,
…14–16
Landscape
- frequentist:
nlme
, lme4
,
glmmTMB
, MixedModels.jl
(Julia)
- Bayesian:
MCMCglmm
, rstanarm
,
brms
, bamlss
, INLA
- build-your-own:
greta
, RTMB
,
NIMBLE
, Stan
(rethinking
package)
- spatial:
glmmTMB
, sdmTMB
,
spaMM
resources
(code ASPROMP8)
references
3.
Crawley, M. J. Statistical Computing: An
Introduction to Data Analysis Using S-PLUS. (John
Wiley & Sons, 2002).
6.
Breslow, N. E. Whither PQL? in
Proceedings of the second Seattle symposium in
biostatistics: Analysis of correlated data (eds. Lin, D. Y. &
Heagerty, P. J.) 1–22 (Springer, 2004).
15.
Rue, H., Martino, S. & Chopin, N. Gaussian
models using integrated nested Laplace approximations (with
discussion). Journal of the Royal Statistical Society, Series B
71, 319–392 (2009).
16.
Rousset, F. & Ferdy, J.-B. Testing
environmental and genetic effects in the presence of spatial
autocorrelation.
Ecography no–no (2014) doi:
10.1111/ecog.00566.
17.
Bolker, B. M. Linear and generalized linear
mixed models. in Ecological statistics: Contemporary theory and
application (eds. Fox, G. A., Negrete-Yankelevich, S. & Sosa,
V. J.) (Oxford University Press, 2015).