(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:



Coral protection from seastars (Culcita) by symbionts (McKeon et al., 2012)

Environmental stress: Glycera cell survival (D. Julian unpubl.)

Arabidopsis response to fertilization & herbivory (Banta et al., 2010)

Coral demography (J.-S. White unpubl.)

Technical definition

\[ \begin{split} \underbrace{Y_i}_{\text{response}} & \sim \overbrace{\text{Distr}}^{\substack{\text{conditional} \\ \text{distribution}}}(\underbrace{g^{-1}(\eta_i)}_{\substack{\text{inverse} \\ \text{link} \\ \text{function}}},\underbrace{\phi}_{\substack{\text{scale} \\ \text{parameter}}}) \\ \underbrace{\boldsymbol \eta}_{\substack{\text{linear} \\ \text{predictor}}} & = \underbrace{\boldsymbol X \boldsymbol \beta}_{\substack{\text{fixed} \\ \text{effects}}} + \underbrace{\boldsymbol Z \boldsymbol b}_{\substack{\text{random} \\ \text{effects}}} \\ \underbrace{\boldsymbol b}_{\substack{\text{conditional} \\ \text{modes}}} & \sim \text{MVN}(\boldsymbol 0, \underbrace{\Sigma(\boldsymbol \theta)}_{\substack{\text{variance-} \\ \text{covariance} \\ \text{matrix}}}) \end{split} \]
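
To make the three lines of the definition concrete, here is a minimal simulation sketch in Python. It assumes a Bernoulli conditional distribution with a logit link, one covariate, and independent random intercepts per group. The function name `simulate_glmm` and all parameter values are illustrative choices, not taken from the examples above.

```python
import math
import random


def simulate_glmm(n_groups=10, n_per_group=20, beta=(-0.5, 1.0), sd_b=1.0, seed=1):
    """Simulate a logistic GLMM with one covariate and a random intercept per group."""
    rng = random.Random(seed)
    # b ~ MVN(0, Sigma(theta)); here Sigma = sd_b^2 * I (assumed: independent intercepts)
    b = [rng.gauss(0.0, sd_b) for _ in range(n_groups)]
    rows = []
    for g in range(n_groups):
        for _ in range(n_per_group):
            x = rng.uniform(-1.0, 1.0)
            # eta = X beta + Z b; Z simply picks out this observation's group
            eta = beta[0] + beta[1] * x + b[g]
            # inverse link g^{-1}: the logistic function
            mu = 1.0 / (1.0 + math.exp(-eta))
            # Y ~ Bernoulli(mu); this family has no free scale parameter phi
            y = 1 if rng.random() < mu else 0
            rows.append((g, x, y))
    return b, rows


b, rows = simulate_glmm()
print(len(rows), "observations,", sum(y for _, _, y in rows), "successes")
```

Fitting a GLMM amounts to running this process in reverse: recovering `beta` and `sd_b` from `rows` without ever observing `b`.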

What are random effects?

A method for …

Random-effect myths

Reasons for random effects (inferential/philosophical)

Reasons for random effects (practical)

See also Crawley (2002); Gelman (2005)

Avoiding MM



Maximum likelihood estimation

  • Best fit is a compromise between two components
    (consistency of data with fixed effects and conditional modes; consistency of random effect with RE distribution)
    • Goodness-of-fit integrates over conditional modes
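
The compromise can be sketched for the simplest case: a Gaussian random-intercept model in which the overall mean and both standard deviations are treated as known (an assumption made only to keep the example short). The conditional mode of a group then minimizes a penalized sum of squares, and the solution is the group's raw deviation multiplied by a weight between 0 and 1. The function names below are hypothetical.

```python
def penalized_ss(b, ys, mu, sd_resid, sd_b):
    """Two components: misfit of the data given b, plus misfit of b to the RE distribution."""
    data_term = sum((y - mu - b) ** 2 for y in ys) / sd_resid ** 2
    re_term = b ** 2 / sd_b ** 2
    return data_term + re_term


def conditional_mode(ys, mu, sd_resid, sd_b):
    """Closed-form minimizer of penalized_ss: a shrunken version of (group mean - mu)."""
    n = len(ys)
    ybar = sum(ys) / n
    weight = sd_b ** 2 / (sd_b ** 2 + sd_resid ** 2 / n)  # approaches 1 as n grows
    return weight * (ybar - mu)


small_group = [3.0, 3.4]          # 2 observations
large_group = [3.0, 3.4] * 10     # 20 observations with the same mean
for ys in (small_group, large_group):
    raw = sum(ys) / len(ys) - 2.0
    print(len(ys), raw, conditional_mode(ys, mu=2.0, sd_resid=1.0, sd_b=0.5))
```

Both groups have the same raw deviation from the overall mean, but the small group is pulled much further toward zero because its data carry less information.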

Shrinkage: Arabidopsis conditional modes

Shrinkage in a random-slopes model

From Christophe Lalanne.


Estimation methods

  • deterministic
    • various approximate integrals (Breslow, 2004)
    • penalized quasi-likelihood, Laplace, Gauss-Hermite quadrature, … (Biswas, 2015);
      best methods needed for large variance, small clusters
    • flexibility and speed vs. accuracy
  • stochastic (Monte Carlo): frequentist and Bayesian
    • (Booth & Hobert, 1999; Ponciano et al., 2009; Sung & Geyer, 2007)
    • usually slower but flexible and accurate
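
As a rough illustration of the trade-off, the sketch below computes the marginal likelihood contribution of a single cluster in a logistic random-intercept model in three ways: brute-force numerical integration (used as the reference), the Laplace approximation, and simple Monte Carlo. The setup (one cluster, fixed intercept `beta`, random-effect standard deviation `sd_b = 3`) and all function names are assumptions chosen for illustration; real software uses more refined versions of each method.

```python
import math
import random


def cond_log_lik(b, y, n, beta):
    """log f(y | b): y successes in n Bernoulli trials, logit link (binomial coefficient omitted)."""
    eta = beta + b
    # numerically stable log(p) and log(1 - p)
    return -y * math.log1p(math.exp(-eta)) - (n - y) * math.log1p(math.exp(eta))


def log_joint(b, y, n, beta, sd_b):
    """log f(y | b) + log f(b), with b ~ Normal(0, sd_b^2)."""
    log_prior = -b * b / (2.0 * sd_b ** 2) - 0.5 * math.log(2.0 * math.pi * sd_b ** 2)
    return cond_log_lik(b, y, n, beta) + log_prior


def marginal_trapezoid(y, n, beta, sd_b, half_width=10.0, steps=20000):
    """Reference value: trapezoid rule on a fine grid over +/- half_width * sd_b."""
    lo = -half_width * sd_b
    h = 2.0 * half_width * sd_b / steps
    total = 0.0
    for i in range(steps + 1):
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.exp(log_joint(lo + i * h, y, n, beta, sd_b))
    return total * h


def marginal_laplace(y, n, beta, sd_b):
    """Laplace approximation: Gaussian integral matched at the conditional mode."""
    b = 0.0
    for _ in range(50):  # Newton iterations to find the conditional mode
        p = 1.0 / (1.0 + math.exp(-(beta + b)))
        grad = y - n * p - b / sd_b ** 2
        hess = -n * p * (1.0 - p) - 1.0 / sd_b ** 2
        b -= grad / hess
    p = 1.0 / (1.0 + math.exp(-(beta + b)))
    hess = -n * p * (1.0 - p) - 1.0 / sd_b ** 2
    return math.exp(log_joint(b, y, n, beta, sd_b)) * math.sqrt(2.0 * math.pi / -hess)


def marginal_monte_carlo(y, n, beta, sd_b, draws=100000, seed=1):
    """Simple Monte Carlo: average f(y | b) over draws of b from its distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += math.exp(cond_log_lik(rng.gauss(0.0, sd_b), y, n, beta))
    return total / draws


for y, n in ((1, 1), (30, 50)):  # a very small cluster and a moderately large one
    ref = marginal_trapezoid(y, n, 0.0, 3.0)
    lap = marginal_laplace(y, n, 0.0, 3.0)
    print("n =", n, "Laplace / reference =", lap / ref)
print("Monte Carlo, n = 1:", marginal_monte_carlo(1, 1, 0.0, 3.0))
```

With `beta = 0` and a single success in a single trial, the exact answer is 0.5 by symmetry, which gives a check on the reference integral. The Laplace approximation is noticeably further from the reference for the one-observation cluster than for the cluster of 50, in line with the point that better methods are needed for small clusters and large variances.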

Estimation: Culcita (McKeon et al., 2012)


Wald tests

  • typical results of summary
  • exact for ANOVA, regression;
    approximate for GLM(M)s
  • fast
  • approximation is sometimes awful (Hauck-Donner effect)
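
The Hauck-Donner effect can be seen in the simplest possible setting. The sketch below uses a single binomial proportion (an ordinary GLM with no random effects, chosen here only for brevity) and tests the null hypothesis p = 0.5 on the logit scale. The function names are hypothetical.

```python
import math


def wald_z(y, n, p0=0.5):
    """Wald z on the logit scale, with the standard error evaluated at the MLE (needs 0 < y < n)."""
    p_hat = y / n
    estimate = math.log(p_hat / (1.0 - p_hat)) - math.log(p0 / (1.0 - p0))
    se = 1.0 / math.sqrt(n * p_hat * (1.0 - p_hat))
    return estimate / se


def lrt_stat(y, n, p0=0.5):
    """Likelihood ratio statistic: twice the log-likelihood difference between MLE and null."""
    p_hat = y / n
    return 2.0 * (y * math.log(p_hat / p0) + (n - y) * math.log((1.0 - p_hat) / (1.0 - p0)))


for y in (70, 90, 99):
    print(y, "successes of 100:", "Wald z =", wald_z(y, 100), "LRT =", lrt_stat(y, 100))
```

As the data move further from the null (90 to 99 successes out of 100), the likelihood ratio statistic keeps increasing, but the Wald z decreases because the standard error grows faster than the estimate.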

Likelihood ratio tests

  • better than Wald, but still have two problems:
    • “denominator degrees of freedom” (when estimating scale)
    • for GLMMs, distributions are approximate anyway (Bartlett corrections)
    • Kenward-Roger correction? (Stroup, 2014)
  • Profile confidence intervals: expensive/fragile
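
A minimal sketch of a profile confidence interval, assuming a plain Gaussian sample with unknown mean and variance rather than a mixed model. For each candidate mean the nuisance parameter (the variance) is re-estimated, and the interval collects the means whose deviance stays below the chi-squared cutoff. Here the re-estimation is a one-line formula; in a (G)LMM each evaluation is a full model refit, which is where the expense and fragility come from. Function names and data are illustrative.

```python
import math


def profile_deviance(mu, ys):
    """2 * (maximized log-lik - profile log-lik at mu); variance is re-estimated at each mu."""
    n = len(ys)
    ybar = sum(ys) / n
    var_hat = sum((y - ybar) ** 2 for y in ys) / n  # variance MLE at the overall optimum
    var_mu = sum((y - mu) ** 2 for y in ys) / n     # variance MLE with the mean held at mu
    return n * math.log(var_mu / var_hat)


def profile_ci(ys, cutoff=3.841458820694124, tol=1e-8):
    """Find where the profile deviance crosses the cutoff (0.95 quantile of chi-squared, 1 df)."""
    ybar = sum(ys) / len(ys)

    def solve(direction):
        inner, step = ybar, 1.0
        outer = ybar + direction * step
        while profile_deviance(outer, ys) < cutoff:  # expand until the cutoff is bracketed
            step *= 2.0
            outer = ybar + direction * step
        while abs(outer - inner) > tol:              # bisection on the bracket
            mid = 0.5 * (inner + outer)
            if profile_deviance(mid, ys) < cutoff:
                inner = mid
            else:
                outer = mid
        return 0.5 * (inner + outer)

    return solve(-1.0), solve(1.0)


ys = [4.1, 5.0, 4.7, 5.3, 4.4, 6.2, 5.9]
print(profile_ci(ys))
```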

p-value choices?

  • guess from classic design (R code)
  • conservative: take minimum number of groups - 1
  • Satterthwaite/Kenward-Roger (lmerTest, LMMs only)
  • parametric bootstrap (pbkrtest)

Parametric bootstrapping

  • fit null model to data
  • simulate “data” from null model
  • fit null and working model, compute likelihood difference
  • repeat to estimate null distribution
  • should be OK, but not well tested
    (assumes estimated parameters are “sufficiently” good)
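
The steps above can be sketched in a few lines. To stay self-contained, this example swaps in two fixed-effects Gaussian models with closed-form fits (a single mean as the null, separate group means as the working model) in place of mixed models; the bootstrap logic is the same, only the fitting step would change. All names and data are made up for illustration.

```python
import math
import random


def fit_null(ys):
    """Null model: one common mean. Returns the MLEs of the mean and variance."""
    n = len(ys)
    mu = sum(ys) / n
    return mu, sum((y - mu) ** 2 for y in ys) / n


def lrt_stat(ys, groups):
    """Fit null and working (group-means) models; return twice the log-likelihood difference."""
    n = len(ys)
    _, var_null = fit_null(ys)
    rss_work = 0.0
    for g in set(groups):
        yg = [y for y, gi in zip(ys, groups) if gi == g]
        mean_g = sum(yg) / len(yg)
        rss_work += sum((y - mean_g) ** 2 for y in yg)
    return n * math.log(var_null / (rss_work / n))


def parametric_bootstrap_p(ys, groups, n_sim=999, seed=1):
    rng = random.Random(seed)
    mu0, var0 = fit_null(ys)                 # fit null model to data
    observed = lrt_stat(ys, groups)
    exceed = 0
    for _ in range(n_sim):                   # repeat to estimate the null distribution
        sim = [rng.gauss(mu0, math.sqrt(var0)) for _ in ys]  # simulate "data" from null model
        if lrt_stat(sim, groups) >= observed:                # refit both models, compare
            exceed += 1
    return (exceed + 1) / (n_sim + 1)        # add-one correction avoids a p-value of exactly 0


ys = [4.1, 5.0, 4.7, 5.3, 4.4, 6.2, 5.9, 6.8, 6.1, 6.6]
groups = ["a"] * 5 + ["b"] * 5
print(parametric_bootstrap_p(ys, groups))
```

Note that the simulation uses the estimated null parameters `mu0` and `var0` as if they were the truth, which is the assumption flagged in the last bullet.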

Bayesian inference

  • If we have a good sample from the posterior distribution (Markov chains have converged, etc.) we get most of the inferences we want for free by summarizing the marginal posteriors
  • post hoc Bayesian methods: use deterministic/frequentist methods to find the maximum, then sample around it
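
A small sketch of what "summarizing the marginal posteriors" means in practice. It assumes a deliberately tiny model (7 successes in 20 trials, a Normal(0, 2.5²) prior on the logit-scale intercept) and a basic random-walk Metropolis sampler; the model, prior, tuning values, and function names are all illustrative assumptions, and no convergence checking is done here.

```python
import math
import random


def log_posterior(eta, y, n, prior_sd):
    """Unnormalized log posterior: binomial log-likelihood (logit link) plus normal log-prior."""
    log_lik = -y * math.log1p(math.exp(-eta)) - (n - y) * math.log1p(math.exp(eta))
    return log_lik - eta ** 2 / (2.0 * prior_sd ** 2)


def metropolis(y, n, prior_sd=2.5, n_iter=20000, burn_in=2000, step=0.5, seed=1):
    """Random-walk Metropolis; returns post-burn-in draws of eta."""
    rng = random.Random(seed)
    eta = 0.0
    current = log_posterior(eta, y, n, prior_sd)
    draws = []
    for i in range(n_iter):
        proposal = eta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, y, n, prior_sd)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            eta, current = proposal, candidate
        if i >= burn_in:
            draws.append(eta)
    return draws


def quantile(sorted_xs, q):
    """Simple empirical quantile from a sorted list."""
    return sorted_xs[min(len(sorted_xs) - 1, int(q * len(sorted_xs)))]


# Transform each draw to the probability scale, then summarize the marginal posterior
p_draws = sorted(1.0 / (1.0 + math.exp(-eta)) for eta in metropolis(y=7, n=20))
lo, med, hi = (quantile(p_draws, q) for q in (0.025, 0.5, 0.975))
print("posterior median:", med, "95% interval:", (lo, hi))
```

Once the draws exist, intervals for any derived quantity (here the probability rather than the logit) come from transforming the draws and taking quantiles, with no further approximation needed.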

Challenges & open questions

On beyond lme4

  • glmmTMB: zero-inflated and other distributions
  • brms, rstanarm: interfaces to Stan
  • INLA: spatial and temporal correlations
  • rethinking package


  • JAGS (R: rjags, r2jags)
  • TensorFlow (R: greta)
  • Stan (R: rstan)
  • TMB (R: TMB)

On beyond R

  • Julia: MixedModels package
  • Stata (GLLAMM, xtmelogit)
  • HLM, MLwiN

Challenges

  • Small clusters: need AGQ/MCMC
  • Small numbers of clusters: need finite-size corrections (KR/PB/MCMC)
  • Small data sets: issues with singular fits
    (Barr et al., 2013) vs. (Bates et al., 2015)
  • Big data: speed!
  • Model diagnosis
  • Confidence intervals accounting for uncertainty in variances

Spatial and temporal correlations

  • Sometimes blocking takes care of non-independence …
  • but sometimes there is temporal or spatial correlation within blocks
  • … also phylogenetic … (Ives & Zhu, 2006)
  • “G-side” vs. “R-side” effects
  • tricky to implement for GLMMs, but new possibilities on the horizon (Rousset & Ferdy, 2014; Rue et al., 2009); https://github.com/stevencarlislewalker/lme4ord
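
To illustrate the first two bullets for a linear mixed model: a random intercept for each block implies that every pair of observations in the block is equally correlated (compound symmetry), whereas a temporal structure such as AR(1) lets correlation decay with the time lag. The sketch below builds both within-block covariance matrices; the function names and parameter values are assumptions made for illustration.

```python
def compound_symmetry_cov(n_times, sd_block, sd_resid):
    """Within-block covariance implied by a random block intercept plus independent residuals."""
    return [[sd_block ** 2 + (sd_resid ** 2 if i == j else 0.0)
             for j in range(n_times)] for i in range(n_times)]


def ar1_cov(n_times, sd, rho):
    """AR(1) covariance: correlation rho^|i - j| decays with the lag between time points."""
    return [[sd ** 2 * rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


cs = compound_symmetry_cov(4, sd_block=1.0, sd_resid=1.0)
ar = ar1_cov(4, sd=1.0, rho=0.6)
print("blocking only: lag-1 cov =", cs[0][1], " lag-3 cov =", cs[0][3])
print("AR(1):         lag-1 cov =", ar[0][1], " lag-3 cov =", ar[0][3])
```

If the data look like the second pattern, blocking alone leaves nearby observations more similar than the model expects.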

Next steps

  • Complex random effects:
    regularization, model selection, penalized methods (lasso/fence)
  • Flexible correlation and variance structures
  • Flexible/nonparametric random effects distributions
  • hybrid & improved MCMC methods
  • Reliable assessment of out-of-sample performance



