(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

  • combinations of categorical and continuous predictors,
    and interactions
  • (some) non-Normal responses
    (e.g. binomial, Poisson, and extensions)
  • (some) nonlinearity
    (e.g. logistic, exponential, hyperbolic)
  • non-independent (grouped) data


Coral protection from seastars (Culcita) by symbionts (McKeon et al., 2012)

Environmental stress: Glycera cell survival (D. Julian unpubl.)

Arabidopsis response to fertilization & herbivory (Banta et al., 2010)

Coral demography (J.-S. White unpubl.)

Simple mixed models
(scalar, intercept-only)

\[ \begin{split} y_{ij} & = \beta_0 + \beta_1 x_{ij} + \epsilon_{0,ij} + \epsilon_{1,j} \\ & = (\beta_0 + \epsilon_{1,j}) + \beta_1 x_{ij} + \epsilon_{0,ij} \\ \epsilon_{0,ij} & \sim \textrm{Normal}(0,\sigma_0^2) \\ \epsilon_{1,j} & \sim \textrm{Normal}(0,\sigma_1^2) \end{split} \]

  • Could have multiple, nested levels of random effects
    (genotype within population within region …), or crossed REs
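The scalar model above is easy to simulate in base R; this is a minimal sketch (the `lmer` call in the comment assumes the lme4 package is installed):

```r
# Simulate from the random-intercept model:
# y_ij = (beta0 + eps1_j) + beta1 * x_ij + eps0_ij
set.seed(101)
n_groups <- 20; n_per <- 30
beta0 <- 1; beta1 <- 2
sigma0 <- 1                                 # within-group ("residual") SD
sigma1 <- 2                                 # among-group SD
group <- rep(seq_len(n_groups), each = n_per)
x <- runif(n_groups * n_per)
eps1 <- rnorm(n_groups, 0, sigma1)          # one deviation per group
y <- (beta0 + eps1[group]) + beta1 * x + rnorm(length(x), 0, sigma0)
dat <- data.frame(y, x, group = factor(group))
## with lme4 installed, the corresponding fit would be:
##   lme4::lmer(y ~ x + (1 | group), data = dat)
```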

Random-slopes model

\[ \begin{split} y_{ij} & = \beta_0 + \beta_1 x_{ij} + \epsilon_{0,ij} + \epsilon_{1,j} + \epsilon_{2,j} x_{ij} \\ & = (\beta_0 + \epsilon_{1,j}) + (\beta_1 + \epsilon_{2,j}) x_{ij} + \epsilon_{0,ij} \\ \epsilon_{0,ij} & \sim \textrm{Normal}(0,\sigma_0^2) \\ \{\epsilon_{1,j}, \epsilon_{2,j}\} & \sim \textrm{MVN}(0,\Sigma) \end{split} \]

  • variation in the effect of a treatment or covariate across groups
  • variation in intercept and slope allowed to be correlated
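The correlated pair \(\{\epsilon_{1,j}, \epsilon_{2,j}\}\) can be drawn with a Cholesky factor in base R; the group count here is inflated only so the empirical covariance visibly matches \(\Sigma\), and the `lmer` formula in the comment assumes lme4:

```r
# Draw correlated random intercepts and slopes: {eps1_j, eps2_j} ~ MVN(0, Sigma)
set.seed(101)
n_groups <- 5000                            # large only to check the covariance
Sigma <- matrix(c(1.0, 0.6,
                  0.6, 0.5), 2, 2)          # var(intercept), cov / cov, var(slope)
Z <- matrix(rnorm(2 * n_groups), n_groups, 2)
re <- Z %*% chol(Sigma)                     # rows are (eps1_j, eps2_j)
round(cov(re), 2)                           # close to Sigma
## with lme4, a random-slopes fit would look like:
##   lmer(y ~ x + (1 + x | group), data = dat)  # intercept & slope correlated
```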

General definition

\[ \begin{split} \underbrace{Y_i}_{\text{response}} & \sim \overbrace{\text{Distr}}^{\substack{\text{conditional} \\ \text{distribution}}}(\underbrace{g^{-1}(\eta_i)}_{\substack{\text{inverse} \\ \text{link} \\ \text{function}}},\underbrace{\phi}_{\substack{\text{scale} \\ \text{parameter}}}) \\ \underbrace{\boldsymbol \eta}_{\substack{\text{linear} \\ \text{predictor}}} & = \underbrace{\boldsymbol X \boldsymbol \beta}_{\substack{\text{fixed} \\ \text{effects}}} + \underbrace{\boldsymbol Z \boldsymbol b}_{\substack{\text{random} \\ \text{effects}}} \\ \underbrace{\boldsymbol b}_{\substack{\text{conditional} \\ \text{modes}}} & \sim \text{MVN}(\boldsymbol 0, \underbrace{\Sigma(\boldsymbol \theta)}_{\substack{\text{variance-} \\ \text{covariance} \\ \text{matrix}}}) \end{split} \]
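As one concrete instance of the general definition, here is a Poisson GLMM with log link (so \(g^{-1} = \exp\)) and a single random intercept, simulated in base R; the `glmer` call in the comment assumes lme4:

```r
# Y_i ~ Poisson(exp(eta_i));  eta = X beta + Z b;  b ~ Normal(0, sigma_b^2)
set.seed(101)
n_groups <- 10; n_per <- 20
group <- rep(seq_len(n_groups), each = n_per)
x <- runif(n_groups * n_per)
X <- cbind(1, x)                            # fixed-effects design matrix
beta <- c(0.5, 1)
b <- rnorm(n_groups, 0, 0.7)                # random intercepts; theta = sigma_b
eta <- drop(X %*% beta) + b[group]          # linear predictor
y <- rpois(length(eta), exp(eta))           # inverse link applied to eta
## with lme4:  glmer(y ~ x + (1 | group), family = poisson)
```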

What are random effects?

A method for …

  • accounting for among-individual, within-block correlation
  • compromising between
    complete pooling (no among-block variance)
     and fixed effects (large among-block variance)
  • handling levels selected at random from a larger population
  • sharing information among levels (shrinkage estimation)
  • estimating variability among levels
  • allowing predictions for unmeasured levels
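The "sharing information (shrinkage)" point can be made concrete: for a one-way layout with known variances, the predicted effect for group \(j\) is a precision-weighted compromise between the group mean and the grand mean. A toy base-R version (the function name and the numbers are made up for illustration):

```r
# Predicted intercept for group j: pull the group mean toward the grand mean,
# more strongly when the group has little data or among-group variance is small.
shrink <- function(ybar_j, n_j, mu, sigma2_within, sigma2_among) {
  w <- sigma2_among / (sigma2_among + sigma2_within / n_j)  # weight on the data
  mu + w * (ybar_j - mu)
}
shrink(ybar_j = 10, n_j = 2,  mu = 5, sigma2_within = 4, sigma2_among = 1)  # ~6.7
shrink(ybar_j = 10, n_j = 50, mu = 5, sigma2_within = 4, sigma2_among = 1)  # ~9.6
```

A poorly sampled group is pulled strongly toward the overall mean; a well-sampled group keeps most of its own estimate.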

Random-effect myths

  • levels of random effects must always be sampled at random
  • a complete sample cannot be treated as a random effect
  • random effects are always a nuisance variable
  • nothing can be said about the predictions of a random effect
  • you should always use a random effect no matter how few levels you have

Reasons for random effects (inferential/philosophical)

  • don't want to
    • test hypotheses about differences between responses at particular levels of the grouping variable;
  • do want to
    • quantify variation among groups
    • make predictions about unobserved groups
  • have levels that are randomly sampled from/representative of a larger population (exchangeable)

Reasons for random effects (practical)

  • want to combine information across groups
  • have variation in information per level (number of samples, or noisiness)
  • have a categorical predictor that is a nuisance variable (not of direct interest, but should be controlled for)
  • have more than 5–6 groups, or are regularizing/using priors (otherwise, use fixed effects)

See also Crawley (2002); Gelman (2005)

Avoiding MM

  • average over within-group observations: works for nested LMMs when within-group variation is not of interest (Murtaugh, 2007)
  • use fixed effects (or two-stage models) instead of random effects when
    • many samples per block (shrinkage unimportant)
    • few blocks (little advantage; bad variance estimates)

Estimation

Overview

Maximum likelihood estimation

  • Best fit is a compromise between two components
    (consistency of data with fixed effects and conditional modes; consistency of random effect with RE distribution)
    • Goodness-of-fit integrates over conditional modes

Shrinkage: Arabidopsis conditional modes

Shrinkage in a random-slopes model

Figure from Christophe Lalanne.

Methods

Estimation methods

  • deterministic
    • various approximate integrals (Breslow, 2004)
    • penalized quasi-likelihood, Laplace, Gauss-Hermite quadrature, … (Biswas, 2015);
      the most accurate methods are needed for large variances and small clusters
    • trade-off: flexibility and speed vs. accuracy
  • stochastic (Monte Carlo): frequentist and Bayesian
    • (Booth & Hobert, 1999; Ponciano et al., 2009; Sung & Geyer, 2007)
    • usually slower, but flexible and accurate

Estimation: Culcita (McKeon et al., 2012)

Inference

Wald tests

  • typical results of summary()
  • exact for ANOVA and regression;
    approximate for GLM(M)s
  • fast
  • approximation is sometimes awful (Hauck-Donner effect)
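The Hauck-Donner effect is easy to reproduce with base R's `glm`: under complete separation the Wald z statistic collapses toward zero even though the likelihood-ratio evidence is overwhelming (the data below are artificial):

```r
# Complete separation: group 0 all failures, group 1 all successes
x <- rep(0:1, each = 20)
y <- rep(0:1, each = 20)
fit <- suppressWarnings(glm(y ~ x, family = binomial))  # warns: fitted probs 0/1
wald_p <- coef(summary(fit))["x", "Pr(>|z|)"]
lrt_p  <- anova(fit, test = "Chisq")["x", "Pr(>Chi)"]
c(wald = wald_p, lrt = lrt_p)   # Wald p near 1, LRT p essentially 0
```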

Likelihood ratio tests

  • better than Wald, but still have two problems:
  • “denominator degrees of freedom”
    (when estimating scale)
    • finite-size issues for GLM(M)s: Bartlett corrections
    • Kenward-Roger correction? (Stroup, 2014)
  • Profile confidence intervals: expensive/fragile

finite-size p-values??

  • hope that the minimum number of groups is > 42
  • guess denominator DF from classic design (R code)
  • conservative: take minimum number of groups - 1
  • Satterthwaite/Kenward-Roger (lmerTest, LMMs only)
  • parametric bootstrap (pbkrtest)

Parametric bootstrapping

  • fit null model to data
  • simulate “data” from null model
  • fit null and working model, compute likelihood difference
  • repeat to estimate null distribution
  • should be OK, but not well tested
    (assumes estimated parameters are “sufficiently” good)
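The recipe above, illustrated with plain `lm` fits so it runs in base R (the same scheme is what pbkrtest automates for mixed models):

```r
# Parametric-bootstrap likelihood-ratio test of H0: no effect of x
set.seed(101)
dat <- data.frame(x = runif(50))
dat$y <- 1 + 0.2 * dat$x + rnorm(50)
fit0 <- lm(y ~ 1, data = dat)                # null model
fit1 <- lm(y ~ x, data = dat)                # working model
obs <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))  # observed LRT statistic
null_dist <- replicate(500, {
  dat$ysim <- unlist(simulate(fit0))         # "data" simulated under the null
  as.numeric(2 * (logLik(lm(ysim ~ x, data = dat)) -
                  logLik(lm(ysim ~ 1, data = dat))))
})
p_boot <- mean(null_dist >= obs)             # bootstrap p-value
```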

Bayesian inference

  • If we have a good sample from the posterior distribution (Markov chains have converged etc. etc.) we get most of the inferences we want for free by summarizing the marginal posteriors
  • post hoc Bayesian methods: use deterministic/frequentist methods to find the maximum, then sample around it

Challenges & open questions

On beyond lme4

  • glmmTMB: zero-inflated and other distributions
  • brms, rstanarm: interfaces to Stan
  • INLA: spatial and temporal correlations
  • rethinking package

Toolboxes

  • JAGS (R: rjags, r2jags)
  • TensorFlow (R: greta)
  • NIMBLE (R: nimble package)
  • Stan (R: rstan)
  • TMB (R: TMB)

On beyond R

  • Julia: MixedModels.jl package
  • SAS: PROC MIXED, NLMIXED
  • AS-REML
  • Stata (GLLAMM, xtmelogit)
  • HLM, MLWiN


Challenges

  • Small clusters: need AGQ/MCMC
  • Small numbers of clusters: need finite-size corrections (KR/PB/MCMC)
  • Small data sets: issues with singular fits
    (Barr et al., 2013) vs. (Bates et al., 2015)
  • Big data: speed!
  • Model diagnosis
  • Confidence intervals accounting for uncertainty in variances

The need for speed (Brooks et al., 2017)

more speed

Spatial and temporal correlations

  • Sometimes blocking takes care of non-independence …
  • but sometimes there is temporal or spatial correlation within blocks
  • … also phylogenetic … (Ives & Zhu, 2006)
  • "G-side" vs. "R-side" effects
  • tricky to implement for GLMMs, but new possibilities on the horizon (Rousset & Ferdy, 2014; Rue et al., 2009); https://github.com/stevencarlislewalker/lme4ord

Next steps

  • Complex random effects:
    regularization, model selection, penalized methods (lasso/fence)
  • Flexible correlation and variance structures
  • Flexible/nonparametric random effects distributions
  • hybrid & improved MCMC methods
  • Reliable assessment of out-of-sample performance

resources

references

Banta, J. A., Stevens, M. H. H., & Pigliucci, M. (2010). A comprehensive test of the ’limiting resources’ framework applied to plant tolerance to apical meristem damage. Oikos, 119(2), 359–369. http://doi.org/10.1111/j.1600-0706.2009.17726.x

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. http://doi.org/10.1016/j.jml.2012.11.001

Bates, D., Kliegl, R., Vasishth, S., & Baayen, H. (2015). Parsimonious Mixed Models. ArXiv:1506.04967 [Stat]. Retrieved from http://arxiv.org/abs/1506.04967

Biswas, K. (2015). Performances of different estimation methods for generalized linear mixed models (Master’s thesis). McMaster University. Retrieved from https://macsphere.mcmaster.ca/bitstream/11375/17272/2/M.Sc_Thesis_final_Keya_Biswas.pdf

Bolker, B. M. (2015). Linear and generalized linear mixed models. In G. A. Fox, S. Negrete-Yankelevich, & V. J. Sosa (Eds.), Ecological statistics: Contemporary theory and application. Oxford University Press.

Booth, J. G., & Hobert, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society. Series B, 61(1), 265–285. http://doi.org/10.1111/1467-9868.00176

Breslow, N. E. (2004). Whither PQL? In D. Y. Lin & P. J. Heagerty (Eds.), Proceedings of the second Seattle symposium in biostatistics: Analysis of correlated data (pp. 1–22). Springer.

Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., … Bolker, B. M. (2017). glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. R Journal, 9(2), 378–400. Retrieved from https://journal.r-project.org/archive/2017/RJ-2017-066/RJ-2017-066.pdf

Crawley, M. J. (2002). Statistical computing: An introduction to data analysis using S-PLUS. John Wiley & Sons.

Gelman, A. (2005). Analysis of variance: Why it is more important than ever. Annals of Statistics, 33(1), 1–53. http://doi.org/10.1214/009053604000001048

Ives, A. R., & Zhu, J. (2006). Statistics for correlated data: Phylogenies, space, and time. Ecological Applications, 16(1), 20–32. Retrieved from http://www.esajournals.org/doi/pdf/10.1890/04-0702

McKeon, C. S., Stier, A., McIlroy, S., & Bolker, B. (2012). Multiple defender effects: Synergistic coral defense by mutualist crustaceans. Oecologia, 169(4), 1095–1103. http://doi.org/10.1007/s00442-012-2275-2

Murtaugh, P. A. (2007). Simplicity and complexity in ecological data analysis. Ecology, 88(1), 56–62. Retrieved from http://www.esajournals.org/doi/abs/10.1890/0012-9658%282007%2988%5B56%3ASACIED%5D2.0.CO%3B2

Ponciano, J. M., Taper, M. L., Dennis, B., & Lele, S. R. (2009). Hierarchical models in ecology: Confidence intervals, hypothesis testing, and model selection using data cloning. Ecology, 90(2), 356–362. Retrieved from http://www.jstor.org/stable/27650990

Rousset, F., & Ferdy, J.-B. (2014). Testing environmental and genetic effects in the presence of spatial autocorrelation. Ecography. http://doi.org/10.1111/ecog.00566

Rue, H., Martino, S., & Chopin, N. (2009). Gaussian models using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society, Series B, 71(2), 319–392.

Stroup, W. W. (2014). Rethinking the analysis of non-normal data in plant and soil science. Agronomy Journal, 106, 1–17. http://doi.org/10.2134/agronj2013.0342

Sung, Y. J., & Geyer, C. J. (2007). Monte Carlo likelihood inference for missing data models. The Annals of Statistics, 35(3), 990–1011. http://doi.org/10.1214/009053606000001389