Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.

Model specification

Distribution/family

going to assume Gaussian (Normal) here
may need to transform
we care about the conditional distribution, not the marginal
Box-Cox not implemented: could try MASS::boxcox() on residuals
brute-force: if it makes sense or if residuals look bad, try log/sqrt?
bias $\gg$ heteroscedasticity/outliers $\gg$ Normality

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

Look at your data!

use table() to confirm how many observations per level of factor
- which factors covary?
- balanced?
plot the data

Random effects: reminder (Bolker 2015)

don’t want to test differences across specific groups
lots of groups (>5), few observations per group, unbalanced
exchangeable groups

Formulas

random effects specified with | as (a|g1) + (b|g2) + ...
right-hand side is the grouping variable (always categorical, usually a factor)
left-hand side is the varying term (most often 1)s
terms separated by + are independent

equation	formula
$β_0 + β_{1}X_{i} + e_{si}$	n/a (Not a mixed-effects model)
$(β_0 + b_{S,0s}) + β_{1}X_i + e_{si}$	`∼ X + (1∣Subject)`
$(β_0 + b_{S,0s}) + (β_{1} + b_{S,1s}) X_i + e_{si}$	`~ X + (1 + X∣Subject)`
$(β_0 + b_{S,0s} + b_{I,0i}) + (β_{1} + b_{S,1s}) X_i + e_{si}$	`∼ X + (1 + X∣Subject) + (1∣Item)`
As above, but $S_{0s}$, $S_{1s}$ independent	`∼ X + (1∣Subject) + (0 + X∣ Subject) + (1∣Item)`
$(β_0 + b_{S,0s} + b_{I,0i}) + β_{1}X_i + e_{si}$	`∼ X + (1∣Subject) + (1∣Item)`
$(β_0 + b_{I,0i}) + (β_{1} + b_{S,1s})X_i + e_{si}$	`∼ X + (0 + X∣Subject) + (1∣Item)`

Modified from: http://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet?lq=1 (Livius). Subscripts $\{S,I\}$ refer to Subject vs Item effects. Lower-case $\{s,i\}$ indicate particular subjects/items. $\{0,1\}$ refer to intercept vs slope effects.

Nested vs crossed designs

Nested: sub-unit IDs only measured within a single larger unit. e.g.: Plot1 in Block1 independent of Plot1 in Block2

Crossed: sub-unit IDs can be measured in multiple larger units. e.g. year, site

Unique coding: removes ambiguity

Robert Long, Cross Validated

Formulas, interactions, nesting etc.

a*b = main effects plus interaction, a:b = interaction only, a/b = a + a:b

Nested: (1|f/g) equivalent to (1|f) + (1|f:g). e.g. subplots vary within plots (but “subplot 1 of every plot” isn’t meaningful)
Crossed: (1|f) + (1|g). e.g. years vary, and plots vary independently
Crossed+: (1|f) + (1|g) + (1|f:g). e.g. years vary, and plots vary independently, and plots also vary within years (for LMMs, assumes >1 observation per plot/year combination). ((1|f*g) should be allowed but …)

Don’t need explicit nesting if your sub-groups are uniquely labeled (i.e. A1, A2, …, B1, B2, …)

Factors varying across groups

These can be difficult, because they generate large variance-covariance matrices. E.g for a four-level factor, R parameterizes the model as $\beta_0$ (intercept), $\beta_1$ (level 2 - level 1), $\beta_2$ (level 3 - level 1), $\beta_3$ (level 4 - level 1). This gives us

\[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{cccc} \sigma^2_{\{b|1\}} & . & . & . \\ \sigma_{\{b|1\},\{b|a_{21}\}} & \sigma^2_{\{b|a_{21}\}} & . & . \\ \sigma_{\{b|1\}, \{b|a_{31}\}} & \sigma_{\{b|a_{21}\},\{b|a_{31}\}} & \sigma^2_{\{b|a_{31}\}} & . \\ \sigma_{\{b|1\} ,\{b|a_{41}\}} & \sigma_{\{b|a_{21}\},\{b|a_{41}\}} & \sigma_{\{b|a_{31}\},\{b|a_{41}\}} & \sigma^2_{\{b|a_{41}\}} \end{array} \right] \right) \]

What is the maximal model?

Which effects vary within which groups?
If effects don’t vary within groups, then we can’t estimate among-group variation in the effect
- convenient
- maybe less powerful (among-group variation is lumped into residual variation)
e.g. female rats exposed to different doses of radiation, multiple pups per mother, multiple measurements per pup (labeled by time). Maximal model … ?

Maximal model often won’t work

e.g.

Culcita (coral-reef) example: randomized-block design, so each treatment (none/crabs/shrimp/both) is repeated in every block; thus (treat|block) is maximal
CBPP data: each herd is measured in every period, so in principle we could use (period|herd), not just (1|herd)

Random-slopes models: what does `(x|g)` really do?

equivalent to (1+x|g)
both intercept (baseline) and slope vary across groups
estimates bivariate zero-centered distribution:

\[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{cc} \sigma^2_{\textrm{int}} & \sigma_{\textrm{int},\textrm{slope}} \\ \sigma_{\textrm{int},\textrm{slope}} & \sigma^2_{\textrm{slope}} \end{array} \right] \right) \]

maximal model can get very complicated (e.g. (1+x+y+z+...|g): $n$ effects require $n(n+1)/2$ variances + covariances

Other FAQs

Can you have continuous variables as REs?
A: yes, as varying terms, but not as grouping variables
Can a variable be in the model as both FE and RE?
A: only in the special case where $x$ is numeric but discrete (e.g. year) and >1 observation per $x$ value;
FE describes overall trend, RE describes variation around the trend

What is a practical model?

Fits aren’t singular
singular = zero variances, +/- 1 correlations
More subtle for larger models:
all(abs(getME(x,"theta"))>1e-4)

Why are fits singular?

Essentially, because the observed among-group variation is less than the expected among-group variation (= $\sigma^2_{\mbox{among}} + \sigma^2_{\mbox{within}}/n$). More generally, because some dimension of the variance-covariance matrix has zero extent …

Simplified versions of models

(1|b/a) ([positive] compound symmetry) vs. (a|b): \[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{ccccc} \sigma^2_{\{b|1\}} & . & . & . & . \\ \sigma_{\{b|1\},\{b|a_{21}\}} & \sigma^2_{\{b|a_{21}\}} & . & . & . \\ \sigma_{\{b|1\}, \{b|a_{31}\}} & \sigma_{\{b|a_{21}\},\{b|a_{31}\}} & \sigma^2_{\{b|a_{31}\}} & . & . \\ \sigma_{\{b|1\} ,\{b|a_{41}\}} & \sigma_{\{b|a_{21}\},\{b|a_{41}\}} & \sigma_{\{b|a_{31}\},\{b|a_{41}\}} & \sigma^2_{\{b|a_{41}\}} & . \\ \sigma_{\{b|1\},\{b|a_{51}\}} & \sigma_{\{b|a_{21}\},\{b|a_{51}\}} & \sigma_{\{b|a_{31}\},\{b|a_{51}\}} & \sigma_{\{b|a_{41}\},\{b|a_{51}\}} & \sigma^2_{\{b|a_{51}\}} \end{array} \right] \right) \] (=$(n(n+1))/2 = (4\times 5)/2 = 10$ parameters) vs. \[ \left[ \begin{array}{ccccc} \sigma^2 & . & . & . & . \\ \rho \sigma^2 & \sigma^2 & . & . & . \\ \rho \sigma^2 & \rho \sigma^2 & \sigma^2 & . & . \\ \rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \sigma^2 & . \\ \rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \sigma^2 \end{array} \right] \] where $\sigma^2 = \sigma^2_{\{b|1\}}+\sigma^2_{\{a:b|1\}}$, $\rho = \sigma^2_{\{b|1\}}/\sigma^2$ (=2 parameters; $\rho$ must be >0)
(1+x+y+z||b)
- independent terms
- expands to (1|b) + (0+x|b) + ...
- lme4 version only works properly for continuous predictors
- afex::mixed can do this
- independent model is no longer invariant to shifts/reparameterization
- $n$ instead of $n(n+1)/2$ parameters
RE “nested within” FE, e.g. if not enough groups at the top level; y ~ g1 + (1|g1:g2); more parameters but fixed rather than random

Convergence failures

convergence failures are common
what do they really mean? how to fix them? when can they be ignored?
approximate test that gradient=0 and curvature is correct
scale and center predictors; simplify model
use ?allFit to see whether different optimizers give sufficiently similar answers
- $fixef, etc.: are answers sufficiently similar?
- $llik: how similar is goodness-of-fit?

m1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
aa <- allFit(m1)

## bobyqa : [OK]
## Nelder_Mead : [OK]
## nlminbwrap : [OK]
## optimx.L-BFGS-B : [OK]
## nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
## nloptwrap.NLOPT_LN_BOBYQA : [OK]

ss <- summary(aa)
names(ss)

## [1] "which.OK" "msgs"     "fixef"    "llik"     "sdcor"    "theta"    "times"   
## [8] "feval"

ss$fixef

##                               (Intercept)     Days
## bobyqa                           251.4051 10.46729
## Nelder_Mead                      251.4051 10.46729
## nlminbwrap                       251.4051 10.46729
## optimx.L-BFGS-B                  251.4051 10.46729
## nloptwrap.NLOPT_LN_NELDERMEAD    251.4051 10.46729
## nloptwrap.NLOPT_LN_BOBYQA        251.4051 10.46729

ss$sdcor

##                               Subject.(Intercept) Subject.Days.(Intercept)
## bobyqa                                   24.74045                 5.922133
## Nelder_Mead                              24.74061                 5.922142
## nlminbwrap                               24.74047                 5.922132
## optimx.L-BFGS-B                          24.74051                 5.922188
## nloptwrap.NLOPT_LN_NELDERMEAD            24.74180                 5.922206
## nloptwrap.NLOPT_LN_BOBYQA                24.74066                 5.922138
##                               Subject.Days    sigma
## bobyqa                          0.06555133 25.59182
## Nelder_Mead                     0.06555186 25.59180
## nlminbwrap                      0.06555080 25.59182
## optimx.L-BFGS-B                 0.06555693 25.59179
## nloptwrap.NLOPT_LN_NELDERMEAD   0.06553081 25.59167
## nloptwrap.NLOPT_LN_BOBYQA       0.06555124 25.59180

ss$llik-min(ss$llik)

##                        bobyqa                   Nelder_Mead 
##                  3.059904e-08                  3.013236e-08 
##                    nlminbwrap               optimx.L-BFGS-B 
##                  3.059472e-08                  2.889999e-08 
## nloptwrap.NLOPT_LN_NELDERMEAD     nloptwrap.NLOPT_LN_BOBYQA 
##                  0.000000e+00                  2.986872e-08

allFit can be run in parallel

How to decide what model to use?

Most complex RE model that can reasonably be fitted
Lots of disagreement
- Barr et al. (2013): “keep it maximal”
- Bates et al. (2015), Matuschek et al. (2017): more initial parsimony
- ? use most complex non-singular model ?

PCA of RE variance-covariance matrix:

rePCA(m1)

## $Subject
## Standard deviations (1, .., p=2):
## [1] 0.9668680 0.2308798
## 
## Rotation (n x k) = (2 x 2):
##             [,1]        [,2]
## [1,] -0.99986158 -0.01663769
## [2,] -0.01663769  0.99986158
## 
## attr(,"class")
## [1] "prcomplist"

Restricted maximum likelihood (REML)

reduce bias in estimated variances and covariances
similar to dividing by $n-1$ instead of $n$ when computing sample variances
e.g. estimate of variance in paired data
- find difference of each pair, mean and variance of differences
works either by scaling out fixed effects (LMM) or by integrating over uncertainty of fixed parameters (GLMM/glmmTMB)
less important than people think
don’t compare REML-fitted models with different fixed effects
(lme4 will try to prevent this …)

Fit model

m1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)

Diagnostics: residual scale

Similar to generalized linear models: fitted vs. residual, scale-location, Q-Q

aa <- augment(m1)
## fitted vs resid
plot(m1,type=c("p","smooth"))
## scale-location
plot(m1,sqrt(abs(resid(.)))~fitted(.),type=c("p","smooth"))
## q-q
lattice::qqmath(m1,id=0.05)

Or with augment and ggplot …

aa <- augment(m1)
## fitted vs resid
ggplot(aa,aes(.fitted,.resid))+geom_point()+geom_smooth()
## scale-location
ggplot(aa,aes(.fitted,sqrt(abs(.resid))))+geom_point()+geom_smooth()
## q-q
ggplot(aa)+stat_qq(aes(sample=.resid))

The broom/broom.mixed packages

tidy: get coefficient values
- effects="fixed"
- effects="ran_pars"
- effects="ran_vals"
- effects="ran_coefs"
glance: get model summaries
augment: get fitted, residuals, possibly predictions

Inference

Likelihood

probability of data given model
for mixed models, includes the integral over the random effects

Profiles and intervals

Wald approximation

most important:

small sample sizes
values near boundaries
- changing scales may help

more important for variance parameters than fixed-effect parameters

Finite-size corrections

e.g. Normal vs. $t$

Posterior predictive simulation

if there is some value you’re interested in that you can compute from your data (e.g. number of zero observations, or total NPP), you can simulate() it fro the fitted model:

Inference: fixed effects, Wald

parameter estimates: summary()
termwise tests: car::Anova, afex::anova
contrasts/effects/post-hoc tests: emmeans, effects packages

car::Anova(m1)

Degrees of freedom

level-counting: R/calcDenDF.R
lmerTest/afex; Satterthwaite, Kenward-Roger

library(lmerTest)
summary(as_lmerModLmerTest(m1))

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  251.405      6.825  17.000  36.838  < 2e-16 ***
## Days          10.467      1.546  17.000   6.771 3.26e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

summary(as_lmerModLmerTest(m1),ddf="Kenward-Roger")

## Linear mixed model fit by REML. t-tests use Kenward-Roger's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  251.405      6.825  17.000  36.838  < 2e-16 ***
## Days          10.467      1.546  17.000   6.771 3.26e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

Q: Why not use lmerTest all the time?
A: it can make it a little harder to diagnose fitting problems

Inference: fixed effects, likelihood ratio test

individual parameters: profile(), confint()
fit pairwise models and use anova() to compare
drop1, afex::anova()

confint(m1)

## Computing profile confidence intervals ...

##                   2.5 %      97.5 %
## .sig01       14.3814732  37.7159917
## .sig02       -0.4815008   0.6849863
## .sig03        3.8011643   8.7533851
## .sigma       22.8982669  28.8579965
## (Intercept) 237.6806955 265.1295147
## Days          7.3586533  13.5759188

What took so long?

Inference: parametric bootstrap

Very slow!

pbkrtest
confint(.,method="boot")

Inference: random effects

Wald is probably very bad
profile, anova()
parametric bootstrap
boundary problems

Inference: CIs on predictions etc.

In general it’s not as easy as one might like to get confidence intervals on predictions. If we ignore the effects of uncertainty in the random effects, then it can be done using the formula $\sigma^2 = \textrm{Diag}\left(X V X^\top\right)$

mm <- model.matrix(terms(fitted_model),newdat)
se_pred <- sqrt(diag(mm %*% tcrossprod(vcov(fitted_model),mm)))

emmeans etc.

data("Oats",package="nlme")
m_oats <- lmer(yield~nitro*Variety+(1|Block),data=Oats)
(e1 <- emmeans(m_oats,~nitro|Variety))

## Variety = Golden Rain:
##  nitro emmean   SE   df lower.CL upper.CL
##    0.3  104.5 7.12 6.61     87.5      122
## 
## Variety = Marvellous:
##  nitro emmean   SE   df lower.CL upper.CL
##    0.3  109.8 7.12 6.61     92.7      127
## 
## Variety = Victory:
##  nitro emmean   SE   df lower.CL upper.CL
##    0.3   97.6 7.12 6.61     80.6      115
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95

plot(e1)

e2 <- emmeans(m_oats,~Variety)

## NOTE: Results may be misleading due to involvement in interactions

##
contrast(e2,"pairwise")

##  contrast                 estimate   SE df t.ratio p.value
##  Golden Rain - Marvellous    -5.29 4.47 61  -1.184  0.4671
##  Golden Rain - Victory        6.88 4.47 61   1.538  0.2804
##  Marvellous - Victory        12.17 4.47 61   2.722  0.0226
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

also see the effects package.

car::Anova(m_oats,test="F")

## Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
## 
## Response: yield
##                     F Df Df.res    Pr(>F)    
## nitro         81.5155  1     61 7.648e-13 ***
## Variety        3.7268  2     61   0.02972 *  
## nitro:Variety  0.3512  2     61   0.70524    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

References

Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.

Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv:1506.04967 [Stat], June. http://arxiv.org/abs/1506.04967.

Bolker, Benjamin M. 2015. “Linear and Generalized Linear Mixed Models.” In Ecological Statistics: Contemporary Theory and Application, edited by Gordon A. Fox, Simoneta Negrete-Yankelevich, and Vinicio J. Sosa. Oxford University Press.

Matuschek, Hannes, Reinhold Kliegl, Shravan Vasishth, Harald Baayen, and Douglas Bates. 2017. “Balancing Type I Error and Power in Linear Mixed Models.” Journal of Memory and Language 94: 305–15. https://doi.org/10.1016/j.jml.2017.01.001.

Linear mixed models in R: nitty-gritty

Ben Bolker

24 June 2023

Model specification

Distribution/family

Look at your data!

Random effects: reminder (Bolker 2015)

Formulas

Nested vs crossed designs

Formulas, interactions, nesting etc.

Factors varying across groups

What is the maximal model?

Random-slopes models: what does `(x|g)` really do?

Other FAQs

What is a practical model?

Why are fits singular?

Simplified versions of models

Convergence failures

How to decide what model to use?

Restricted maximum likelihood (REML)

Fit model

Diagnostics: residual scale

The broom/broom.mixed packages

Inference

Likelihood

Profiles and intervals

Wald approximation

Finite-size corrections

Posterior predictive simulation

Inference: fixed effects, Wald

Degrees of freedom

Inference: fixed effects, likelihood ratio test

Inference: parametric bootstrap

Inference: random effects

Inference: CIs on predictions etc.

emmeans etc.

References

equation	formula
\(β_0 + β_{1}X_{i} + e_{si}\)	n/a (Not a mixed-effects model)
\((β_0 + b_{S,0s}) + β_{1}X_i + e_{si}\)	`∼ X + (1∣Subject)`
\((β_0 + b_{S,0s}) + (β_{1} + b_{S,1s}) X_i + e_{si}\)	`~ X + (1 + X∣Subject)`
\((β_0 + b_{S,0s} + b_{I,0i}) + (β_{1} + b_{S,1s}) X_i + e_{si}\)	`∼ X + (1 + X∣Subject) + (1∣Item)`
As above, but \(S_{0s}\), \(S_{1s}\) independent	`∼ X + (1∣Subject) + (0 + X∣ Subject) + (1∣Item)`
\((β_0 + b_{S,0s} + b_{I,0i}) + β_{1}X_i + e_{si}\)	`∼ X + (1∣Subject) + (1∣Item)`
\((β_0 + b_{I,0i}) + (β_{1} + b_{S,1s})X_i + e_{si}\)	`∼ X + (0 + X∣Subject) + (1∣Item)`

Linear mixed models in R: nitty-gritty

Ben Bolker

24 June 2023

Model specification

Distribution/family

Look at your data!

Random effects: reminder (Bolker 2015)

Formulas

Nested vs crossed designs

Formulas, interactions, nesting etc.

Factors varying across groups

What is the maximal model?

Random-slopes models: what does (x|g) really do?

Other FAQs

What is a practical model?

Why are fits singular?

Simplified versions of models

Convergence failures

How to decide what model to use?

Restricted maximum likelihood (REML)

Fit model

Diagnostics: residual scale

The broom/broom.mixed packages

Inference

Likelihood

Profiles and intervals

Wald approximation

Finite-size corrections

Posterior predictive simulation

Inference: fixed effects, Wald

Degrees of freedom

Inference: fixed effects, likelihood ratio test

Inference: parametric bootstrap

Inference: random effects

Inference: CIs on predictions etc.

emmeans etc.

References

Random-slopes models: what does `(x|g)` really do?