cc Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.

Model specification

Distribution/family

  • going to assume Gaussian (Normal) here
  • may need to transform
  • we care about the conditional distribution, not the marginal
  • Box-Cox not implemented: could try MASS::boxcox() on residuals
  • brute-force: if it makes sense or if residuals look bad, try log/sqrt?
  • bias \(\gg\) heteroscedasticity/outliers \(\gg\) Normality

## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

Look at your data!

  • use table() to confirm how many observations per level of factor
    • which factors covary?
    • balanced?
  • plot the data

Random effects: reminder (Bolker 2015)

  • don’t want to test differences across specific groups
  • lots of groups (>5), few observations per group, unbalanced
  • exchangeable groups

Formulas

  • random effects specified with | as (a|g1) + (b|g2) + ...
  • right-hand side is the grouping variable (always categorical, usually a factor)
  • left-hand side is the varying term (most often 1)s
  • terms separated by + are independent
equation formula
\(β_0 + β_{1}X_{i} + e_{si}\) n/a (Not a mixed-effects model)
\((β_0 + b_{S,0s}) + β_{1}X_i + e_{si}\) ∼ X + (1∣Subject)
\((β_0 + b_{S,0s}) + (β_{1} + b_{S,1s}) X_i + e_{si}\) ~ X + (1 + X∣Subject)
\((β_0 + b_{S,0s} + b_{I,0i}) + (β_{1} + b_{S,1s}) X_i + e_{si}\) ∼ X + (1 + X∣Subject) + (1∣Item)
As above, but \(S_{0s}\), \(S_{1s}\) independent ∼ X + (1∣Subject) + (0 + X∣ Subject) + (1∣Item)
\((β_0 + b_{S,0s} + b_{I,0i}) + β_{1}X_i + e_{si}\) ∼ X + (1∣Subject) + (1∣Item)
\((β_0 + b_{I,0i}) + (β_{1} + b_{S,1s})X_i + e_{si}\) ∼ X + (0 + X∣Subject) + (1∣Item)

Modified from: http://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet?lq=1 (Livius). Subscripts \(\{S,I\}\) refer to Subject vs Item effects. Lower-case \(\{s,i\}\) indicate particular subjects/items. \(\{0,1\}\) refer to intercept vs slope effects.

Nested vs crossed designs

Nested: sub-unit IDs only measured within a single larger unit. e.g.: Plot1 in Block1 independent of Plot1 in Block2

Crossed: sub-unit IDs can be measured in multiple larger units. e.g. year, site

Unique coding: removes ambiguity

Robert Long, Cross Validated

Formulas, interactions, nesting etc.

a*b = main effects plus interaction, a:b = interaction only, a/b = a + a:b

  • Nested: (1|f/g) equivalent to (1|f) + (1|f:g). e.g. subplots vary within plots (but “subplot 1 of every plot” isn’t meaningful)
  • Crossed: (1|f) + (1|g). e.g. years vary, and plots vary independently
  • Crossed+: (1|f) + (1|g) + (1|f:g). e.g. years vary, and plots vary independently, and plots also vary within years (for LMMs, assumes >1 observation per plot/year combination). ((1|f*g) should be allowed but …)

Don’t need explicit nesting if your sub-groups are uniquely labeled (i.e. A1, A2, …, B1, B2, …)

Factors varying across groups

These can be difficult, because they generate large variance-covariance matrices. E.g for a four-level factor, R parameterizes the model as \(\beta_0\) (intercept), \(\beta_1\) (level 2 - level 1), \(\beta_2\) (level 3 - level 1), \(\beta_3\) (level 4 - level 1). This gives us

\[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{cccc} \sigma^2_{\{b|1\}} & . & . & . \\ \sigma_{\{b|1\},\{b|a_{21}\}} & \sigma^2_{\{b|a_{21}\}} & . & . \\ \sigma_{\{b|1\}, \{b|a_{31}\}} & \sigma_{\{b|a_{21}\},\{b|a_{31}\}} & \sigma^2_{\{b|a_{31}\}} & . \\ \sigma_{\{b|1\} ,\{b|a_{41}\}} & \sigma_{\{b|a_{21}\},\{b|a_{41}\}} & \sigma_{\{b|a_{31}\},\{b|a_{41}\}} & \sigma^2_{\{b|a_{41}\}} \end{array} \right] \right) \]

What is the maximal model?

  • Which effects vary within which groups?
  • If effects don’t vary within groups, then we can’t estimate among-group variation in the effect
    • convenient
    • maybe less powerful (among-group variation is lumped into residual variation)
  • e.g. female rats exposed to different doses of radiation, multiple pups per mother, multiple measurements per pup (labeled by time). Maximal model … ?

Maximal model often won’t work

e.g.

  • Culcita (coral-reef) example: randomized-block design, so each treatment (none/crabs/shrimp/both) is repeated in every block; thus (treat|block) is maximal
  • CBPP data: each herd is measured in every period, so in principle we could use (period|herd), not just (1|herd)

Random-slopes models: what does (x|g) really do?

  • equivalent to (1+x|g)
  • both intercept (baseline) and slope vary across groups
  • estimates bivariate zero-centered distribution:

\[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{cc} \sigma^2_{\textrm{int}} & \sigma_{\textrm{int},\textrm{slope}} \\ \sigma_{\textrm{int},\textrm{slope}} & \sigma^2_{\textrm{slope}} \end{array} \right] \right) \]


  • maximal model can get very complicated (e.g. (1+x+y+z+...|g): \(n\) effects require \(n(n+1)/2\) variances + covariances

Other FAQs

  • Can you have continuous variables as REs?
    A: yes, as varying terms, but not as grouping variables
  • Can a variable be in the model as both FE and RE?
    A: only in the special case where \(x\) is numeric but discrete (e.g. year) and >1 observation per \(x\) value;
    FE describes overall trend, RE describes variation around the trend

What is a practical model?

  • Fits aren’t singular
  • singular = zero variances, +/- 1 correlations
  • More subtle for larger models:
    all(abs(getME(x,"theta"))>1e-4)

Why are fits singular?

Essentially, because the observed among-group variation is less than the expected among-group variation (= \(\sigma^2_{\mbox{among}} + \sigma^2_{\mbox{within}}/n\)). More generally, because some dimension of the variance-covariance matrix has zero extent …

Simplified versions of models

  • (1|b/a) ([positive] compound symmetry) vs. (a|b): \[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{ccccc} \sigma^2_{\{b|1\}} & . & . & . & . \\ \sigma_{\{b|1\},\{b|a_{21}\}} & \sigma^2_{\{b|a_{21}\}} & . & . & . \\ \sigma_{\{b|1\}, \{b|a_{31}\}} & \sigma_{\{b|a_{21}\},\{b|a_{31}\}} & \sigma^2_{\{b|a_{31}\}} & . & . \\ \sigma_{\{b|1\} ,\{b|a_{41}\}} & \sigma_{\{b|a_{21}\},\{b|a_{41}\}} & \sigma_{\{b|a_{31}\},\{b|a_{41}\}} & \sigma^2_{\{b|a_{41}\}} & . \\ \sigma_{\{b|1\},\{b|a_{51}\}} & \sigma_{\{b|a_{21}\},\{b|a_{51}\}} & \sigma_{\{b|a_{31}\},\{b|a_{51}\}} & \sigma_{\{b|a_{41}\},\{b|a_{51}\}} & \sigma^2_{\{b|a_{51}\}} \end{array} \right] \right) \] (=\((n(n+1))/2 = (4\times 5)/2 = 10\) parameters) vs. \[ \left[ \begin{array}{ccccc} \sigma^2 & . & . & . & . \\ \rho \sigma^2 & \sigma^2 & . & . & . \\ \rho \sigma^2 & \rho \sigma^2 & \sigma^2 & . & . \\ \rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \sigma^2 & . \\ \rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \sigma^2 \end{array} \right] \] where \(\sigma^2 = \sigma^2_{\{b|1\}}+\sigma^2_{\{a:b|1\}}\), \(\rho = \sigma^2_{\{b|1\}}/\sigma^2\) (=2 parameters; \(\rho\) must be >0)

  • (1+x+y+z||b)

    • independent terms
    • expands to (1|b) + (0+x|b) + ...
    • lme4 version only works properly for continuous predictors
    • afex::mixed can do this
    • independent model is no longer invariant to shifts/reparameterization
    • \(n\) instead of \(n(n+1)/2\) parameters
  • RE “nested within” FE, e.g. if not enough groups at the top level; y ~ g1 + (1|g1:g2); more parameters but fixed rather than random

Convergence failures

  • convergence failures are common
  • what do they really mean? how to fix them? when can they be ignored?
  • approximate test that gradient=0 and curvature is correct
  • scale and center predictors; simplify model
  • use ?allFit to see whether different optimizers give sufficiently similar answers
    • $fixef, etc.: are answers sufficiently similar?
    • $llik: how similar is goodness-of-fit?
m1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
aa <- allFit(m1)
## bobyqa : [OK]
## Nelder_Mead : [OK]
## nlminbwrap : [OK]
## optimx.L-BFGS-B : [OK]
## nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
## nloptwrap.NLOPT_LN_BOBYQA : [OK]
ss <- summary(aa)
names(ss)
## [1] "which.OK" "msgs"     "fixef"    "llik"     "sdcor"    "theta"    "times"   
## [8] "feval"
ss$fixef
##                               (Intercept)     Days
## bobyqa                           251.4051 10.46729
## Nelder_Mead                      251.4051 10.46729
## nlminbwrap                       251.4051 10.46729
## optimx.L-BFGS-B                  251.4051 10.46729
## nloptwrap.NLOPT_LN_NELDERMEAD    251.4051 10.46729
## nloptwrap.NLOPT_LN_BOBYQA        251.4051 10.46729
ss$sdcor
##                               Subject.(Intercept) Subject.Days.(Intercept)
## bobyqa                                   24.74045                 5.922133
## Nelder_Mead                              24.74061                 5.922142
## nlminbwrap                               24.74047                 5.922132
## optimx.L-BFGS-B                          24.74051                 5.922188
## nloptwrap.NLOPT_LN_NELDERMEAD            24.74180                 5.922206
## nloptwrap.NLOPT_LN_BOBYQA                24.74066                 5.922138
##                               Subject.Days    sigma
## bobyqa                          0.06555133 25.59182
## Nelder_Mead                     0.06555186 25.59180
## nlminbwrap                      0.06555080 25.59182
## optimx.L-BFGS-B                 0.06555693 25.59179
## nloptwrap.NLOPT_LN_NELDERMEAD   0.06553081 25.59167
## nloptwrap.NLOPT_LN_BOBYQA       0.06555124 25.59180
ss$llik-min(ss$llik)
##                        bobyqa                   Nelder_Mead 
##                  3.059904e-08                  3.013236e-08 
##                    nlminbwrap               optimx.L-BFGS-B 
##                  3.059472e-08                  2.889999e-08 
## nloptwrap.NLOPT_LN_NELDERMEAD     nloptwrap.NLOPT_LN_BOBYQA 
##                  0.000000e+00                  2.986872e-08
  • allFit can be run in parallel

How to decide what model to use?

  • Most complex RE model that can reasonably be fitted
  • Lots of disagreement
    • Barr et al. (2013): “keep it maximal”
    • Bates et al. (2015), Matuschek et al. (2017): more initial parsimony
    • ? use most complex non-singular model ?

PCA of RE variance-covariance matrix:

rePCA(m1)
## $Subject
## Standard deviations (1, .., p=2):
## [1] 0.9668680 0.2308798
## 
## Rotation (n x k) = (2 x 2):
##             [,1]        [,2]
## [1,] -0.99986158 -0.01663769
## [2,] -0.01663769  0.99986158
## 
## attr(,"class")
## [1] "prcomplist"

Restricted maximum likelihood (REML)

  • reduce bias in estimated variances and covariances
  • similar to dividing by \(n-1\) instead of \(n\) when computing sample variances
  • e.g. estimate of variance in paired data
    • find difference of each pair, mean and variance of differences
  • works either by scaling out fixed effects (LMM) or by integrating over uncertainty of fixed parameters (GLMM/glmmTMB)
  • less important than people think
  • don’t compare REML-fitted models with different fixed effects
    (lme4 will try to prevent this …)

Fit model

m1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)

Diagnostics: residual scale

Similar to generalized linear models: fitted vs. residual, scale-location, Q-Q

aa <- augment(m1)
## fitted vs resid
plot(m1,type=c("p","smooth"))
## scale-location
plot(m1,sqrt(abs(resid(.)))~fitted(.),type=c("p","smooth"))
## q-q
lattice::qqmath(m1,id=0.05)

Or with augment and ggplot …

aa <- augment(m1)
## fitted vs resid
ggplot(aa,aes(.fitted,.resid))+geom_point()+geom_smooth()
## scale-location
ggplot(aa,aes(.fitted,sqrt(abs(.resid))))+geom_point()+geom_smooth()
## q-q
ggplot(aa)+stat_qq(aes(sample=.resid))

The broom/broom.mixed packages

Inference

Likelihood

  • probability of data given model
  • for mixed models, includes the integral over the random effects

Profiles and intervals

Wald approximation

most important:

  • small sample sizes
  • values near boundaries
    • changing scales may help

more important for variance parameters than fixed-effect parameters

Finite-size corrections

e.g. Normal vs. \(t\)

Posterior predictive simulation

  • if there is some value you’re interested in that you can compute from your data (e.g. number of zero observations, or total NPP), you can simulate() it fro the fitted model:

Inference: fixed effects, Wald

  • parameter estimates: summary()
  • termwise tests: car::Anova, afex::anova
  • contrasts/effects/post-hoc tests: emmeans, effects packages
car::Anova(m1)

Degrees of freedom

  • level-counting: R/calcDenDF.R
  • lmerTest/afex; Satterthwaite, Kenward-Roger
library(lmerTest)
summary(as_lmerModLmerTest(m1))
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  251.405      6.825  17.000  36.838  < 2e-16 ***
## Days          10.467      1.546  17.000   6.771 3.26e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138
summary(as_lmerModLmerTest(m1),ddf="Kenward-Roger")
## Linear mixed model fit by REML. t-tests use Kenward-Roger's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  251.405      6.825  17.000  36.838  < 2e-16 ***
## Days          10.467      1.546  17.000   6.771 3.26e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

Q: Why not use lmerTest all the time?
A: it can make it a little harder to diagnose fitting problems

Inference: fixed effects, likelihood ratio test

  • individual parameters: profile(), confint()
  • fit pairwise models and use anova() to compare
  • drop1, afex::anova()
confint(m1)
## Computing profile confidence intervals ...
##                   2.5 %      97.5 %
## .sig01       14.3814732  37.7159917
## .sig02       -0.4815008   0.6849863
## .sig03        3.8011643   8.7533851
## .sigma       22.8982669  28.8579965
## (Intercept) 237.6806955 265.1295147
## Days          7.3586533  13.5759188

What took so long?

Inference: parametric bootstrap

Very slow!

  • pbkrtest
  • confint(.,method="boot")

Inference: random effects

  • Wald is probably very bad
  • profile, anova()
  • parametric bootstrap
  • boundary problems

Inference: CIs on predictions etc.

In general it’s not as easy as one might like to get confidence intervals on predictions. If we ignore the effects of uncertainty in the random effects, then it can be done using the formula \(\sigma^2 = \textrm{Diag}\left(X V X^\top\right)\)

mm <- model.matrix(terms(fitted_model),newdat)
se_pred <- sqrt(diag(mm %*% tcrossprod(vcov(fitted_model),mm)))

emmeans etc.

data("Oats",package="nlme")
m_oats <- lmer(yield~nitro*Variety+(1|Block),data=Oats)
(e1 <- emmeans(m_oats,~nitro|Variety))
## Variety = Golden Rain:
##  nitro emmean   SE   df lower.CL upper.CL
##    0.3  104.5 7.12 6.61     87.5      122
## 
## Variety = Marvellous:
##  nitro emmean   SE   df lower.CL upper.CL
##    0.3  109.8 7.12 6.61     92.7      127
## 
## Variety = Victory:
##  nitro emmean   SE   df lower.CL upper.CL
##    0.3   97.6 7.12 6.61     80.6      115
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95
plot(e1)

e2 <- emmeans(m_oats,~Variety)
## NOTE: Results may be misleading due to involvement in interactions
##
contrast(e2,"pairwise")
##  contrast                 estimate   SE df t.ratio p.value
##  Golden Rain - Marvellous    -5.29 4.47 61  -1.184  0.4671
##  Golden Rain - Victory        6.88 4.47 61   1.538  0.2804
##  Marvellous - Victory        12.17 4.47 61   2.722  0.0226
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

also see the effects package.

car::Anova(m_oats,test="F")
## Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
## 
## Response: yield
##                     F Df Df.res    Pr(>F)    
## nitro         81.5155  1     61 7.648e-13 ***
## Variety        3.7268  2     61   0.02972 *  
## nitro:Variety  0.3512  2     61   0.70524    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

References

Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.

Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv:1506.04967 [Stat], June. http://arxiv.org/abs/1506.04967.

Bolker, Benjamin M. 2015. “Linear and Generalized Linear Mixed Models.” In Ecological Statistics: Contemporary Theory and Application, edited by Gordon A. Fox, Simoneta Negrete-Yankelevich, and Vinicio J. Sosa. Oxford University Press.

Matuschek, Hannes, Reinhold Kliegl, Shravan Vasishth, Harald Baayen, and Douglas Bates. 2017. “Balancing Type I Error and Power in Linear Mixed Models.” Journal of Memory and Language 94: 305–15. https://doi.org/10.1016/j.jml.2017.01.001.