Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.
MASS::boxcox()
on residuals## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
table()
to confirm how many observations per level of factor
|
as (a|g1) + (b|g2) + ...
+
are independentequation | formula |
---|---|
\(β_0 + β_{1}X_{i} + e_{si}\) | n/a (Not a mixed-effects model) |
\((β_0 + b_{S,0s}) + β_{1}X_i + e_{si}\) | ∼ X + (1∣Subject) |
\((β_0 + b_{S,0s}) + (β_{1} + b_{S,1s}) X_i + e_{si}\) | ~ X + (1 + X∣Subject) |
\((β_0 + b_{S,0s} + b_{I,0i}) + (β_{1} + b_{S,1s}) X_i + e_{si}\) | ∼ X + (1 + X∣Subject) + (1∣Item) |
As above, but \(S_{0s}\), \(S_{1s}\) independent | ∼ X + (1∣Subject) + (0 + X∣ Subject) + (1∣Item) |
\((β_0 + b_{S,0s} + b_{I,0i}) + β_{1}X_i + e_{si}\) | ∼ X + (1∣Subject) + (1∣Item) |
\((β_0 + b_{I,0i}) + (β_{1} + b_{S,1s})X_i + e_{si}\) | ∼ X + (0 + X∣Subject) + (1∣Item) |
Modified from: http://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet?lq=1 (Livius). Subscripts \(\{S,I\}\) refer to Subject vs Item effects. Lower-case \(\{s,i\}\) indicate particular subjects/items. \(\{0,1\}\) refer to intercept vs slope effects.
Nested: sub-unit IDs only measured within a single larger unit. e.g.: Plot1 in Block1 independent of Plot1 in Block2
Crossed: sub-unit IDs can be measured in multiple larger units. e.g. year, site
Unique coding: removes ambiguity
Robert Long, Cross Validated
a*b
= main effects plus interaction, a:b
= interaction only, a/b
= a
+ a:b
(1|f/g)
equivalent to (1|f) + (1|f:g)
. e.g. subplots vary within plots (but “subplot 1 of every plot” isn’t meaningful)(1|f) + (1|g)
. e.g. years vary, and plots vary independently(1|f) + (1|g) + (1|f:g)
. e.g. years vary, and plots vary independently, and plots also vary within years (for LMMs, assumes >1 observation per plot/year combination). ((1|f*g)
should be allowed but …)Don’t need explicit nesting if your sub-groups are uniquely labeled (i.e. A1
, A2
, …, B1
, B2
, …)
These can be difficult, because they generate large variance-covariance matrices. E.g for a four-level factor, R parameterizes the model as \(\beta_0\) (intercept), \(\beta_1\) (level 2 - level 1), \(\beta_2\) (level 3 - level 1), \(\beta_3\) (level 4 - level 1). This gives us
\[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{cccc} \sigma^2_{\{b|1\}} & . & . & . \\ \sigma_{\{b|1\},\{b|a_{21}\}} & \sigma^2_{\{b|a_{21}\}} & . & . \\ \sigma_{\{b|1\}, \{b|a_{31}\}} & \sigma_{\{b|a_{21}\},\{b|a_{31}\}} & \sigma^2_{\{b|a_{31}\}} & . \\ \sigma_{\{b|1\} ,\{b|a_{41}\}} & \sigma_{\{b|a_{21}\},\{b|a_{41}\}} & \sigma_{\{b|a_{31}\},\{b|a_{41}\}} & \sigma^2_{\{b|a_{41}\}} \end{array} \right] \right) \]
Maximal model often won’t work
e.g.
(treat|block)
is maximal(period|herd)
, not just (1|herd)
(x|g)
really do?(1+x|g)
\[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{cc} \sigma^2_{\textrm{int}} & \sigma_{\textrm{int},\textrm{slope}} \\ \sigma_{\textrm{int},\textrm{slope}} & \sigma^2_{\textrm{slope}} \end{array} \right] \right) \]
(1+x+y+z+...|g)
: \(n\) effects require \(n(n+1)/2\) variances + covariancesall(abs(getME(x,"theta"))>1e-4)
Essentially, because the observed among-group variation is less than the expected among-group variation (= \(\sigma^2_{\mbox{among}} + \sigma^2_{\mbox{within}}/n\)). More generally, because some dimension of the variance-covariance matrix has zero extent …
(1|b/a)
([positive] compound symmetry) vs. (a|b)
: \[
(\textrm{intercept}, \textrm{slope}) =
\textrm{MVN}\left(\boldsymbol 0,
\left[
\begin{array}{ccccc}
\sigma^2_{\{b|1\}} & . & . & . & . \\
\sigma_{\{b|1\},\{b|a_{21}\}} &
\sigma^2_{\{b|a_{21}\}} & . & . & . \\
\sigma_{\{b|1\}, \{b|a_{31}\}} &
\sigma_{\{b|a_{21}\},\{b|a_{31}\}} &
\sigma^2_{\{b|a_{31}\}} & . & . \\
\sigma_{\{b|1\} ,\{b|a_{41}\}} &
\sigma_{\{b|a_{21}\},\{b|a_{41}\}} &
\sigma_{\{b|a_{31}\},\{b|a_{41}\}} &
\sigma^2_{\{b|a_{41}\}} & . \\
\sigma_{\{b|1\},\{b|a_{51}\}} &
\sigma_{\{b|a_{21}\},\{b|a_{51}\}} &
\sigma_{\{b|a_{31}\},\{b|a_{51}\}} &
\sigma_{\{b|a_{41}\},\{b|a_{51}\}} &
\sigma^2_{\{b|a_{51}\}}
\end{array}
\right]
\right)
\] (=\((n(n+1))/2 = (4\times 5)/2 = 10\) parameters) vs. \[
\left[
\begin{array}{ccccc}
\sigma^2 & . & . & . & . \\
\rho \sigma^2 & \sigma^2 & . & . & . \\
\rho \sigma^2 & \rho \sigma^2 & \sigma^2 & . & . \\
\rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \sigma^2 & . \\
\rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \sigma^2
\end{array}
\right]
\] where \(\sigma^2 = \sigma^2_{\{b|1\}}+\sigma^2_{\{a:b|1\}}\), \(\rho = \sigma^2_{\{b|1\}}/\sigma^2\) (=2 parameters; \(\rho\) must be >0)
(1+x+y+z||b)
(1|b) + (0+x|b) + ...
lme4
version only works properly for continuous predictorsafex::mixed
can do thisRE “nested within” FE, e.g. if not enough groups at the top level; y ~ g1 + (1|g1:g2)
; more parameters but fixed rather than random
?allFit
to see whether different optimizers give sufficiently similar answers
$fixef
, etc.: are answers sufficiently similar?$llik
: how similar is goodness-of-fit?m1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
aa <- allFit(m1)
## bobyqa : [OK]
## Nelder_Mead : [OK]
## nlminbwrap : [OK]
## optimx.L-BFGS-B : [OK]
## nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
## nloptwrap.NLOPT_LN_BOBYQA : [OK]
ss <- summary(aa)
names(ss)
## [1] "which.OK" "msgs" "fixef" "llik" "sdcor" "theta" "times"
## [8] "feval"
ss$fixef
## (Intercept) Days
## bobyqa 251.4051 10.46729
## Nelder_Mead 251.4051 10.46729
## nlminbwrap 251.4051 10.46729
## optimx.L-BFGS-B 251.4051 10.46729
## nloptwrap.NLOPT_LN_NELDERMEAD 251.4051 10.46729
## nloptwrap.NLOPT_LN_BOBYQA 251.4051 10.46729
ss$sdcor
## Subject.(Intercept) Subject.Days.(Intercept)
## bobyqa 24.74045 5.922133
## Nelder_Mead 24.74061 5.922142
## nlminbwrap 24.74047 5.922132
## optimx.L-BFGS-B 24.74051 5.922188
## nloptwrap.NLOPT_LN_NELDERMEAD 24.74180 5.922206
## nloptwrap.NLOPT_LN_BOBYQA 24.74066 5.922138
## Subject.Days sigma
## bobyqa 0.06555133 25.59182
## Nelder_Mead 0.06555186 25.59180
## nlminbwrap 0.06555080 25.59182
## optimx.L-BFGS-B 0.06555693 25.59179
## nloptwrap.NLOPT_LN_NELDERMEAD 0.06553081 25.59167
## nloptwrap.NLOPT_LN_BOBYQA 0.06555124 25.59180
ss$llik-min(ss$llik)
## bobyqa Nelder_Mead
## 3.059904e-08 3.013236e-08
## nlminbwrap optimx.L-BFGS-B
## 3.059472e-08 2.889999e-08
## nloptwrap.NLOPT_LN_NELDERMEAD nloptwrap.NLOPT_LN_BOBYQA
## 0.000000e+00 2.986872e-08
allFit
can be run in parallelPCA of RE variance-covariance matrix:
rePCA(m1)
## $Subject
## Standard deviations (1, .., p=2):
## [1] 0.9668680 0.2308798
##
## Rotation (n x k) = (2 x 2):
## [,1] [,2]
## [1,] -0.99986158 -0.01663769
## [2,] -0.01663769 0.99986158
##
## attr(,"class")
## [1] "prcomplist"
glmmTMB
)lme4
will try to prevent this …)m1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
Similar to generalized linear models: fitted vs. residual, scale-location, Q-Q
aa <- augment(m1)
## fitted vs resid
plot(m1,type=c("p","smooth"))
## scale-location
plot(m1,sqrt(abs(resid(.)))~fitted(.),type=c("p","smooth"))
## q-q
lattice::qqmath(m1,id=0.05)
Or with augment
and ggplot …
aa <- augment(m1)
## fitted vs resid
ggplot(aa,aes(.fitted,.resid))+geom_point()+geom_smooth()
## scale-location
ggplot(aa,aes(.fitted,sqrt(abs(.resid))))+geom_point()+geom_smooth()
## q-q
ggplot(aa)+stat_qq(aes(sample=.resid))
tidy
: get coefficient values
effects="fixed"
effects="ran_pars"
effects="ran_vals"
effects="ran_coefs"
glance
: get model summariesaugment
: get fitted, residuals, possibly predictionsmost important:
more important for variance parameters than fixed-effect parameters
e.g. Normal vs. \(t\)
simulate()
it fro the fitted model:summary()
car::Anova
, afex::anova
emmeans
, effects
packagescar::Anova(m1)
R/calcDenDF.R
lmerTest
/afex
; Satterthwaite, Kenward-Rogerlibrary(lmerTest)
summary(as_lmerModLmerTest(m1))
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1743.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9536 -0.4634 0.0231 0.4634 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 612.10 24.741
## Days 35.07 5.922 0.07
## Residual 654.94 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.405 6.825 17.000 36.838 < 2e-16 ***
## Days 10.467 1.546 17.000 6.771 3.26e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
summary(as_lmerModLmerTest(m1),ddf="Kenward-Roger")
## Linear mixed model fit by REML. t-tests use Kenward-Roger's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1743.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9536 -0.4634 0.0231 0.4634 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 612.10 24.741
## Days 35.07 5.922 0.07
## Residual 654.94 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.405 6.825 17.000 36.838 < 2e-16 ***
## Days 10.467 1.546 17.000 6.771 3.26e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
Q: Why not use lmerTest
all the time?
A: it can make it a little harder to diagnose fitting problems
profile()
, confint()
anova()
to comparedrop1
, afex::anova()
confint(m1)
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 14.3814732 37.7159917
## .sig02 -0.4815008 0.6849863
## .sig03 3.8011643 8.7533851
## .sigma 22.8982669 28.8579965
## (Intercept) 237.6806955 265.1295147
## Days 7.3586533 13.5759188
What took so long?
Very slow!
pbkrtest
confint(.,method="boot")
anova()
In general it’s not as easy as one might like to get confidence intervals on predictions. If we ignore the effects of uncertainty in the random effects, then it can be done using the formula \(\sigma^2 = \textrm{Diag}\left(X V X^\top\right)\)
mm <- model.matrix(terms(fitted_model),newdat)
se_pred <- sqrt(diag(mm %*% tcrossprod(vcov(fitted_model),mm)))
data("Oats",package="nlme")
m_oats <- lmer(yield~nitro*Variety+(1|Block),data=Oats)
(e1 <- emmeans(m_oats,~nitro|Variety))
## Variety = Golden Rain:
## nitro emmean SE df lower.CL upper.CL
## 0.3 104.5 7.12 6.61 87.5 122
##
## Variety = Marvellous:
## nitro emmean SE df lower.CL upper.CL
## 0.3 109.8 7.12 6.61 92.7 127
##
## Variety = Victory:
## nitro emmean SE df lower.CL upper.CL
## 0.3 97.6 7.12 6.61 80.6 115
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
plot(e1)
e2 <- emmeans(m_oats,~Variety)
## NOTE: Results may be misleading due to involvement in interactions
##
contrast(e2,"pairwise")
## contrast estimate SE df t.ratio p.value
## Golden Rain - Marvellous -5.29 4.47 61 -1.184 0.4671
## Golden Rain - Victory 6.88 4.47 61 1.538 0.2804
## Marvellous - Victory 12.17 4.47 61 2.722 0.0226
##
## Degrees-of-freedom method: kenward-roger
## P value adjustment: tukey method for comparing a family of 3 estimates
also see the effects
package.
car::Anova(m_oats,test="F")
## Analysis of Deviance Table (Type II Wald F tests with Kenward-Roger df)
##
## Response: yield
## F Df Df.res Pr(>F)
## nitro 81.5155 1 61 7.648e-13 ***
## Variety 3.7268 2 61 0.02972 *
## nitro:Variety 0.3512 2 61 0.70524
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.
Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv:1506.04967 [Stat], June. http://arxiv.org/abs/1506.04967.
Bolker, Benjamin M. 2015. “Linear and Generalized Linear Mixed Models.” In Ecological Statistics: Contemporary Theory and Application, edited by Gordon A. Fox, Simoneta Negrete-Yankelevich, and Vinicio J. Sosa. Oxford University Press.
Matuschek, Hannes, Reinhold Kliegl, Shravan Vasishth, Harald Baayen, and Douglas Bates. 2017. “Balancing Type I Error and Power in Linear Mixed Models.” Journal of Memory and Language 94: 305–15. https://doi.org/10.1016/j.jml.2017.01.001.