Mixed model lab #1

Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.

Preliminaries

Load packages:

## modeling packages
library("lme4")     ## basic (G)LMMs
library("nlme")     ## more LMMs
library("glmmTMB")
library("afex")     ## helper functions
library("emmeans")

## graphics
library("ggplot2"); theme_set(theme_bw())
## squash panels together
zero_margin <- theme(panel.spacing=grid::unit(0,"lines")) 
library("lattice")
library("dotwhisker")  ## coefficient plots

## data manipulation
library("broom.mixed")

Linear mixed model: starling example

Data from Toby Marthews, r-sig-mixed-models mailing list: you can find it here.

subject: individual bird
roostsitu (tree, nest-box, inside, other): location of nest box
mnth (Nov, Jan): month
stmass: mass of bird

Graphics: spaghetti plots

## cosmetic tweaks
load("data/starling.RData")   ## loads "dataf"
ggplot(dataf,aes(x=mnth,y=stmass))+
  geom_point()+
  geom_line(aes(group=subject))+  ## connect subjects with a line
  facet_grid(.~roostsitu)+        ## 1 row of panels by roost
  zero_margin                         ## squash together

You could also try

ggplot(dataf,aes(mnth,stmass,colour=roostsitu))+
    geom_point()+
    geom_line(aes(group=subject))

Overkill for this data set, but sometimes it can be useful to put every individual in its own facet:

## reorder individuals by magnitude of difference
dataf <- transform(dataf, subject=reorder(subject,stmass,FUN=diff))
ggplot(dataf,aes(mnth,stmass,colour=roostsitu,group=subject))+
    geom_point()+geom_line()+facet_wrap(~subject)+
    zero_margin

It’s pretty obvious that the starting (November) mass varies among roost situations (tree/nest-box/etc.), and that mass increases from November to January, but we might like to know if the slopes differ among situations. That means our fixed effects would be ~roostsitu*mnth, with our attention focused on the roostsitu:mnth (interaction) term. For random effects, we can allow both intercept (obviously) and slope (maybe) to vary among individuals, via (1+mnth|subject) or equivalent … in this case, because measurements are only taken in two months, we can also write the random term as (1|subject/mnth).

However, it turns out that we can’t actually estimate random slopes for this model, because every individual is only measured twice. That means that the variance in the slopes would be completely confounded with the residual variance.

If we fit this with lme4:

lmer1 <- lmer(stmass~mnth*roostsitu+(1|subject/mnth),data=dataf)

## Error: number of levels of each grouping factor must be < number of observations (problems: mnth:subject)

We get an error. We can actually do this (lmerControl(check.nobs.vs.nlev="ignore"); see ?lmerControl for details), and it will actually work, but it will pick an arbitrary division between the residual variance and the month:subject variance.

In principle we can do this model with glmmTMB, by dropping the residual variance term (for technical reasons, this isn’t possible in lme4):

glmmTMB1 <- glmmTMB(stmass~mnth*roostsitu+(1|subject/mnth),
                    data = dataf,
                    ## REML = TRUE,  ## figure out why this gives a npd Hessian ...
                    dispformula = ~0)

But now let’s forget about including the (unidentifiable) random slopes.

lmer2 <- lmer(stmass~mnth*roostsitu+(1|subject),data=dataf)

We can now get estimates, although the subject-level random effects are very uncertain: see confint(lmer2)

Walking through the output:

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: stmass ~ mnth * roostsitu + (1 | subject)
##    Data: dataf

This gives us the formula, reminds us that the model was fitted by restricted maximum likelihood, and gives us the “REML criterion” (equivalent to -2 log likelihood for a model fitted by maximum likelihood).

## REML criterion at convergence: 429.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max

This is a quick reminder of the scaled residuals; these should be approximately unskewed (median $\approx 0$) and the extremes should be somewhere around $\pm 2$ (bigger in a big data set )

## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  subject  (Intercept)  0.3444  0.5869

This tells us about the random effect - it tells us the variances and standard deviations of the random effect (the standard deviations are not an estimate of the uncertainty of the estimate – the estimate itself is a variance, or standard deviation), and the variance and standard deviation of the residual variance. We can see that the standard deviation of the intercept among subjects is much less than the standard deviation of the residual error …

##  Residual             17.3500  4.1653

The listed number of observations and groups is very useful for double-checking that the random effects grouping specification is OK. Are the numbers of groups what we expected from the experimental design?

## 
## Fixed effects:
##                           Estimate Std. Error     df t value Pr(>|t|)    
## (Intercept)                 83.600      1.330 71.973  62.847  < 2e-16 ***
## mnthJan                      7.200      1.863 36.000   3.865 0.000446 ***
## roostsitunest-box           -4.200      1.881 71.973  -2.233 0.028685 *  
## roostsituinside             -5.000      1.881 71.973  -2.658 0.009682 ** 
## roostsituother              -8.200      1.881 71.973  -4.359 4.27e-05 ***
## mnthJan:roostsitunest-box    3.600      2.634 36.000   1.367 0.180244

the standard fixed-effect output. lmer does not tell us the denominator degrees of freedom for the test (although we can get a rough idea of importance/significance fro the $t$ statistics; e.g. $t>2.5$ will be significant at $p<0.05$ for 6 or more degrees of freedom). We’ll come back to this in the inference section.

Diagnostic plots

performance::check_model() gives a relatively complete set of model diagnostics.

performance::check_model(lmer2)

linearity is automatically satisfied when all predictors are categorical and all interactions are included
similarly, in a balanced factorial design all observations have the same leverage
I usually ignore the collinearity diagnostics
residuals are slightly light-tailed, not a cause for concern
random effect estimates (BLUPs/conditional modes) look Normal

DHARMa (diagnostics via simulated residuals: usually unnecessary for LMMs):

sr2 <- DHARMa::simulateResiduals(lmer2)
plot(sr2)

lme4 also has some built-in plotting capabilities (skipped during live-coding session)

Basic diagnostic plots: fitted vs. residuals, coloured according to the roostsitu variable (col=dataf$roostsitu), point type according to month (pch=).

plot(lmer2,col=dataf$roostsitu,pch=dataf$mnth,id=0.05)

The id argument specifies to mark points that are beyond the specified Normal confidence limits (i.e. $\alpha$-values) (these are not outliers in any formal statistical sense but might be points you want to look at if they are otherwise interesting). By default they are labeled by random-effect group; use idLabels to change the default labels, and idLabels=~.obs to label by observation number.

You can get a scale-location plot by specifying

## FIXME: why doesn't this match performance scale-resid plot?
ss <- residuals(lmer2)/(1-hatvalues(lmer2))
xyplot(sqrt(abs(ss)) ~ fitted(lmer2),type=c("p","smooth"))

Adding the smoothed line is helpful because uneven sampling can influence your perception of the pattern.

Q-Q plot (a straight line indicates normality)

qqmath(lmer2,col=dataf$roostsitu,pch=dataf$mnth)

We can also do this with broom.mixed::augment() and stat_qq() (although there are also some some limitations - colour argument doesn’t actually work…)

ggplot(augment(lmer2),
       aes(sample=.resid/sd(.resid)))+  ## scale to variance=1
    stat_qq(aes(group=1,colour=roostsitu))+
    geom_abline(intercept=0,slope=1)

## Warning: The following aesthetics were dropped during statistical
## transformation: colour.
## ℹ This can happen when ggplot fails to infer the correct grouping
##   structure in the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a
##   numerical variable into a factor?

There are some deviations here, but not enough that I would worry very much. In particular, the distribution is slightly thin-tailed (the smallest residuals are largest than expected, and the largest residuals are smaller than expected), which would make the results slightly conservative (a fat-tailed distribution would make them anticonservative).

Boxplots of residuals subdivided by roostsitu (you have to put the grouping variable on the left side of the formula here):

plot(lmer2,roostsitu~resid(.))

Might be easier with augment:

aa <- augment(lmer2)
ggplot(aa,aes(roostsitu,.resid))+
    geom_boxplot()+coord_flip()

Plot random effects to look for outliers:

dotplot(ranef(lmer2))

## $subject

Or with tidy + ggplot:

tt <- tidy(lmer2,effects="ran_vals")
tt <- transform(tt,level=reorder(level,estimate))
ggplot(tt,aes(level,estimate))+
    geom_pointrange(aes(ymin=estimate-1.96*std.error,
                        ymax=estimate+1.96*std.error))+
    coord_flip()

In general, try plotting residuals and $\sqrt{\textrm{abs}(r)}$ both as a function of the fitted values and as a function of particularly important predictors.

if you have your own example data set handy that you can write a simple model for, run these diagnostics and interpret them
the hatvalues() function can also be useful (although all hat values are identical here)
also see the influence.ME package. Try plot(influence(lmer2,group="subject")) and interpret the results …

Try it yourself: tundra carbon example

Download the file tundra_agg.rda (from the data directory given above). The GS.NEE records the net ecosystem exchange (in grams of carbon/m^2/year) in a given year and site; cYear is a centered version of the year; n records the number of observations taken in a given site/year (the NEE value given is an average of these observations).
Draw some plots (with ggplot2 if you can, otherwise with base R graphics) to inspect the data
We are interested in the change in NEE over time (i.e., a random-slopes model). Run an lmer fit (note this data set does not have the issue of the starling data set), draw some preliminary conclusions, and inspect the plot diagnostics.

Inference

Never start doing inference until you’re finished with diagnostics/satisfied with model fit

The coefficient plot is a good first look: dotwhisker::dwplot() does this. Here we use by_2sd=TRUE to scale coefficients by 2x the SD of the predictor variable …

dotwhisker::dwplot(lmer2, by_2sd=TRUE, effects = "fixed")+geom_vline(xintercept=0,lty=2)

(Note that the intercept is automatically dropped; this is usually the right choice. Use show_intercept=TRUE to have the intercept included.) For more complicated coefficient plots, you can use broom.mixed::tidy() to get the parameters in a useful form and dwplot, or ggplot2 with geom_pointrange(), to draw the plot.

Stop and explain to yourself what these parameters mean. If you’re not sure, try resetting the base level of the roostsitu factor: dataf2 <- transform(dataf,roostsitu=relevel(roostsitu,ref="other")), predict what will happen to the results, and re-run the analysis.)

Exercise: This is actually a slightly trivial example, because there are only two measurements for each individual. Thus we can actually analyze the slopes and get the same answers by using a paired analysis, i.e. by computing the mass difference in subjects and then analyzing with a single-level linear model. (In the simplest case this would reduce to a paired $t$-test, but in this case it is a 1-way ANOVA on roostsitu.

Rearrange the data to get differences by subject:

dataf2 <- aggregate(stmass~roostsitu+subject,
               data=dataf,
               FUN=function(x) x[2]-x[1])

This says to aggregate the data frame dataf grouped by roostsitu and subject; for each group, compute the difference of the masses.

Draw a picture (boxplot+beeswarm plot):

ggplot(dataf2,aes(x=roostsitu,y=stmass))+geom_boxplot()+
   geom_dotplot(binaxis="y",stackdir="center",fill="red",alpha=0.5,
                binwidth=0.5)

As you can see, geom_dotplot() adds a horizontal dot-plot for each group (see the documentation [?geom_dotplot] for more details).

Analyze the data with lm and convince yourself that the estimates (fixed-effect coefficients, $t$ statistics, etc.) are equivalent to those found from the previous analysis.
It is also possible to rearrange and summarize the data to test the difference in intercepts, or to estimate the among-individual variance in intercepts (how?)

Inference part 2

What about $p$-values or more sophisticated confidence intervals? The default confidence intervals shown by dwplot() are just Normal confidence intervals (not taking into account any of the effects that make these approximate for mixed models).

We can get profile confidence intervals via confint(lmer2) or tidy(lmer2, conf.method = "profile"); in this example the profile confidence intervals are almost identical to the Wald intervals (dwplot()’s default).

(Insert long discussion of degrees of freedom and $p$-values here.)

We can use lmerTest or afex::mixed (which both wrap functions from the pbkrtest package) to get “denominator degrees of freedom”/$p$-values.

lmerTest wraps the lmer function to compute some additional information, and modifies the summary() function:

library("lmerTest")
## or: as(lmer2, "lmerModLmerTest")
lmer2R <- lmer(stmass~mnth*roostsitu+(1|subject),data=dataf)

Capturing just the coefficient table from summary(lmer2R):

## Fixed effects:
##                           Estimate Std. Error     df t value Pr(>|t|)    
## (Intercept)                 83.600      1.330 71.973  62.847  < 2e-16 ***
## mnthJan                      7.200      1.863 36.000   3.865 0.000446 ***
## roostsitunest-box           -4.200      1.881 71.973  -2.233 0.028685 *  
## roostsituinside             -5.000      1.881 71.973  -2.658 0.009682 ** 
## roostsituother              -8.200      1.881 71.973  -4.359 4.27e-05 ***
## mnthJan:roostsitunest-box    3.600      2.634 36.000   1.367 0.180244    
## mnthJan:roostsituinside      2.400      2.634 36.000   0.911 0.368341    
## mnthJan:roostsituother       1.600      2.634 36.000   0.607 0.547430

By default lmerTest uses the Satterthwaite approximation to the degrees of freedom. You can also use summary(lmer2R,ddf="Kenward-Roger") to get a slightly more accurate (and slower) estimate, but in this case the answers are basically identical.

For simple cases we can use (a slightly improved version of) the level-counting algorithm from lme.

source("R/calcDenDF.R")
calcDenDF(~mnth*roostsitu,data=dataf,random=~1|subject)

##               (Intercept)                   mnthJan         roostsitunest-box 
##                        36                        36                        36 
##           roostsituinside            roostsituother mnthJan:roostsitunest-box 
##                        36                        36                        36 
##   mnthJan:roostsituinside    mnthJan:roostsituother 
##                        36                        36

We conclude that the interactions are not doing much, but there’s definitely an effect of the roosts and months.

However, we probably want to test the overall effect of the interactions, not the individual levels. Here are the type II (sequential) ANOVA results:

anova(lmer2)

## Type III Analysis of Variance Table with Satterthwaite's method
##                 Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)    
## mnth           1656.20 1656.20     1    36 95.4582 1.152e-11 ***
## roostsitu       552.46  184.15     3    36 10.6141 3.833e-05 ***
## mnth:roostsitu   34.20   11.40     3    36  0.6571    0.5838    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If we want to evaluate the marginal sums of squares, i.e. dropping one term at a time from the model, when there are interactions, we usually need to change the model to use sum-to-zero contrasts:

lmer2B <- update(lmer2,            
          contrasts=list(mnth="contr.sum",roostsitu="contr.sum"))

The alternative approach is to use options(contrasts=c("contr.sum","contr.poly")), then refit the model, but I prefer to use the contrasts argument because it is more explicit.

Use afex::mixed to do “type 3” tests of all effects in a model (using Kenward-Roger to get df by default).

afex::mixed(stmass~mnth*roostsitu+(1|subject),data=dataf)

## Mixed Model Anova Table (Type 3 tests, S-method)
## 
## Model: stmass ~ mnth * roostsitu + (1 | subject)
## Data: dataf
##           Effect       df         F p.value
## 1           mnth 1, 36.00 95.46 ***   <.001
## 2      roostsitu 3, 36.00 10.61 ***   <.001
## 3 mnth:roostsitu 3, 36.00      0.66    .584
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1

Use anova() to do a likelihood ratio test on individual pairs of models.
Use drop1() to drop terms from the model one at a time

be careful when doing “marginal”/“type 3” tests in the presence of interactions!

In this case the results ($F$ values) are identical because the original design is balanced (hence, orthogonal). Not true if the data are (1) unbalanced (which is often true of ANOVA [categorical-predictor] designs, and almost always true of regression designs) or (2) GLMM or nonlinear.

The explicit model-comparison approach uses a likelihood ratio test rather than an $F$ test (i.e., it does not correct for the fact that the denominator sums of squares is estimated with error). In this case it hardly matters.

lmer2C <- update(lmer2B,REML=FALSE)
lmer2D <- update(lmer2C,. ~ . - mnth:roostsitu)
anova(lmer2C,lmer2D)

## Data: dataf
## Models:
## lmer2D: stmass ~ mnth + roostsitu + (1 | subject)
## lmer2C: stmass ~ mnth * roostsitu + (1 | subject)
##        npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## lmer2D    7 464.58 481.25 -225.29   450.58                     
## lmer2C   10 468.45 492.27 -224.22   448.45 2.1344  3      0.545

If we now want to use the model-comparison approach on the reduced (no-interaction) model to test the significance of roostsitu, we can use update to drop the roostsitu effect, but we also have to make sure to update the contrasts argument so that it only refers to predictors that remain in the reduced model (otherwise, we get an error).

drop1(lmer2D,test="Chisq") ## ignore warnings

## Single term deletions using Satterthwaite's method:
## 
## Model:
## stmass ~ mnth + roostsitu + (1 | subject)
##           Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)    
## mnth      1656.2 1656.20     1    80 101.281 7.259e-16 ***
## roostsitu  574.4  191.47     3    80  11.709 1.941e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If we want to test the random effect, we would in principle remove the random effect and test with anova, but this is a bit problematic here: lmer can’t fit a model without any random effects.

Let’s try this with lm:

lm1 <- lm(stmass~mnth*roostsitu,data=dataf)
(a1 <- anova(lmer2C,lm1))

## Data: dataf
## Models:
## lm1: stmass ~ mnth * roostsitu
## lmer2C: stmass ~ mnth * roostsitu + (1 | subject)
##        npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## lm1       9 466.46 487.90 -224.23   448.46                     
## lmer2C   10 468.45 492.27 -224.22   448.45 0.0152  1      0.902

We can also deduce the number of degrees of freedom from standard rules:

with(dataf,table(mnth,roostsitu))

##      roostsitu
## mnth  tree nest-box inside other
##   Nov   10       10     10    10
##   Jan   10       10     10    10

If we think about this in terms of the paired $t$-test, there are 40 comparisons and 4 parameters (one for each roost site)= 36 df.

If you wanted to compute the $p$-values by hand, you could:

## here we use lme4:::anova.merMod to force R to use the anova
## method from lme4 rather than the one from lmerTest
a2 <- lme4:::anova.merMod(lmer2B)
fvals <- a2[,"F value"]
numdf <- a2[,"npar"]
dendf <- 36
pf(fvals,numdf,dendf,lower.tail=FALSE)

## [1] 1.151629e-11 3.833216e-05 5.838288e-01

Alternatively, we can try a parametric bootstrap: the pbkrtest package can do this, or we can just set it up by hand:

## 'log Lik.' 10.01564 (df=10)

## [1] 0.567

library(pbkrtest)
t1 <- system.time(PBmodcomp(lmer3,lmer4,nsim=500))

(takes 7 seconds)

In this case, the Kenward-Roger correction appropriately does nothing different – we have a classical balanced design and no correction is actually necessary. But it does give a denominator df and a $p$-value for this lmer model, which is handy …

Exercise: repeat for the tundra data

Predictions and plotting

In lmer, predict has a re.form argument that specifies which random effects should be included (NA or ~0=none, population level; NULL (=all) or ~subject=prediction at the subject level; more complex models, might have additional nested levels).

dataf$pred <- predict(lmer2,re.form=NA)  ## population level
dataf$pred1 <- predict(lmer2) ## individual level
g0 <- ggplot(dataf,aes(mnth,stmass))+
    geom_point()+
    geom_line(aes(group=subject))+
    facet_grid(.~roostsitu)+
    zero_margin
g0 +   geom_line(colour="gray",aes(y=pred1,group=subject)) +
    geom_line(colour="red",aes(y=pred,group=subject))

There is so much shrinkage (the among-individual variance is very small) that we can barely see the individual-level predictions (gray lines) behind the population-level predictions (red lines).

VarCorr(lmer2)

##  Groups   Name        Std.Dev.
##  subject  (Intercept) 0.58689 
##  Residual             4.16533

Unfortunately computing confidence intervals for the predictions is a little tricky: again, there is some code on the GLMM FAQ for this.

ggplot(dataf,aes(mnth,pred1))+
    geom_line(aes(group=subject,x=as.numeric(mnth)),colour="gray")+
    facet_wrap(~roostsitu,scale="free_y",nrow=1)+
    geom_line(aes(y=pred,x=as.numeric(mnth)),colour="red")

For most cases you will want to set up a new data frame to do prediction rather than just using the covariates from the original data (e.g. if the data are sampled irregularly, or sparsely), and use the newdata argument of predict. The expand.grid function is handy in this context too.

History

originally developed for NCEAS summer institute, July 2013
updated, Alaska ASA workshop, August 2014
updated, Ottawa workshop, Jan 2016 (removed JAGS material)
updated, Morelia workshop (moved MCMCglmm to standalone Bayes)