# Introduction

This is an informal FAQ list for the r-sig-mixed-models mailing list.

The most commonly used functions for mixed modeling in R are

• linear mixed models: aov(), nlme::lme1, lme4::lmer; brms::brm
• generalized linear mixed models (GLMMs)
• frequentist: MASS::glmmPQL, lme4::glmer; glmmTMB
• Bayesian: MCMCglmm::MCMCglmm; brms::brm
• nonlinear mixed models: nlme::nlme, lme4::nlmer; brms::brm
• GNLMMs: brms::brm

Another quick-and-dirty way to search for mixed-model related packages on CRAN:

grep("l.?m[me][^t]",rownames(available.packages()),value=TRUE)
##  [1] "blmeco"         "buildmer"       "cellVolumeDist" "climextRemes"
##  [5] "curtailment"    "elementR"       "glmertree"      "glmmEP"
##  [9] "glmmfields"     "glmmLasso"      "glmmML"         "glmmSeq"
## [13] "glmmsr"         "glmmTMB"        "lamme"          "lme4"
## [17] "lmec"           "lmeInfo"        "lmem.qtler"     "lmeNB"
## [21] "lmeNBBayes"     "lmeresampler"   "lmerTest"       "lmeSplines"
## [25] "lmmot"          "lmmpar"         "lrmest"         "lsmeans"
## [29] "mailmerge"      "mlmm.gwas"      "mlmmm"          "mvglmmRank"
## [33] "nlmeODE"        "nlmeU"          "palmerpenguins" "tlmec"
## [37] "vagalumeR"

There are some false positives here (e.g. palmerpenguins); see here if you’re interested in “regex golf”.

## Other sources of help

• the mailing list is r-sig-mixed-models@r-project.org
• archives here
• or Google search with the tag site:https://stat.ethz.ch/pipermail/r-sig-mixed-models/
• The source code of this document is available on GitHub; the rendered (HTML) version lives on GitHub pages.
• Searching on StackOverflow with the [r] [mixed-models] tags, or on CrossValidated with the [mixed-model] tag may be helpful (these sites also have an [lme4] tag).

DISCLAIMERS:

• (G)LMMs are hard - harder than you may think based on what you may have learned in your second statistics class, which probably focused on picking the appropriate sums of squares terms and degrees of freedom for the numerator and denominator of an $$F$$ test. ‘Modern’ mixed model approaches, although more powerful (they can handle more complex designs, lack of balance, crossed random factors, some kinds of non-Normally distributed responses, etc.), also require a new set of conceptual tools. In order to use these tools you should have at least a general acquaintance with classical mixed-model experimental designs but you should also, probably, read something about modern mixed model approaches. Littell et al. (2006) and Pinheiro and Bates (2000) are two places to start, although Pinheiro and Bates is probably more useful if you want to use R. Other useful references include Gelman and Hill (2006) (focused on Bayesian methods) and Zuur et al. (2009b). If you are going to use generalized linear mixed models, you should understand generalized linear models (Dobson and Barnett (2008), Faraway (2006), and McCullagh and Nelder (1989) are standard references; the last is the canonical reference, but also the most challenging).
• All of the issues that arise with regular linear or generalized-linear modeling (e.g.: inadequacy of p-values alone for thorough statistical analysis; need to understand how models are parameterized; need to understand the principle of marginality and how interactions can be treated; dangers of overfitting, which are not mitigated by stepwise procedures; the non-existence of free lunches) also apply, and can apply more severely, to mixed models.
• When SAS (or Stata, or Genstat/AS-REML or …) and R differ in their answers, R may not be wrong. Both SAS and R may be right’ but proceeding in a different way/answering different questions/using a different philosophical approach (or both may be wrong …)
• The advice in this FAQ comes with absolutely no warranty of any sort.

# References

## linear mixed models

### web/open

• pinheiro_mixed-effects_2000: LMM only.
• Zuur et al. (2009b): Focused on ecology.
• Gelman and Hill (2006): LMM and GLMM; Bayesian; examples from social science. Intermediate mathematics.
• (Rethinking)

# Model definition

## Model specification

The following formula extensions for specifying random-effects structures in R are used by

• lme4
• nlme (nested effects only, although crossed effects can be specified with more work)
• glmmADMB and glmmTMB

MCMCglmm uses a different specification, inherited from AS-REML.

(Modified from Robin Jeffries, UCLA:)

formula meaning
(1|group) random group intercept
(x|group) = (1+x|group) random slope of x within group with correlated intercept
(0+x|group) = (-1+x|group) random slope of x within group: no variation in intercept
(1|group) + (0+x|group) uncorrelated random intercept and random slope within group
(1|site/block) = (1|site)+(1|site:block) intercept varying among sites and among blocks within sites (nested random effects)
site+(1|site:block) fixed effect of sites plus random variation in intercept among blocks within sites
(x|site/block) = (x|site)+(x|site:block) = (1 + x|site)+(1+x|site:block) slope and intercept varying among sites and among blocks within sites
(x1|site)+(x2|block) two different effects, varying at different levels
x*site+(x|site:block) fixed effect variation of slope and intercept varying among sites and random variation of slope and intercept among blocks within sites
(1|group1)+(1|group2) intercept varying among crossed random effects (e.g. site, year)

Or in a little more detail:

equation formula
$$β_0 + β_{1}X_{i} + e_{si}$$ n/a (Not a mixed-effects model)
$$(β_0 + b_{S,0s}) + β_{1}X_i + e_{si}$$ ∼ X + (1∣Subject)
$$(β_0 + b_{S,0s}) + (β_{1} + b_{S,1s}) X_i + e_{si}$$ ~ X + (1 + X∣Subject)
$$(β_0 + b_{S,0s} + b_{I,0i}) + (β_{1} + b_{S,1s}) X_i + e_{si}$$ ∼ X + (1 + X∣Subject) + (1∣Item)
As above, but $$S_{0s}$$, $$S_{1s}$$ independent ∼ X + (1∣Subject) + (0 + X∣ Subject) + (1∣Item)
$$(β_0 + b_{S,0s} + b_{I,0i}) + β_{1}X_i + e_{si}$$ ∼ X + (1∣Subject) + (1∣Item)
$$(β_0 + b_{I,0i}) + (β_{1} + b_{S,1s})X_i + e_{si}$$ ∼ X + (0 + X∣Subject) + (1∣Item)

Modified from: http://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet?lq=1 (Livius)

The magic development version of the equatiomatic package can handle mixed models (remotes::install_github("datalorax/equatiomatic")), e.g.

library(lme4)
library(equatiomatic)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
equatiomatic::extract_eq(fm1)

\begin{aligned} \operatorname{Reaction}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1j[i]}(\operatorname{Days}), \sigma^2 \right) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\mu_{\alpha_{j}} \\ &\mu_{\beta_{1j}} \end{aligned} \end{array} \right) , \left( \begin{array}{cc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} \end{array} \right) \right) \text{, for Subject j = 1,} \dots \text{,J} \end{aligned}

It doesn’t handle GLMMs (yet), but you could fit two fake models — one LMM like your GLMM but with a Gaussian response, and one GLM with the same family/link function as your GLMM but without the random effects — and put the pieces together.

## Should I treat factor xxx as fixed or random?

This is in general a far more difficult question than it seems on the surface. There are many competing philosophies and definitions. For example, from Gelman (2005):

Before discussing the technical issues, we briefly review what is meant by fixed and random effects. It turns out that different—in fact, incompatible—definitions are used in different contexts. [See also Kreft and de Leeuw (1998), Section 1.3.3, for a discussion of the multiplicity of definitions of fixed and random effects and coefficients, and Robinson (1998) for a historical overview.] Here we outline five definitions that we have seen: 1. Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts αi and fixed slope β corresponds to parallel lines for different individuals i, or the model yit = αi + βt. Kreft and de Leeuw [(1998), page 12] thus distinguish between fixed and random coefficients. 2. Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella and McCulloch [(1992), Section 1.4] explore this distinction in depth. 3. “When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random” [Green and Tukey (1960)]. 4. “If an effect is assumed to be a realized value of a random variable, it is called a random effect” [LaMotte (1983)]. 5. Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage [“linear unbiased prediction” in the terminology of Robinson (1991)]. This definition is standard in the multilevel modeling literature [see, e.g., Snijders and Bosker (1999), Section 4.2] and in econometrics.

Another useful comment (via Kevin Wright) reinforcing the idea that “random vs. fixed” is not a simple, cut-and-dried decision: from Schabenberger and Pierce (2001), p. 627:

Before proceeding further with random field linear models we need to remind the reader of the adage that one modeler’s random effect is another modeler’s fixed effect.

Clark and Linzer (2015) address this question from a mostly econometric perspective, focusing mostly on practical variance/bias/RMSE criteria.

One point of particular relevance to ‘modern’ mixed model estimation (rather than ‘classical’ method-of-moments estimation) is that, for practical purposes, there must be a reasonable number of random-effects levels (e.g. blocks) – more than 5 or 6 at a minimum. This is not surprising if you consider that random effects estimation is trying to estimate an among-block variance. For example, from Crawley (2002) p. 670:

Are there enough levels of the factor in the data on which to base an estimate of the variance of the population of effects? No, means [you should probably treat the variable as] fixed effects.

Some researchers (who treat fixed vs random as a philosophical rather than a pragmatic decision) object to this approach.

Also see a very thoughtful chapter in Hodges (2016).

Treating factors with small numbers of levels as random will in the best case lead to very small and/or imprecise estimates of random effects; in the worst case it will lead to various numerical difficulties such as lack of convergence, zero variance estimates, etc.. (A small simulation exercise shows that at least the estimates of the standard deviation are downwardly biased in this case; it’s not clear whether/how this bias would affect the point estimates of fixed effects or their estimated confidence intervals.) In the classical method-of-moments approach these problems may not arise (because the sums of squares are always well defined as long as there are at least two units), but the underlying problems of lack of power are there nevertheless.

Thierry Onkelinx has a blog post with some simulations on the impact of the number of levels and concludes with a few recommendations for the number of levels of the grouping variable $$n_s$$: > - get $$n_s > 1000$$ levels when an accurate estimate of the random effect variance is crucial. E.g. when a single number will be use for power calculations. > - get $$n_s > 100$$ levels when a reasonable estimate of the random effect variance is sufficient. E.g. power calculations with sensitivity analysis of the random effect variance. > - get $$n_s > 20$$ levels for an experimental study > - in case $$10 < n_s <20$$ you should validate the model very cautious before using the output > - in case $$n_s < 10$$ it is safer to use the variable as a fixed effect.

Oberpriller, Leite, and Pichler (2021) also performed a simulation study and found that while the estimates are similar for treating a variable with a small number of levels as fixed or random are similar, there was an impact on Type 1 and Type 2 error rates. They also found that the precise random effects structure (e.g., inclusion of random slopes) had a large impact on these properties.

Also see this thread on the r-sig-mixed-models mailing list and this question on CrossValidated.

## Nested or crossed?

• Relatively few mixed effect modeling packages can handle crossed random effects, i.e. those where one level of a random effect can appear in conjunction with more than one level of another effect. (This definition is confusing, and I would happily accept a better one.) A classic example is crossed temporal and spatial effects. If there is random variation among temporal blocks (e.g. years) ‘’and’’ random variation among spatial blocks (e.g. sites), ‘’and’’ if there is a consistent year effect across sites and ‘’vice versa’’, then the random effects should be treated as crossed.
• lme4 does handled crossed effects, efficiently
• if you need to deal with crossed REs in conjunction with some of the features that nlme offers (e.g. heteroscedasticity of residuals via weights/varStruct, correlation of residuals via correlation/corStruct, or if you want to used crossed REs with the gamlss package, see p. 163ff of Pinheiro and Bates (2000) (section 4.2.2: Google books link). I give a worked example here. As far as I can tell, a couple of hacks are necessary to get this to work: (1) the data must be expressed as a groupedData object (at least, I haven’t managed to get it to work in any other way); (2) the crossed effects must be nested within another grouping factor - in the example here I define a dummy group, which is awkward (it makes the variance component for this group and the residual variance jointly unidentifiable), but otherwise seems to work OK.
• I rarely find it useful to think of fixed effects as “nested” (although others disagree); if for example treatments A and B are only measured in block 1, and treatments C and D are only measured in block 2, one still assumes (because they are fixed effects) that each treatment would have the same effect if applied in the other block. (One might like to estimate treatment-by-block interactions, but in this case the experimental design doesn’t allow it; one would have to have multiple treatments measured within each block, although not necessarily all treatments in every block.) One would code this analysis as response~treatment+(1|block) in lme4. Also, in the case of fixed effects, crossed and nested specifications change the parameterization of the model, but not anything else (e.g. the number of parameters estimated, log-likelihood, model predictions are all identical). That is, in R’s model.matrix function (which implements a version of Wilkinson-Rogers notation) a*b and a/b (which expand to 1+a+b+a:b and 1+a+a:b respectively) give model matrices with the same number of columns.
• Whether you explicitly specify a random effect as nested or not depends (in part) on the way the levels of the random effects are coded. If the ‘lower-level’ random effect is coded with unique levels, then the two syntaxes (1|a/b) (or (1|a)+(1|a:b)) and (1|a)+(1|b) are equivalent. If the lower-level random effect has the same labels within each larger group (e.g. blocks 1, 2, 3, 4 within sites A, B, and C) then the explicit nesting (1|a/b) is required. It seems to be considered best practice to code the nested level uniquely (e.g. A1, A2, …, B1, B2, …) so that confusion between nested and crossed effects is less likely.

# Model extensions

## Overdispersion

### Testing for overdispersion/computing overdispersion factor

• with the usual caveats, plus a few extras – counting degrees of freedom, etc. – the usual procedure of calculating the sum of squared Pearson residuals and comparing it to the residual degrees of freedom should give at least a crude idea of overdispersion. The following attempt counts each variance or covariance parameter as one model degree of freedom and presents the sum of squared Pearson residuals, the ratio of (SSQ residuals/rdf), the residual df, and the $$p$$-value based on the (approximately!!) appropriate $$\chi^2$$ distribution. Do PLEASE note the usual, and extra, caveats noted here: this is an APPROXIMATE estimate of an overdispersion parameter. Even in the GLM case, the expected deviance per point equaling 1 is only true as the distribution of individual deviates approaches normality, i.e. the usual $$\lambda>5$$ rules of thumb for Poisson values and $$\textrm{min}(Np, N(1-p)) > 5$$ for binomial values (e.g. see Venables and Ripley (2002), p. 208-209). (And that’s without the extra complexities due to GLMM, i.e. the “effective” residual df should be large enough to make the sums of squares converge on a $$\chi^2$$ distribution …)
• overdispersion is not estimable (and hence practically irrelevant) for Bernoulli models (= binary data = binomial with $$N=1$$).
• overdispersion does occur, but is much less central, for models that estimate a scale parameter (i.e. almost anything but Poisson or binomial: Gamma, negative binomial, …). Whereas overdispersion for one-parameter distributions like the Poisson or binomial suggests the need to move to an analogous overdispersed distribution (e.g. negative binomial or beta-binomial respectively), overdispersion for more complex models is a more general indication of model misspecification.
• The recipes below may need adjustment for some of the more complex model types allowed by glmmTMB (e.g. zero-inflation/variable dispersion), where it’s less clear what to measure to estimate overdispersion.

The following function should work for a variety of model types (at least glmmADMB, glmmTMB, lme4, …).

overdisp_fun <- function(model) {
rdf <- df.residual(model)
rp <- residuals(model,type="pearson")
Pearson.chisq <- sum(rp^2)
prat <- Pearson.chisq/rdf
pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}

Example:

library(lme4)
library(glmmTMB)
set.seed(101)
d <- data.frame(x=runif(1000),
f=factor(sample(1:10,size=1000,replace=TRUE)))

## REML for GLMMs

• While restricted maximum likelihood (REML) procedures (Wikipedia are well established for linear mixed models, it is less clear how one should define and compute the equivalent criteria (integrating out the effects of fixed parameters) for GLMMs. Millar (2011) and Berger, Liseo, and Wolpert (1999) are possible starting points in the peer-reviewed literature, and there are mailing-list discussions of these issues here and here.
• Attempting to use REML=TRUE with glmer will produce the warning extra argument(s) ‘REML’ disregarded
• glmmTMB allows REML=TRUE for GLMMs (it uses the Laplace approximation to integrate over the fixed effect parameters), since version 0.2.2

# Inference and confidence intervals

## Testing hypotheses

### What are the p-values listed by summary(glmerfit) etc.? Are they reliable?

By default, in keeping with the tradition in analysis of generalized linear models, lme4 and similar packages display the Wald Z-statistics for each parameter in the model summary. These have one big advantage: they’re convenient to compute. However, they are asymptotic approximations, assuming both that (1) the sampling distributions of the parameters are multivariate normal (or equivalently that the log-likelihood surface is quadratic) and that (2) the sampling distribution of the log-likelihood is (proportional to) $$\chi^2$$. The second approximation is discussed further under “Degrees of freedom”. The first assumption usually requires an even greater leap of faith, and is known to cause problems in some contexts (for binomial models failures of this assumption are called the Hauck-Donner effect), especially with extreme-valued parameters.

### Methods for testing single parameters

From worst to best:

• Wald $$Z$$-tests
• For balanced, nested LMMs where degrees of freedom can be computed according to classical rules: Wald $$t$$-tests
• Likelihood ratio test, either by setting up the model so that the parameter can be isolated/dropped (via anova or drop1, or via computing likelihood profiles
• Markov chain Monte Carlo (MCMC) or parametric bootstrap confidence intervals

### Tests of effects (i.e. testing that several parameters are simultaneously zero)

From worst to best:

• Wald chi-square tests (e.g. car::Anova)
• Likelihood ratio test (via anova or drop1)
• For balanced, nested LMMs where df can be computed: conditional F-tests
• For LMMs: conditional F-tests with df correction (e.g. Kenward-Roger in pbkrtest package: see notes on K-R etc below.
• MCMC or parametric, or nonparametric, bootstrap comparisons (nonparametric bootstrapping must be implemented carefully to account for grouping factors: see e.g. the REB or ‘case bootstrap’ methods in the lmeresampler package)

### Is the likelihood ratio test reliable for mixed models?

• It depends.
• Not for fixed effects in finite-size cases (see Pinheiro and Bates (2000)): may depend on ‘denominator degrees of freedom’ (number of groups) and/or total number of samples - total number of parameters
• Conditional F-tests are preferred for LMMs, if denominator degrees of freedom are known

### Why doesn’t lme4 display denominator degrees of freedom/p values? What other options do I have?

There is an R FAQ entry on this topic, which links to a mailing list post by Doug Bates (there is also a voluminous mailing list thread reproduced on the R wiki). The bottom line is

• For special cases that correspond to classical experimental designs (i.e. balanced designs that are nested, split-plot, randomized block, etc.) … we can show that the null distributions of particular ratios of sums of squares follow an $$F$$ distribution with known numerator and denominator degrees of freedom (and hence the sampling distributions of particular contrasts are t-distributed with known df). In more complicated situations (unbalanced, GLMMs, crossed random effects, models with temporal or spatial correlation, etc.) it is not in general clear that the null distribution of the computed ratio of sums of squares is really an F distribution, for any choice of denominator degrees of freedom.
• For each simple degrees-of-freedom recipe that has been suggested (trace of the hat matrix, etc.) there seems to be at least one fairly simple counterexample where the recipe fails badly (e.g. see this r-help thread from September 2006).
• When the responses are normally distributed and the design is balanced, nested etc. (i.e. the classical LMM situation), the scaled deviances and differences in deviances are exactly $$F$$-distributed and looking at the experimental design (i.e., which treatments vary/are replicated at which levels) tells us what the relevant degrees of freedom are (see “df alternatives” below)
• Two approaches to approximating df (Satterthwaite and Kenward-Roger) have been implemented in R, Satterthwaite in lmerTest and Kenward-Roger in pbkrtest (as KRmodcomp) (various packages such as lmerTest, emmeans, car, etc., import pbkrtest::get_Lb_ddf).
• K-R is probably the most reliable option (Schaalje, McBride, and Fellingham 2002), although it may be prohibitively computationally expensive for large data sets.

• K-R was derived for LMMs (and for REML?) in particular, it isn’t clear how it would apply to GLMMs. Stroup (2014) states (referencing Stroup (2013)) that K-R actually works reasonably well for GLMMs (K-R is not implemented in R for GLMMs; Stroup suggests that a pseudo-likelihood (Wolfinger and O’Connell 1993) approach is necessary in order to implement K-R for GLMMs):

Notice the non-integer values of the denominator df. They, and the $$F$$ and $$p$$ values, reflect the procedure developed by Kenward and Roger (2009) to account for the effect of the covariance structure on degrees of freedom and standard errors. Although the Kenward–Roger adjustment was derived for the LMM with normally distributed data and is an ad hoc procedure for GLMMs with non-normal data, informal simulation studies consistently have suggested that the adjustment is accurate. The Kenward-Roger adjustment requires that the SAS GLIMMIX default computing algorithm, pseudo-likelihood, be used rather than the Laplace algorithm used to obtain AICC statistics. Stroup (2013b) found that for binomial and Poisson GLMMs, pseudo-likelihood with the Kenward–Roger adjustment yields better Type I error control than Laplace while preserving the GLMM’s advantage with respect to power and accuracy in estimating treatment means.

• There are several different issues at play in finite-size (small-sample) adjustments, which apply slightly differently to LMMs and GLMMs.
• When the data don’t fit into the classical framework (crossed, unbalanced, R-side effects), we might still guess that the deviances etc. are approximately F-distributed but that we don’t know the real degrees of freedom – this is what the Satterthwaite, Kenward-Roger, Fai-Cornelius, etc. approximations are supposed to do.
• When the responses are not normally distributed (as in GLMs and GLMMs), and when the scale parameter is not estimated (as in standard Poisson- and binomial-response models), then the deviance differences are only asymptotically F- or chi-square-distributed (i.e. not for our real, finite-size samples). In standard GLM practice, we usually ignore this problem; there is some literature on finite-size corrections for GLMs under the rubrics of “Bartlett corrections” and “higher order asymptotics” (see McCullagh and Nelder (1989), Cordeiro, Paula, and Botter (1994), Cordeiro and Ferrari (1998) and the cond package (on CRAN) [which works with GLMs, not GLMMs]), but it’s rarely used. (The bias correction/Firth approach implemented in the brglm package attempts to address the problem of finite-size bias, not finite-size non-chi-squaredness of the deviance differences.)
• When the scale parameter in a GLM is estimated rather than fixed (as in Gamma or quasi-likelihood models), it is sometimes recommended to use an $$F$$ test to account for the uncertainty of the scale parameter (e.g. Venables and Ripley (2002) recommend anova(...,test="F") for quasi-likelihood models)
• Combining these issues, one has to look pretty hard for information on small-sample or finite-size corrections for GLMMs: Feng, Braun, and McCulloch (2004) and Bell and Grunwald (2010) look like good starting points, but it’s not at all trivial.

#### Df alternatives:

• use MASS::glmmPQL (uses old nlme rules approximately equivalent to SAS ‘inner-outer’/‘within-between’ rules) for GLMMs, or (n)lme for LMMs
• Guess the denominator df from standard rules (for standard designs, e.g. see Gotelli and Ellison (2004)) and apply them to $$t$$ or $$F$$ tests
• Run the model in lme (if possible) and use the denominator df reported there (which follow a simple ‘inner-outer’ rule which should correspond to the canonical answer for simple/orthogonal designs), applied to $$t$$ or $$F$$ tests. For the explicit specification of the rules that lme uses, see page 91 of Pinheiro and Bates (this page was previously available on Google Books, but the link is no longer useful, so here are the relevant paragraphs):

These conditional tests for fixed-effects terms require denominator degrees of freedom. In the case of the conditional $$F$$-tests, the numerator degrees of freedom are also required, being determined by the term itself. The denominator degrees of freedom are determined by the grouping level at which the term is estimated. A term is called inner relative to a factor if its value can change within a given level of the grouping factor. A term is outer to a grouping factor if its value does not changes within levels of the grouping factor. A term is said to be estimated at level $$i$$, if it is inner to the $$i-1$$st grouping factor and outer to the $$i$$th grouping factor. For example, the term Machine in the fm2Machine model is outer to Machine %in% Worker and inner to Worker, so it is estimated at level 2 (Machine %in% Worker). If a term is inner to all $$Q$$ grouping factors in a model, it is estimated at the level of the within-group errors, which we denote as the $$Q+1$$st level.

The intercept, which is the parameter corresponding to the column of all 1’s in the model matrices $$X_i$$, is treated differently from all the other parameters, when it is present. As a parameter it is regarded as being estimated at level 0 because it is outer to all the grouping factors. However, its denominator degrees of freedom are calculated as if it were estimated at level $$Q+1$$. This is because the intercept is the one parameter that pools information from all the observations at a level even when the corresponding column in $$X_i$$ doesn’t change with the level.

Letting $$m_i$$ denote the total number of groups in level $$i$$ (with the convention that $$m_0=1$$ when the fixed effects model includes an intercept and 0 otherwise, and $$m_{Q+1}=N$$) and $$p_i$$ denote the sum of the degrees of freedom corresponding to the terms estimated at level $$i$$, the $$i$$th level denominator degrees of freedom is defined as

$\mathrm{denDF}_i = m_i - (m_{i-1} + p_i), i = 1, \dots, Q$

This definition coincides with the classical decomposition of degrees of freedom in balanced, multilevel ANOVA designs and gives a reasonable approximation for more general mixed-effects models.

Note that the implementation used in lme gets the wrong answer for random-slopes models:

library(nlme)
lmeDF <- function(formula=distance~age,random=~1|Subject) {
mod <- lme(formula,random,data=Orthodont)
aa <- anova(mod)
return(setNames(aa[,"denDF"],rownames(aa)))
}
lmeDF()
## (Intercept)         age
##          80          80
lmeDF(random=~age|Subject) ## wrong!
## (Intercept)         age
##          80          80

I (BB) have re-implemented this algorithm in a way that does slightly better for random-slopes models (but may still get confused!), see here.

source("R/calcDenDF.R")
calcDenDF(~age,"Subject",nlme::Orthodont)
## (Intercept)         age
##          80          80
calcDenDF(~age,data=nlme::Orthodont,random=~1|Subject)
## (Intercept)         age
##          80          80
calcDenDF(~age,data=nlme::Orthodont,random=~age|Subject) ## off by 1
## (Intercept)         age
##          81          25
• use SAS, Genstat (AS-REML), Stata?
• Assume infinite denominator df (i.e. $$Z$$/$$\chi^2$$ test rather than $$t$$/$$F$$) if number of groups is large (>45? Various rules of thumb for how large is “approximately infinite” have been posed, including (in Angrist and Pischke 2009), 42 (in homage to Douglas Adams)

### Testing significance of random effects

• the most common way to do this is to use a likelihood ratio test, i.e. fit the full and reduced models (the reduced model is the model with the focal variance(s) set to zero). For example:
library(lme4)
m2 <- lmer(Reaction~Days+(1|Subject)+(0+Days|Subject),sleepstudy,REML=FALSE)
m1 <- update(m2,.~Days+(1|Subject))
m0 <- lm(Reaction~Days,sleepstudy)
anova(m2,m1,m0) ## two sequential tests
## Data: sleepstudy
## Models:
## m0: Reaction ~ Days
## m1: Reaction ~ Days + (1 | Subject)
## m2: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
##    npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)
## m0    3 1906.3 1915.9 -950.15   1900.3
## m1    4 1802.1 1814.8 -897.04   1794.1 106.214  1  < 2.2e-16 ***
## m2    5 1762.0 1778.0 -876.00   1752.0  42.075  1  8.782e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With recent versions of lme4, goodness-of-fit (deviance) can be compared between (g)lmer and (g)lm models, although anova() must be called with the mixed ((g)lmer) model listed first. Keep in mind that LRT-based null hypothesis tests are conservative when the null value (such as $$\sigma^2=0$$) is on the boundary of the feasible space (Self and Liang 1987; Stram and Lee 1994; Goldman and Whelan 2000); in the simplest case (single random effect variance), the p-value is approximately twice as large as it should be (Pinheiro and Bates 2000).

• Consider not testing the significance of random effects. If the random effect is part of the experimental design, this procedure may be considered ‘sacrificial pseudoreplication’ (Hurlbert 1984). Using stepwise approaches to eliminate non-significant terms in order to squeeze more significance out of the remaining terms is dangerous in any case.
• consider using the RLRsim package, which has a fast implementation of simulation-based tests of null hypotheses about zero variances, for simple tests. (However, it only applies to lmer models, and is a bit tricky to use for more complex models.)
library(RLRsim)
## compare m0 and m1
exactLRT(m1,m0)
##
##  simulated finite sample distribution of LRT. (p-value based on 10000
##  simulated values)
##
## data:
## LRT = 106.21, p-value < 2.2e-16
## compare m1 and m2
mA <- update(m2,REML=TRUE)
m0B <- update(mA, . ~ . - (0 + Days|Subject))
m.slope  <- update(mA, . ~ . - (1|Subject))
exactRLRT(m0=m0B,m=m.slope,mA=mA)
##
##  simulated finite sample distribution of RLRT.
##
##  (p-value based on 10000 simulated values)
##
## data:
## RLRT = 42.796, p-value < 2.2e-16
• Parametric bootstrap: fit the reduced model, then repeatedly simulate from it and compute the differences between the deviance of the reduced and the full model for each simulated data set. Compare this null distribution to the observed deviance difference. This procedure is implemented in the pbkrtest package (messages and warnings suppressed).
(pb <- pbkrtest::PBmodcomp(m2,m1,seed=101))
## Bootstrap test; time: 7.43 sec; samples: 996; extremes: 0;
## Requested samples: 996 Used samples: 514 Extremes: 0
## large : Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
## Reaction ~ Days + (1 | Subject)
##          stat df   p.value
## LRT    42.075  1 8.782e-11 ***
## PBtest 42.075     0.001942 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

### Standard errors of variance estimates

• Paraphrasing Doug Bates: the sampling distribution of variance estimates is in general strongly asymmetric: the standard error may be a poor characterization of the uncertainty.
• lme4 allows for computing likelihood profiles of variances and computing confidence intervals on their basis; these likelihood profile confidence intervals are subject to the usual caveats about the LRT with finite sample sizes.
• Using an MCMC-based approach (the simplest/most canned is probably to use the MCMCglmm package, although its mode specifications are not identical to those of lme4) will provide posterior distributions of the variance parameters: quantiles or credible intervals (HPDinterval() in the coda package) will characterize the uncertainty.
• (don’t say we didn’t warn you …) [n]lme fits contain an element called apVar which contains the approximate variance-covariance matrix (derived from the Hessian, the matrix of (numerically approximated) second derivatives of the likelihood (REML?) at the maximum (restricted?) likelihood values): you can derive the standard errors from this list element via sqrt(diag(lme.obj$apVar)). For whatever it’s worth, though, these estimates might not match the estimates that SAS gives which are supposedly derived in the same way. • it’s not a full solution, but there is some more information here. I have some delta-method computations there that are off by a factor of 2 for the residual standard deviation, as well as some computations based on reparameterizing the deviance function. ### P-values: MCMC and parametric bootstrap Abandoning the approximate $$F$$/$$t$$-statistic route, one ends up with the more general problem of estimating $$p$$-values. There is a wider range of options here, although many of them are computationally intensive … #### Markov chain Monte Carlo sampling: • pseudo-Bayesian: post-hoc sampling, typically (1) assuming flat priors and (2) starting from the MLE, possibly using the approximate variance-covariance estimate to choose a candidate distribution • via mcmcsamp (if available for your problem: i.e. LMMs with simple random effects – not GLMMs or complex random effects) • via pvals.fnc in the languageR package, a wrapper for mcmcsamp) • in AD Model Builder, possibly via the glmmADMB package (use the mcmc=TRUE option) or the R2admb package (write your own model definition in AD Model Builder), or outside of R • via the sim function from the arm package (simulates the posterior only for the beta (fixed-effect) coefficients; not yet working with development lme4; would like a better formal description of the algorithm …?) • fully Bayesian approaches • via the MCMCglmm package • glmmBUGS (a WinBUGS wrapper/R interface) • JAGS/WinBUGS/OpenBUGS etc., via the rjags/r2jags/R2WinBUGS/BRugs packages #### Status of mcmcsamp mcmcsamp is a function for lme4 that is supposed to sample from the posterior distribution of the parameters, based on flat/improper priors for the parameters [ed: I believe, but am not sure, that these priors are flat on the scale of the theta (Cholesky-factor) parameters]. At present, in the CRAN version (lme4 0.999999-0) and the R-forge “stable” version (lme4.0 0.999999-1), this covers only linear mixed models with uncorrelated random effects. As has been discussed in a variety of places (e.g. on r-sig-mixed models, and on the r-forge bug tracker, it is challenging to come up with a sampler that accounts properly for the possibility that the posterior distributions for some of the variance components may be mixtures of point masses at zero and continuous distributions. Naive samplers are likely to get stuck at or near zero. Doug Bates has always been a bit unsure that mcmcsamp is really performing as intended, even in the limited cases it now handles. Given this uncertainty about how even the basic version works, the lme4 developers have been reluctant to make the effort to extend it to GLMMs or more complex LMMs, or to implement it for the development version of lme4 … so unless something miraculous happens, it will not be implemented for the new version of lme4. As always, users are encouraged to write and share their own code that implements these capabilities … #### Parametric bootstrap The idea here is that in order to do inference on the effect of (a) predictor(s), you (1) fit the reduced model (without the predictors) to the data; (2) many times, (2a) simulate data from the reduced model; (2b) fit both the reduced and the full model to the simulated (null) data; (2c) compute some statistic(s) [e.g. t-statistic of the focal parameter, or the log-likelihood or deviance difference between the models]; (3) compare the observed values of the statistic from fitting your full model to the data to the null distribution generated in step 2. - PBmodcomp in the pbkrtest package - see the example in help("simulate-mer") in the lme4 package to roll your own, using a combination of simulate() and refit(). - bootMer in lme4 version >1.0.0 - a presentation at UseR! 2009 (abstract, slides) went into detail about a proposed bootMer package and suggested it could work for GLMMs too – but it does not seem to be active. ## Predictions and/or confidence (or prediction) intervals on predictions Note that none of the following approaches takes the uncertainty of the random effects parameters into account … if you want to take RE parameter uncertainty into account, a Bayesian approach is probably the easiest way to do it. The general recipe for computing predictions from a linear or generalized linear model is to • figure out the model matrix $$X$$ corresponding to the new data; • matrix-multiply $$X$$ by the parameter vector $$\beta$$ to get the predictions (or linear predictor in the case of GLM(M)s); • extract the variance-covariance matrix of the parameters $$V$$ • compute $$X V X^{\prime}$$ to get the variance-covariance matrix of the predictions; • extract the diagonal of this matrix to get variances of predictions; • if computing prediction rather than confidence intervals, add the residual variance; • take the square-root of the variances to get the standard deviations (errors) of the predictions; • compute confidence intervals based on a Normal approximation; • for GL(M)Ms, run the confidence interval boundaries (not the standard errors) through the inverse-link function. ### lme library(nlme) fm1 <- lme(distance ~ age*Sex, random = ~ 1 + age | Subject, data = Orthodont) plot(Orthodont,asp="fill") ## plot responses by individual ## note that expand.grid() orders factor levels by *order of ## appearance* -- must match levels(Orthodont$Sex)
newdat <- expand.grid(age=c(8,10,12,14), Sex=c("Female","Male"))
newdat$pred <- predict(fm1, newdat, level = 0) ## [-2] drops response from formula Designmat <- model.matrix(formula(fm1)[-2], newdat) predvar <- diag(Designmat %*% vcov(fm1) %*% t(Designmat)) newdat$SE <- sqrt(predvar)
newdat$SE2 <- sqrt(predvar+fm1$sigma^2)

library(ggplot2)
pd <- position_dodge(width=0.4)
g0 <- ggplot(newdat,aes(x=age,y=pred,colour=Sex))+
geom_point(position=pd)
cmult <- 2  ## could use 1.96 instead
g0 + geom_linerange(aes(ymin=pred-cmult*SE,ymax=pred+cmult*SE), position=pd)`