Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.
library(lme4)
library(glmmTMB)
library(aods3)
library(glmmTMB)
library(broom.mixed)
library(dotwhisker)
Barr et al. (2013); Bates et al. (2015); Matuschek et al. (2017)
Starting again with the “basic” CBPP model:
data("cbpp",package="lme4")
g1 <- glmer(incidence/size~period+(1|herd),
data=cbpp,
weights=size,
family=binomial)
What happens if we try to fit the maximal model?
g1max <- update(g1,.~.-(1|herd) + (period|herd))
## Error: number of observations (=56) < number of random effects (=60) for term (period | herd); the random-effects parameters are probably unidentifiable
Use glmerControl()
to override the warning …
g1max <- glmer(incidence/size~period+(period|herd),
data=cbpp,
weights=size,
family=binomial,
control=glmerControl(check.nobs.vs.nRE="warning"))
## Warning: number of observations (=56) < number of random effects (=60) for term
## (period | herd); the random-effects parameters are probably unidentifiable
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.014498 (tol = 0.002, component 1)
VarCorr(g1max)
## Groups Name Std.Dev. Corr
## herd (Intercept) 0.96471
## period2 1.39959 -0.894
## period3 1.97383 -0.504 0.837
## period4 0.90544 -0.811 0.464 -0.096
eigen(VarCorr(g1max)[[1]])$values
## [1] 5.906126e+00 1.699171e+00 2.518033e-05 2.936276e-06
Look at ?convergence
…
af <- allFit(g1max)
## bobyqa : [OK]
## Nelder_Mead : [OK]
## nlminbwrap : [OK]
## optimx.L-BFGS-B : [OK]
## nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
## nloptwrap.NLOPT_LN_BOBYQA : [OK]
(warnings/messages suppressed)
We can use summary(af)
to compare all of these fits.
(period|herd)
vs. (1|period/herd)
([positive] compound symmetry)\[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{cccc} \sigma^2_{\{h|1\}} & . & . & . \\ \sigma_{\{h|1\},\{h|p_{21}\}} & \sigma^2_{\{h|p_{21}\}} & . & . \\ \sigma_{\{h|1\}, \{h|p_{31}\}} & \sigma_{\{h|p_{21}\},\{h|p_{31}\}} & \sigma^2_{\{h|p_{31}\}} & . \\ \sigma_{\{h|1\} ,\{h|p_{41}\}} & \sigma_{\{h|p_{21}\},\{h|p_{41}\}} & \sigma_{\{h|p_{31}\},\{h|p_{41}\}} & \sigma^2_{\{h|p_{41}\}} \end{array} \right] \right) \] (=\((n(n+1))/2 = (4\times 5)/2 = 10\) parameters) vs. \[ \left[ \begin{array}{cccc} \sigma^2 & . & . & . \\ \rho \sigma^2 & \sigma^2 & . & . \\ \rho \sigma^2 & \rho \sigma^2 & \sigma^2 & . \\ \rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \sigma^2 \\ \end{array} \right] \] where \(\sigma^2 = \sigma^2_{\{b|1\}}+\sigma^2_{\{herd:period|1\}}\), \(\rho = \sigma^2_{\{b|1\}}/\sigma^2\) (=2 parameters; \(\rho\) must be >0)
g1cs <- update(g1max,
. ~ . - (period|herd) + (1|herd/period))
The latter model is called a compound symmetry model, i.e. the variances are the same and the covariances/correlations between all pairs are the same. This is a slightly restricted version of compound symmetry, because (the way we have set it up) only non-negative correlations are possible. In general, this is a good way to simplify variation of factor effects across groups when there are many levels of the factor, and when it is plausible to treat the factor levels as exchangeable. The simplified (CS) model works fine in this example - but is equivalent (in this case, where there is only one observation per herd per period) to observation-level random effects!
load("../../data/gopherdat2.RData")
Gdat$obs <- factor(seq(nrow(Gdat)))
Our desired maximal model would be something like
glmer(shells~prev+offset(log(Area))+(1|year)+(1|Site)+(1|obs),
family=poisson,data=Gdat)
## boundary (singular) fit: see help('isSingular')
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: shells ~ prev + offset(log(Area)) + (1 | year) + (1 | Site) +
## (1 | obs)
## Data: Gdat
## AIC BIC logLik deviance df.resid
## 90.0359 97.0419 -40.0179 80.0359 25
## Random effects:
## Groups Name Std.Dev.
## obs (Intercept) 0.000e+00
## Site (Intercept) 1.089e-05
## year (Intercept) 1.163e-01
## Number of obs: 30, groups: obs, 30; Site, 10; year, 3
## Fixed Effects:
## (Intercept) prev
## -3.52835 0.02107
## optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings
or
glmmTMB(shells~prev+offset(log(Area))+(1|year)+(1|Site),
family=nbinom2,data=Gdat)
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; false convergence (8). See vignette('troubleshooting')
## Formula: shells ~ prev + offset(log(Area)) + (1 | year) + (1 | Site)
## Data: Gdat
## AIC BIC logLik df.resid
## 90.03588 97.04186 -40.01794 25
## Random-effects (co)variances:
##
## Conditional model:
## Groups Name Std.Dev.
## year (Intercept) 1.164e-01
## Site (Intercept) 9.103e-09
##
## Number of obs: 30 / Conditional model: year, 3; Site, 10
##
## Dispersion parameter for nbinom2 family (): 3.92e+07
##
## Fixed Effects:
##
## Conditional model:
## (Intercept) prev
## -3.52838 0.02107
Problems …
ended up with this …
gmod_lme4_L <- glmer(shells~prev+offset(log(Area))+factor(year)+(1|Site),
family=poisson,data=Gdat,
control=glmerControl(optimizer="bobyqa",
check.conv.grad=.makeCC("warning",0.05)))
## boundary (singular) fit: see help('isSingular')
(see ecostats example)
Try full model (except use only a simple fixed effect for the top-level region, reg
, which has only 3 levels):
load("../../data/Banta.RData")
t0 <- system.time(
mp1 <- glmer(total.fruits ~ nutrient*amd +
reg + rack + status +
(amd*nutrient|popu)+
(amd*nutrient|gen),
data=dat.tf,
family="poisson")
)
## boundary (singular) fit: see help('isSingular')
Inspect for singularity/overdispersion:
eigen(VarCorr(mp1)$gen)$values
## [1] 2.16111878 0.45199361 0.12339611 0.02321676
eigen(VarCorr(mp1)$popu)$values
## [1] 1.666746e+00 1.066821e-01 7.014628e-08 1.629630e-09
deviance(mp1)/df.residual(mp1) ## !!
## [1] 23.33935
aods3::gof(mp1)
## D = 13910.25, df = 596, P(>D) = 0
## X2 = 15079.55, df = 596, P(>X2) = 0
Add observation-level random effect (switch optimizer to BOBYQA)
dat.tf$obs <- 1:nrow(dat.tf)
t1 <- system.time(
mp1X <- update(mp1,
. ~. + (1|obs),
control=glmerControl(optimizer="bobyqa"))
)
## boundary (singular) fit: see help('isSingular')
This model has 28 parameters and takes 12 seconds … Is glmmTMB
faster?
t2 <- system.time(
mp1g <- glmmTMB(total.fruits ~ nutrient*amd +
reg + rack + status +
(amd*nutrient|popu)+
(amd*nutrient|gen)+
(1|obs),
data=dat.tf,
family="poisson")
)
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; false convergence (8). See vignette('troubleshooting')
Better (5), although glmmTMB
doesn’t handle singular fits quite as gracefully …
mp1X2 <- update(mp1X,
. ~ . - (amd*nutrient|popu)
- (amd*nutrient|gen)
+ (amd+nutrient|popu)
+ (amd+nutrient|gen))
## boundary (singular) fit: see help('isSingular')
Ugh, looks even worse …
VarCorr(mp1X2)
## Groups Name Std.Dev. Corr
## obs (Intercept) 1.419975
## gen (Intercept) 0.482349
## amdclipped 0.036182 -1.000
## nutrient8 0.312491 -1.000 1.000
## popu (Intercept) 0.434531
## amdclipped 0.075765 1.000
## nutrient8 0.138279 1.000 1.000
Use lmer_alt
to split factor variable into separate terms …
mp1X3 <- afex::lmer_alt(total.fruits ~ nutrient*amd +
reg + rack + status +
(amd*nutrient||popu)+
(amd*nutrient||gen)+
(1|obs),
data=dat.tf,
family="poisson")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.129393 (tol = 0.002, component 1)
Strip down further:
mp1X4 <- update(mp1X2,
. ~ . - (amd+nutrient|popu)
- (amd+nutrient|gen)
+ (nutrient|popu)
+ (1|gen))
## boundary (singular) fit: see help('isSingular')
Still correlation=+1, reduce further:
mp1X5 <- afex::lmer_alt(total.fruits ~ nutrient*amd +
reg + rack + status +
(1|gen)+
(1+nutrient||popu)+
(1|obs),
data=dat.tf,
family="poisson")
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0262992 (tol = 0.002, component 1)
dwplot(list(mp1,mp1X3,mp1X4,mp1X5),effect="fixed")+
geom_vline(xintercept=0,lty=2)
Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.
Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv:1506.04967 [Stat], June. http://arxiv.org/abs/1506.04967.
Matuschek, Hannes, Reinhold Kliegl, Shravan Vasishth, Harald Baayen, and Douglas Bates. 2017. “Balancing Type I Error and Power in Linear Mixed Models.” Journal of Memory and Language 94: 305–15. https://doi.org/10.1016/j.jml.2017.01.001.