Model simplification examples

keep maximal model?
remove terms until non-singular?
find minimum-AIC model?
find minimum non-singular AIC model?
backwards stepwise regression

Barr et al. (2013); Bates et al. (2015); Matuschek et al. (2017)

CBPP data

Starting again with the “basic” CBPP model:

data("cbpp",package="lme4")
g1 <- glmer(incidence/size~period+(1|herd),
      data=cbpp,
      weights=size,
      family=binomial)

What happens if we try to fit the maximal model?

g1max <- update(g1,.~.-(1|herd) + (period|herd))

## Error: number of observations (=56) < number of random effects (=60) for term (period | herd); the random-effects parameters are probably unidentifiable

Use glmerControl() to override the warning …

g1max <- glmer(incidence/size~period+(period|herd),
      data=cbpp,
      weights=size,
      family=binomial,
      control=glmerControl(check.nobs.vs.nRE="warning"))

## Warning: number of observations (=56) < number of random effects (=60) for term
## (period | herd); the random-effects parameters are probably unidentifiable

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0226853 (tol = 0.002, component 1)

This doesn’t report singularity, but it probably should:

isSingular(g1max)

## [1] FALSE

isSingular(g1max, tol = 1e-3) ## usual tolerance is 1e-4

## [1] TRUE

min(eigen(VarCorr(g1max)[[1]])$values)

## [1] 4.028969e-07

Tried restarting with

update(g1max, start = list(theta = getME(g1max, "theta"), fixef = getME(g1max, "beta")))

but no luck:

Look at ?convergence …

aa <- allFit(g1max)

## bobyqa : [OK]
## Nelder_Mead : [OK]
## nlminbwrap : [OK]
## nmkbw : [OK]
## optimx.L-BFGS-B : [OK]
## nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
## nloptwrap.NLOPT_LN_BOBYQA : [OK]

(warnings/messages suppressed)

source("R/allFit_utils.R")
glance_allfit_NLL(aa)

## # A tibble: 7 × 2
##   optimizer                      NLL_rel
##   <chr>                            <dbl>
## 1 bobyqa                        0       
## 2 nlminbwrap                    3.16e-10
## 3 optimx.L-BFGS-B               2.97e- 8
## 4 nloptwrap.NLOPT_LN_NELDERMEAD 1.17e- 7
## 5 nmkbw                         1.11e- 5
## 6 nloptwrap.NLOPT_LN_BOBYQA     2.39e- 1
## 7 Nelder_Mead                   5.96e- 1

plot_allfit(aa)

plot_allfit(aa, keep_effects = "ran_pars")

simplification

(period|herd) vs. (1|period/herd) (positive compound symmetry)

\[ (\textrm{intercept}, \textrm{slope}) = \textrm{MVN}\left(\boldsymbol 0, \left[ \begin{array}{cccc} \sigma^2_{\{h|1\}} & . & . & . \\ \sigma_{\{h|1\},\{h|p_{21}\}} & \sigma^2_{\{h|p_{21}\}} & . & . \\ \sigma_{\{h|1\}, \{h|p_{31}\}} & \sigma_{\{h|p_{21}\},\{h|p_{31}\}} & \sigma^2_{\{h|p_{31}\}} & . \\ \sigma_{\{h|1\} ,\{h|p_{41}\}} & \sigma_{\{h|p_{21}\},\{h|p_{41}\}} & \sigma_{\{h|p_{31}\},\{h|p_{41}\}} & \sigma^2_{\{h|p_{41}\}} \end{array} \right] \right) \] (=\((n(n+1))/2 = (4\times 5)/2 = 10\) parameters) vs. \[ \left[ \begin{array}{cccc} \sigma^2 & . & . & . \\ \rho \sigma^2 & \sigma^2 & . & . \\ \rho \sigma^2 & \rho \sigma^2 & \sigma^2 & . \\ \rho \sigma^2 & \rho \sigma^2 & \rho \sigma^2 & \sigma^2 \\ \end{array} \right] \] where \(\sigma^2 = \sigma^2_{\{b|1\}}+\sigma^2_{\{herd:period|1\}}\), \(\rho = \sigma^2_{\{b|1\}}/\sigma^2\) (=2 parameters; \(\rho\) must be >0)

g1cs <- update(g1max,
               . ~ . - (period|herd) + (1|herd/period))

g1int <- update(g1max, . ~ . - (period|herd) + (1|herd))
bbmle::AICtab(g1cs, g1int, g1max)

##       dAIC df
## g1cs   0.0 6 
## g1max  4.9 14
## g1int  7.4 5

The latter model is called a compound symmetry model, i.e. the variances are the same and the covariances/correlations between all pairs are the same. This is a slightly restricted version of compound symmetry, because (the way we have set it up) only non-negative correlations are possible. In general, this is a good way to simplify variation of factor effects across groups when there are many levels of the factor, and when it is plausible to treat the factor levels as exchangeable. The simplified (CS) model works fine in this example - but is equivalent (in this case, where there is only one observation per herd per period) to observation-level random effects …

gopher tortoise data

load("data/gopherdat2.RData")
## for observation-level random effects
Gdat$obs <- factor(seq(nrow(Gdat)))

Our desired maximal model would be something like

g2 <- glmer(shells~prev+offset(log(Area))+(1|year)+(1|Site)+(1|obs),
      family=poisson,data=Gdat)

## boundary (singular) fit: see help('isSingular')

g3 <- glmmTMB(shells~prev+offset(log(Area))+(1|year)+(1|Site),
      family=nbinom2,data=Gdat)

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

Singularity/non-positive definite Hessian problems …

ended up with this …

gmod_lme4_L <- glm(shells~prev+offset(log(Area))+factor(year)+factor(Site),
                   family=poisson, data=Gdat)

What does buildmer do in this case?

gmod_bm <- buildmer(shells~prev+offset(log(Area))+(1|year)+(1|Site)+(1|obs),
                    family=poisson,data=Gdat)

## Determining predictor order

## Fitting via glm: shells ~ 1

## Currently evaluating LRT for: offset(log(Area)), prev

## Fitting via glm: shells ~ 1 + offset(log(Area))

## Fitting via glm: shells ~ 1 + prev

## Updating formula: shells ~ 1 + prev

## Currently evaluating LRT for: offset(log(Area))

## Fitting via glm: shells ~ 1 + prev + offset(log(Area))

## Updating formula: shells ~ 1 + prev + offset(log(Area))

## Fitting via glm: shells ~ 1 + prev + offset(log(Area))

## Currently evaluating LRT for: 1 | year, 1 | Site, 1 | obs

## Fitting via glmer, with ML: shells ~ offset(log(Area)) + 1 + prev + (1
##     | year)

## Fitting via glmer, with ML: shells ~ offset(log(Area)) + 1 + prev + (1
##     | Site)

## boundary (singular) fit: see help('isSingular')

## Fitting via glmer, with ML: shells ~ offset(log(Area)) + 1 + prev + (1
##     | obs)

## boundary (singular) fit: see help('isSingular')

## Updating formula: shells ~ offset(log(Area)) + 1 + prev + (1 | year)

## Currently evaluating LRT for: 1 | Site, 1 | obs

## Fitting via glmer, with ML: shells ~ offset(log(Area)) + 1 + prev + (1
##     | year) + (1 | Site)

## boundary (singular) fit: see help('isSingular')

## Fitting via glmer, with ML: shells ~ offset(log(Area)) + 1 + prev + (1
##     | year) + (1 | obs)

## boundary (singular) fit: see help('isSingular')

## Ending the ordering procedure due to having reached the maximal
##     feasible model - all higher models failed to converge. The types of
##     convergence failure are: Singular fit

## Fitting ML and REML reference models

## Fitting via glmer, with ML: shells ~ offset(log(Area)) + 1 + prev + (1
##     | year)
## Fitting via glmer, with ML: shells ~ offset(log(Area)) + 1 + prev + (1
##     | year)

## Testing terms

## Fitting via glmer, with ML: shells ~ offset(log(Area)) + 1 + (1 | year)

## Fitting via glmer, with ML: shells ~ 1 + prev + (1 | year)

## Fitting via glm: shells ~ 1 + prev + offset(log(Area))

##   grouping              term                   block       score Iteration
## 2     <NA>                 1                 NA NA 1          NA         1
## 3     <NA>              prev              NA NA prev -15.9362932         1
## 1     <NA> offset(log(Area)) NA NA offset(log(Area))   0.0000000         1
## 4     year                 1               NA year 1  -0.9232343         1
##            LRT
## 2           NA
## 3 1.193447e-06
## 1 1.000000e+00
## 4 3.972322e-01

## Updating formula: shells ~ 1 + prev + (1 | year)

## Fitting ML and REML reference models

## Fitting via glmer, with ML: shells ~ 1 + prev + (1 | year)
## Fitting via glmer, with ML: shells ~ 1 + prev + (1 | year)

## Testing terms

## Fitting via glmer, with ML: shells ~ 1 + (1 | year)

## Fitting via glm: shells ~ 1 + prev

##   grouping term      block       score Iteration          LRT
## 2     <NA>    1    NA NA 1          NA         2           NA
## 3     <NA> prev NA NA prev -15.9362932         2 1.051884e-07
## 4     year    1  NA year 1  -0.9232343         2 2.967785e-01

## Updating formula: shells ~ 1 + prev

## Fitting ML reference model

## Fitting via glm: shells ~ 1 + prev

## Testing terms

## Fitting via glm: shells ~ 1

##   grouping term      block     score Iteration          LRT
## 2     <NA>    1    NA NA 1        NA         3           NA
## 3     <NA> prev NA NA prev -15.93629         3 1.199377e-07

## All terms are significant

summary(gmod_bm)

## 
## Call:
## stats::glm(formula = shells ~ 1 + prev, family = poisson, data = Gdat)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.28656    0.25280  -1.134    0.257    
## prev         0.02485    0.00467   5.322 1.03e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 74.545  on 29  degrees of freedom
## Residual deviance: 46.523  on 28  degrees of freedom
## AIC: 104.38
## 
## Number of Fisher Scoring iterations: 5

(see ecostats example)

Arabidopsis data

Try full model (except use only a simple fixed effect for the top-level region, reg, which has only 3 levels):

load("data/Banta.RData")
t0 <- system.time(
    mp1 <- glmer(total.fruits ~ nutrient*amd +
                 reg + rack + status +
                 (amd*nutrient|popu)+
                 (amd*nutrient|gen),
             data=dat.tf,
             family="poisson")
)

## boundary (singular) fit: see help('isSingular')

Inspect for singularity/overdispersion:

eigen(VarCorr(mp1)$gen)$values  ## OK

## [1] 2.16934273 0.45193104 0.12332353 0.02324225

eigen(VarCorr(mp1)$popu)$values ## problematic

## [1] 1.663326e+00 1.072214e-01 8.196406e-07 1.053675e-09

deviance(mp1)/df.residual(mp1) ## very serious overdispersion

## [1] 23.33933

aods3::gof(mp1)  ## packaged version

## D  = 13910.24, df = 596, P(>D) = 0
## X2 = 15079.52, df = 596, P(>X2) = 0

Add observation-level random effect (switch optimizer to BOBYQA)

dat.tf$obs <- 1:nrow(dat.tf)
t1 <- system.time(
    mp1X <- update(mp1,
                   . ~. + (1|obs),
                   control=glmerControl(optimizer="bobyqa"))
)

## boundary (singular) fit: see help('isSingular')

This model has 28 parameters and takes 14 seconds … Is glmmTMB faster?

t2 <- system.time(
    mp1g <- glmmTMB(total.fruits ~ nutrient*amd +
                        reg + rack + status +
                        (amd*nutrient|popu)+
                        (amd*nutrient|gen)+
                        (1|obs),
                    data=dat.tf,
                    family="poisson")
)

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')

## Warning in finalizeTMB(TMBStruc, obj, fit, h, data.tmb.old): Model convergence
## problem; false convergence (8). See vignette('troubleshooting'),
## help('diagnose')

Better (14 seconds), although glmmTMB doesn’t handle singular fits as gracefully …

Replace full model with interactions (*) with additive model (+)

mp1X2 <- update(mp1X,
                . ~ . - (amd*nutrient|popu)
                - (amd*nutrient|gen)
                + (amd+nutrient|popu)
                + (amd+nutrient|gen))

## boundary (singular) fit: see help('isSingular')

Ugh, looks even worse …

VarCorr(mp1X2)

##  Groups Name        Std.Dev. Corr         
##  obs    (Intercept) 1.419975              
##  gen    (Intercept) 0.482349              
##         amdclipped  0.036182 -1.000       
##         nutrient8   0.312491 -1.000  1.000
##  popu   (Intercept) 0.434530              
##         amdclipped  0.075766 1.000        
##         nutrient8   0.138279 1.000  1.000

Use lmer_alt to split factor variable into separate terms …

mp1X3 <- afex::lmer_alt(total.fruits ~ nutrient*amd +
                            reg + rack + status +
                            (amd*nutrient||popu)+
                            (amd*nutrient||gen)+
                            (1|obs),
                        data=dat.tf,
                        family="poisson")

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.139429 (tol = 0.002, component 1)

Strip down further:

mp1X4 <- update(mp1X2,
                . ~ . - (amd+nutrient|popu)
                - (amd+nutrient|gen)
                + (nutrient|popu)
                + (1|gen))

## boundary (singular) fit: see help('isSingular')

Still correlation=+1, reduce further:

mp1X5 <- afex::lmer_alt(total.fruits ~ nutrient*amd +
                            reg + rack + status +
                            (1|gen)+
                            (1+nutrient||popu)+
                            (1|obs),
                        data=dat.tf,
                        family="poisson")

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.0158658 (tol = 0.002, component 1)

What does buildmer select?

L <- load("data/arabidopsis_batch.rda")
buildmer_fit

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: total.fruits ~ 1 + nutrient + rack + status + amd + nutrient:amd +  
##     (1 | reg_popu) + (1 | reg_popu_gen)
##    Data: dat.tf
##       AIC       BIC    logLik  deviance  df.resid 
## 18271.725 18311.665 -9126.862 18253.725       616 
## Random effects:
##  Groups       Name        Std.Dev.
##  reg_popu_gen (Intercept) 0.2479  
##  reg_popu     (Intercept) 0.4977  
## Number of obs: 625, groups:  reg_popu_gen, 24; reg_popu, 9
## Fixed Effects:
##          (Intercept)             nutrient8                 rack2  
##               3.1305                1.0092               -0.7592  
##    statusPetri.Plate      statusTransplant            amdclipped  
##              -0.2400               -0.1904               -0.4555  
## nutrient8:amdclipped  
##               0.4316

Check brute-force AIC results …

bbmle::AICtab(mp_fits)

##                   dAIC df
## int_gen_int_popu   0.0 12
## nut_gen_int_popu   0.1 14
## int_gen_nut_popu   2.4 14
## none_gen_int_popu  2.4 11
## nut_gen_nut_popu   3.1 16
## int_gen_amd_popu   3.4 14
## nut_gen_amd_popu   3.5 16
## amd_gen_int_popu   3.9 14
## none_gen_nut_popu  4.5 13
## int_gen_none_popu  5.1 11
## none_gen_amd_popu  5.6 13
## amd_gen_nut_popu   6.3 16
## amd_gen_amd_popu   7.3 16
## nut_gen_none_popu  8.5 13
## amd_gen_none_popu  9.1 13
## obs               53.2 10

Should probably use Bayes (possibly with regularizing priors …)

References

Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.

Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv:1506.04967 [Stat], June. http://arxiv.org/abs/1506.04967.

Matuschek, Hannes, Reinhold Kliegl, Shravan Vasishth, Harald Baayen, and Douglas Bates. 2017. “Balancing Type I Error and Power in Linear Mixed Models.” Journal of Memory and Language 94: 305–15. https://doi.org/10.1016/j.jml.2017.01.001.

Mixed model lab #2 (GLMMs)

Ben Bolker

17:20 12 May 2024