overview

obligatory Tolstoy quote

  • “all happy families are alike; each unhappy family is unhappy in its own way”
  • … but there are certainly recurring categories of problems

categories of warnings

  • messages vs warnings vs errors
  • convergence (numerical optimization questionable)
  • singular fits (model is degenerate)
    • other forms of degeneracy (complete separation etc.)
  • model diagnostics issues (model is inappropriate)

messages vs warnings vs errors

  • message = informational
  • warning = potentially serious problem. Don’t ignore it unless you understand it; if you do understand it, eliminate the warning as far upstream as possible (e.g. by changing options/settings; suppressWarnings() only if there is no other choice)
  • error = so serious that R/the package author can’t or won’t give you an answer (it may be possible to override or work around it …)
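A minimal sketch of the three levels, including silencing a warning as a last resort:

log(-1)                    ## warning: "NaNs produced" (potentially serious)
suppressWarnings(log(-1))  ## last resort: still NaN, warning silenced
message("just informational")
try(stop("no answer returned"))  ## error (wrapped in try() so the session continues)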

load packages

library(tidyverse)    ## data manipulation and plotting
library(broom.mixed)  ## tidy() methods for mixed models
library(lme4)         ## lmer(), glmer(), allFit(), ...
library(blme)         ## blmer()/bglmer(): regularized (MAP) mixed models
library(glmmTMB)      ## GLMMs via Template Model Builder
library(Matrix)       ## sparse-matrix infrastructure used by lme4

convergence problems

non-positive-definite Hessians

  • what does this mean?
  • modern mixed-model software does some form of non-linear optimization
    • sometimes just the covariance parameters, sometimes fixed + covariance parameters
    • (random-effects parameters are profiled out or estimated in an internal loop)
  • at the optimum we should have a gradient of zero and a positive definite Hessian

Hessian

\[ \left( \begin{array}{cccc} \frac{\partial^2 L}{\partial \beta_{1}^2} & \frac{\partial^2 L}{\partial \beta_{1} \partial \beta_{2}} & \frac{\partial^2 L}{\partial \beta_{1} \partial \beta_{3}} & \ldots \\ \frac{\partial^2 L}{\partial \beta_{2} \partial \beta_{1}} & \frac{\partial^2 L}{\partial \beta_{2}^2} & \frac{\partial^2 L}{\partial \beta_{2} \partial \beta_{3}} & \ldots \\ \frac{\partial^2 L}{\partial \beta_{3} \partial \beta_{1}} & \frac{\partial^2 L}{\partial \beta_{3} \partial \beta_{2}} & \frac{\partial^2 L}{\partial \beta_{3}^2} & \ldots \\ \vdots & \vdots & \vdots & \ddots \end{array} \right) \]

  • eigenvalues of the Hessian give the curvature along the principal directions (= the eigenvectors)
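Concretely, writing the eigendecomposition of the Hessian

\[ H = Q \Lambda Q^\top, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p), \]

the columns of \(Q\) are the principal directions and the \(\lambda_i\) the corresponding curvatures: at a proper optimum all \(\lambda_i > 0\), while a non-positive-definite Hessian has some \(\lambda_i \le 0\) (a flat or downward-curving direction).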

solutions

  • double-check the model definition and data
    • one random effect per observation (when the conditional distribution has a scale parameter)
    • all observations in a cluster have the same response value (e.g. here)
    • etc.
  • scale and center predictor variables (improves stability)
  • general advice in Bolker et al. (2013) [1] may be useful
  • extract the Hessian and see which eigenvector(s) are associated with the minimal eigenvalue(s) (sketch below)
    • extracting the Hessian often requires some digging, e.g. model@optinfo$derivs (lme4) or model$sdr$cov.fixed (glmmTMB)
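For example, with an lme4 fit (a sketch; model is a placeholder for your fitted merMod object):

hess <- model@optinfo$derivs$Hessian  ## Hessian of the deviance at the optimum
ee <- eigen(hess)
ee$values                             ## near-zero or negative values flag trouble
ee$vectors[, which.min(ee$values)]    ## which parameters load on the flattest direction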

more examples

  • many GLMM parameters have a bounded domain
    • variances/standard deviations/dispersion parameters (0 to \(\infty\))
    • probabilities (0 to 1)
  • the easiest way to handle bounded optimization is to fit a transformed, unconstrained version of the parameter
    • log (variance, SD, ratios thereof, odds ratios)
    • logit/log-odds (probabilities)
  • problems when the estimate is actually on the boundary of the original domain (transformed parameter \(\to \pm \infty\))
  • different packages handle this in different ways
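The boundary of the original domain maps to \(\pm\infty\) on the unconstrained scale:

log(c(1e-8, 1, 1e8))            ## variances/SDs: log maps (0, Inf) to (-Inf, Inf)
qlogis(c(1e-8, 0.5, 1 - 1e-8))  ## probabilities: logit maps (0, 1) to (-Inf, Inf)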

example: negative binomial dispersion

  • the negative binomial can only model overdispersion, not equi- or underdispersion
  • example:
set.seed(102)
dd <- data.frame(x = rpois(1000, lambda = 2))  ## equidispersed (Poisson) data
m1 <- MASS::glm.nb(x ~ 1, data = dd)           ## NB fit: theta heads toward infinity
environment(m1$family$variance)$.Theta         ## dig the estimated theta out of the family object
  • solutions
    • ignore it
    • revert to Poisson model
    • use a quasi-likelihood model (fit Poisson, then adjust SEs etc.)
    • use a distribution that allows underdispersion: glmmTMB’s genpois or compois families, or an ordinal model (sketch below)
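A sketch, refitting the equidispersed data above with glmmTMB’s generalized Poisson family:

m2 <- glmmTMB(x ~ 1, data = dd, family = genpois)
sigma(m2)  ## dispersion parameter; values < 1 indicate underdispersion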

example: non-zero-inflated models

  • zero-inflation models fit a finite mixture of “structural” and “sampling” zeros
  • “many zeros does not mean zero inflation”2
  • if the true zero-inflation probability is 0, its log-odds \(\to -\infty\)
  • similar to the NB dispersion case
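A sketch (reusing the simulated Poisson data dd from above, which has no structural zeros):

m_zi <- glmmTMB(x ~ 1, ziformula = ~ 1, data = dd, family = poisson)
fixef(m_zi)$zi  ## logit of the zero-inflation probability: large and negative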

lme4 convergence warnings

a specific (problematic!) case

  • an extra gradient/Hessian check on top of the optimizer’s own convergence checks
  • many false positives

diagnosing convergence warnings

  • allFit
  • are parameters close enough?
  • what is the distribution of negative log-likelihoods?
  • which parameters are unstable?
library(lme4)
source("R/conv_ex.R")
model1 <- lmer(eval ~ 1 + group * (emint_n + grade_n) + (1 + grade_n + emint_n | class), data = dd)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00229598 (tol = 0.002, component 1)

Scaling the predictors (automatic scaling is on lme4’s to-do list) …

## center and scale the numeric (double) columns
double_cols <- sapply(dd, is.double)
dd_sc <- dd
dd_sc[double_cols] <- sapply(dd[double_cols], \(x) drop(scale(x)))
model2 <- update(model1, data = dd_sc)
library(broom.mixed) ## for tidy()
library(ggplot2)
source("R/allFit_utils.R")
## Loading required package: colorspace
aa <- allFit(model1)
## bobyqa : [OK]
## Nelder_Mead : [OK]
## nlminbwrap :
## boundary (singular) fit: see help('isSingular')
## [OK]
## nmkbw :
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 2 negative eigenvalues
## [OK]
## optimx.L-BFGS-B : [OK]
## nloptwrap.NLOPT_LN_NELDERMEAD : [OK]
## nloptwrap.NLOPT_LN_BOBYQA :
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00229598 (tol = 0.002, component 1)
## [OK]
tt <- tidy(aa)

Check log-likelihoods: do most or all optimizers agree?

glance_allfit_NLL(aa)
## # A tibble: 7 × 2
##   optimizer                          NLL_rel
##   <chr>                                <dbl>
## 1 bobyqa                        0           
## 2 nloptwrap.NLOPT_LN_NELDERMEAD 0.0000000207
## 3 Nelder_Mead                   0.0000000284
## 4 optimx.L-BFGS-B               0.000000209 
## 5 nloptwrap.NLOPT_LN_BOBYQA     0.000000591 
## 6 nlminbwrap                    1.14        
## 7 nmkbw                         2.50

Run convergence check by hand:

checkConv(model1@optinfo$derivs, model1@optinfo$val, lmerControl()$checkConv, model1@lower)
## Warning in checkConv(model1@optinfo$derivs, model1@optinfo$val,
## lmerControl()$checkConv, : Model failed to converge with max|grad| = 0.00229598
## (tol = 0.002, component 1)
## $code
## [1] -1
## 
## $messages
## [1] "Model failed to converge with max|grad| = 0.00229598 (tol = 0.002, component 1)"

(in this case it is the scaled gradient that fails the test).

Check fixed-effect estimates across optimizers:

plot_allfit(aa) + theme_bw()

Check random effect components (may or may not be of interest):

plot_allfit(aa, keep_effects = "ran_pars")
## Warning: Removed 28 rows containing missing values or values outside the scale range
## (`geom_segment()`).
## Warning: Removed 21 rows containing missing values or values outside the scale range
## (`geom_segment()`).

singular fits

  • best-fit value of a variance can be 0
  • e.g. the expected variance among group means in a one-way ANOVA is \(\sigma^2_b + \sigma^2_w/n\); equating it to the sample variance of the group means gives a method-of-moments estimate of \(\sigma^2_b\) that can be < 0 … [3], [4]
  • lme4 handles this a little more gracefully than glmmTMB; a simulation of boundary fits:
simfun <- function() {
    ## simulate a small one-way layout with weak among-group variation (theta = 0.1)
    dd <- expand.grid(f = factor(1:3), rep = 1:10)
    dd$y <- suppressMessages(simulate(~ 1 + (1 | f),
                     newdata = dd,
                     newparams = list(beta = 0, theta = 0.1, sigma = 1),
                     family = gaussian))[[1]]
    ## refit, suppressing derivative calculations and singular-fit messages
    m <- lmer(y ~ 1 + (1 | f), data = dd,
              control = lmerControl(calc.derivs = FALSE, check.conv.singular = "ignore"))
    getME(m, "theta")  ## scaled among-group SD; exactly 0 for a singular fit
}
simfun()
## f.(Intercept) 
##             0
set.seed(101)
thvec <- replicate(500, simfun())
hist(thvec, breaks = 50)

library(lme4)
data(Orthodont,package="nlme")
Orthodont$nsex <- as.numeric(Orthodont$Sex=="Male")
Orthodont$nsexage <- with(Orthodont, nsex*age)
m1 <- lmer(distance ~ age + (age|Subject) + (0+nsex|Subject),
           start = c(1.8,-0.1,0.14, 0.5),
           ##(0 + nsexage|Subject),
           data=Orthodont)
## boundary (singular) fit: see help('isSingular')
VarCorr(m1)
##  Groups    Name        Std.Dev.   Corr  
##  Subject   (Intercept) 2.3270e+00       
##            age         2.2643e-01 -0.609
##  Subject.1 nsex        6.9976e-05       
##  Residual              1.3100e+00
rePCA(m1)
## $Subject
## Standard deviations (1, .., p=3):
## [1] 1.7794421974 0.1368062328 0.0000534153
## 
## Rotation (n x k) = (3 x 3):
##             [,1]       [,2] [,3]
## [1,] -0.99822608 0.05953739    0
## [2,]  0.05953739 0.99822608    0
## [3,]  0.00000000 0.00000000    1
## 
## attr(,"class")
## [1] "prcomplist"
library(nlme)
## 
## Attaching package: 'nlme'
## The following object is masked _by_ '.GlobalEnv':
## 
##     Orthodont
## The following object is masked from 'package:lme4':
## 
##     lmList
## The following object is masked from 'package:dplyr':
## 
##     collapse
## alternative: model sex-specific residual variance instead of a (0 + nsex | Subject) term
m2 <- lme(distance ~ age, random = ~ age | Subject,
    data = Orthodont,
    weights = varIdent(form = ~ 1 | Sex))
library(glmmTMB)
## the same idea in glmmTMB, via a dispersion model
m3 <- glmmTMB(distance ~ age + (age | Subject),
              dispformula = ~ Sex,
              REML = TRUE,
              data = Orthodont)
exp(fixef(m3)$disp)
## (Intercept)   SexFemale 
##    1.645261    0.404098
logLik(m3)
## 'log Lik.' -210.8233 (df=7)

model simplification

  • drop terms, e.g. the correlation between slopes and intercepts [5]
  • (should probably center continuous covariates first)
  • Matuschek et al. [6] say you should be dropping terms anyway
  • drop correlations (e.g. (f || g) in lme4 or diag(f | g) in glmmTMB); sketch below
  • reduced-rank models (glmmTMB fits factor-analytic models, e.g. rr(f1*f2|g, d = 5); or the gllvm package)
  • buildmer package (??)
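For example, dropping the intercept–slope correlation in the Orthodont model via lme4’s double-bar syntax (a sketch; (age || Subject) expands to (1 | Subject) + (0 + age | Subject)):

m1_nc <- lmer(distance ~ age + (age || Subject), data = Orthodont)
VarCorr(m1_nc)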

model simplification [7]

include_graphics("pix/uriarte.png")

regularization

  • blme: assign priors, do maximum a posteriori estimation
    • default for RE terms: SD \(\sim\) Gamma(shape = 2.5, rate = 0) [8]
  • priors in glmmTMB (new)
  • priors in Bayesian pkgs (of course) MCMCglmm, brms, rstanarm
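A sketch: refit the singular Orthodont model with blme’s blmer(), whose default Gamma prior on random-effect SDs pulls the estimates off the boundary:

m1_bl <- blmer(distance ~ age + (age | Subject) + (0 + nsex | Subject),
               data = Orthodont)
VarCorr(m1_bl)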

complete separation (GLM(M)-specific)

  • Bernoulli data where some predictor combination perfectly separates 0s from 1s; count data with zero/positive separation
  • solutions: Firth (penalized) regression, or adding priors
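A toy illustration of complete separation in a plain GLM (the slope estimate diverges and glm() warns that fitted probabilities of 0 or 1 occurred):

xx <- 1:10
yy <- as.numeric(xx > 5)  ## x > 5 perfectly separates the 0s from the 1s
coef(glm(yy ~ xx, family = binomial))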

example (from glmmTMB vignette)

load("data/culcita.RData")
cdatx <- culcita_dat[-20,]
form <- predation~ttt+(1|block)
cmod_glmer <- glmer(form, data=cdatx, family=binomial)
cmod_glmmTMB <- glmmTMB(form, data=cdatx, family=binomial)
cmod_bglmer <- bglmer(form, data=cdatx, family=binomial,
                    fixef.prior = normal(cov = diag(9,4)))
cprior <- data.frame(prior = rep("normal(0,3)", 2),
                     class = rep("beta", 2),
                     coef = c("(Intercept)", ""))
cmod_glmmTMB_p <- update(cmod_glmmTMB, priors = cprior)
tt <- (tibble::lst(cmod_glmer, cmod_glmmTMB, cmod_bglmer, cmod_glmmTMB_p)
    |> purrr::map_dfr(~ tidy(., effects = "fixed"), .id = "model")
)
ggplot(tt, aes(x = estimate, y = term, colour = model)) +
       geom_pointrange(aes(xmin = estimate - 2*std.error,
                           xmax = estimate + 2*std.error),
                       position = position_dodge(width = 0.5))

1. Bolker, B. M. et al. Strategies for fitting nonlinear ecological models in R, AD Model Builder, and BUGS. Methods in Ecology and Evolution 4, 501–512 (2013).
2.
3. Pryseley, A., Tchonlafi, C., Verbeke, G. & Molenberghs, G. Estimating negative variance components from Gaussian and non-Gaussian data: A mixed models approach. Computational Statistics & Data Analysis 55, 1071–1085 (2011).
4.
5. Barr, D. J., Levy, R., Scheepers, C. & Tily, H. J. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68, 255–278 (2013).
6. Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H. & Bates, D. Balancing type I error and power in linear mixed models. Journal of Memory and Language 94, 305–315 (2017).
7. Uriarte, M. & Yackulic, C. B. Preaching to the unconverted. Ecological Applications 19, 592–596 (2009).
8. Chung, Y., Rabe-Hesketh, S., Dorie, V., Gelman, A. & Liu, J. A nondegenerate penalized likelihood estimator for variance parameters in multilevel models. Psychometrika 1–25 (2013). doi:10.1007/s11336-013-9328-2.