Responding to a query from Rolf Turner on r-sig-mixed-models@r-project.org.

tl;dr: the error messages are spurious (they reflect a singular fit). Considerable speed-ups can be achieved by (1) switching optimizers, (2) using nAGQ=0, (3) using glmmTMB or Julia …

to do

caveats

## fitting packages
library(lme4)
library(glmmTMB)
library(brms)
## post-analysis pkgs
library(broom)  ## need BMB version for glmmTMB
library(ggplot2); theme_set(theme_bw())
library(MASS) ## load before dplyr so MASS doesn't mask dplyr::select()
library(dplyr)
library(tidyr)
library(ggstance)

To run the code in the Julia section below you need to install Julia along with its DataFrames, GLM, and MixedModels packages (and optionally CPUTime, for CPU timing).

set.seed(42)
X <- artSim()  ## artSim() is the simulation function from the original query (not shown here)
write.csv(X, file = "artSim.csv", row.names = FALSE)
t1 <- system.time(fit1 <- glmer(cbind(Dead,Alive) ~ (Trt + 0)/x + (x | Rep),
                        family=binomial(link="cloglog"),
                        data=X))
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues

We should deal with the “failure to converge in 10000 evaluations” warning first: it comes directly from the optimizer (minqa::bobyqa), which means the optimizer itself doesn’t think it has reached an optimum yet. We can use the optCtrl component of glmerControl() to increase the maximum number of function evaluations.

t2 <- system.time(fit2 <- update(fit1,control=glmerControl(optCtrl=list(maxfun=3e4))))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues

Now we have just the post-fitting convergence warnings. The first thing we should do is check for singularity: if the fit is singular, i.e. some of the \(\theta\) (variance-covariance) parameters are at their constrained values, then the warning about negative eigenvalues of the Hessian is largely irrelevant. (It is a long-term wish-list item to evaluate the Karush-Kuhn-Tucker conditions properly in this case, i.e. taking the constraints into account.)

## crude check for singularity: is any theta (variance-covariance)
## parameter (nearly) zero?
is.singular <- function(x, tol = 1e-6) any(abs(getME(x, "theta")) < tol)
is.singular(fit2)
## [1] TRUE
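
We can also inspect the underlying \(\theta\) parameters and their lower bounds directly (a quick check using getME(); the lower bounds are 0 for standard-deviation-like parameters and -Inf for correlation parameters):

## theta estimates and their lower bounds; a theta sitting on its
## bound indicates a singular fit
getME(fit2, "theta")
getME(fit2, "lower")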

Furthermore, in this case we can more simply look at the variance-covariance matrix and see that it contains a zero variance (and an undefined correlation), which indicates singularity more directly.

VarCorr(fit2)
##  Groups Name        Std.Dev. Corr 
##  Rep    (Intercept) 0.000000      
##         x           0.015276   NaN

I wanted to see if we could speed things up (without losing accuracy) by using the BOBYQA implementation from the nloptr package. It turns out that in order to do this we have to skip the nAGQ=0 initialization step (skipping this step is sometimes necessary for cloglog models, especially those with offsets):

t3 <- system.time(fit3 <- update(fit1,control=glmerControl(optimizer="nloptwrap",
                                                   nAGQ0initStep=FALSE)))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues

Unfortunately, skipping the init step gives a fit whose log-likelihood is 74 units worse (!).
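
We can confirm this by comparing the two fits directly (a quick check, assuming the fit2 and fit3 objects from above):

## log-likelihoods of the default fit (fit2) and the no-init-step
## nloptwrap fit (fit3); a negative difference means fit3 is worse
c(logLik(fit2), logLik(fit3))
logLik(fit3) - logLik(fit2)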

I also tried nAGQ0initStep=FALSE with the default optimizer and with BOBYQA; both were very slow (\(\approx 600\) seconds with maxfun=5e4, which still wasn’t enough …)

t4 <- system.time(fit4 <- update(fit1,
                                control=glmerControl(optimizer="bobyqa",
                                                     optCtrl=list(maxfun=1e5),
                                                     nAGQ0initStep=FALSE)))

What about the opposite (nAGQ=0), as suggested by Tony Ives?

t5 <- system.time(fit5 <- update(fit1,nAGQ=0,control=glmerControl(optimizer="bobyqa",
                                                                  optCtrl=list(maxfun=3e4))))

We can’t use nAGQ > 1 in this case because we have a random-slopes model … too bad. (The mean per-observation effective sample size, min(Dead, Alive), is about 3, which implies that the Gauss-Hermite quadrature corrections should be important, but …)
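
The effective-sample-size claim is easy to check from the data:

## mean per-observation effective sample size, in the sense of
## min(successes, failures)
mean(pmin(X$Dead, X$Alive))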

We can’t really compare the likelihoods with nAGQ=0 vs. nAGQ>0, so we’ll just compare fixed-effect estimates (below).

For comparison, we fit the same model with glmmTMB (using the proportion-plus-weights parameterization of the binomial response) and with glmer using minqa’s BOBYQA:

t6 <- system.time(fit6 <- glmmTMB(Dead/(Alive+Dead) ~ (Trt + 0)/x + (x | Rep),
                                  weights = Alive+Dead,
                                  family=binomial(link="cloglog"),data=X))
t7 <- system.time(fit7 <- update(fit1,control=glmerControl(optimizer="bobyqa",
                                                           optCtrl=list(maxfun=3e4))))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 3 negative
## eigenvalues
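
For reference, here is one way to collect the elapsed times of the fits so far (a sketch, assuming the t1 … t7 objects from above):

## elapsed times (seconds) for the variants tried above
times <- c(glmer_default    = t1[["elapsed"]],
           glmer_maxfun3e4  = t2[["elapsed"]],
           nloptwrap_noinit = t3[["elapsed"]],
           bobyqa_noinit    = t4[["elapsed"]],
           glmer_nAGQ0      = t5[["elapsed"]],
           glmmTMB          = t6[["elapsed"]],
           glmer_bobyqa     = t7[["elapsed"]])
sort(times)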

julia-lang

This is actually almost working, with a few caveats (noted in the comments in the code below):

using DataFrames, GLM, MixedModels
const df = readtable("artSim.csv", makefactors = true);
## compute proportions and totals
df[:Tot] = float.(df[:Dead]+df[:Alive]);
df[:prop] = df[:Dead] ./ df[:Tot];
## for CPU (rather than elapsed) timing, optionally:
### Pkg.clone("https://github.com/schmrlng/CPUTime.jl.git")
## fit without timing, for compilation
## (Trt+0)/x formula notation doesn't work, need to expand it ourselves
g3 = fit!(glmm(@formula(prop ~ 0 + Trt + Trt&x + (1 + x|Rep)), df,
    Binomial(), CloglogLink(), wt=Array(df[:Tot])), fast = true);
e1 = @elapsed fit!(glmm(@formula(prop ~ 0 + Trt + Trt&x + (1 + x|Rep)), df,
    Binomial(), CloglogLink(), wt=Array(df[:Tot])), fast = true)
## non-fast (nAGQ>0)
e2 = @elapsed g4 = fit!(glmm(@formula(prop ~ 0 + Trt + Trt&x + (1 + x|Rep)), df,
    Binomial(), CloglogLink(), wt=Array(df[:Tot])))
## store timing info
open("julia_cloglog_timings.txt","w") do f
    write(f,string(e1)," ",string(e2),"\n")
end
## store fixed-effect results
ct = coeftable(g4)
outdf = DataFrame(variable = ct.rownms,
                 Estimate = ct.cols[1],
               StdError = ct.cols[2],
                z_val = ct.cols[3])
writetable("julia_cloglog_coefs.csv",outdf)
ct = coeftable(g3)
outdf = DataFrame(variable = ct.rownms,
                 Estimate = ct.cols[1],
               StdError = ct.cols[2],
                z_val = ct.cols[3])
writetable("julia_cloglog_coefs_nagq0.csv",outdf)

brms

This brms fit works (sort of); it runs, but we get the following warnings:

1: The model has not converged (some Rhats are > 1.1). Do not analyse the results! 
We recommend running more iterations and/or setting stronger priors. 
2: There were 4000 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

t8 <- system.time(
    fit8 <- brm(Dead | trials(Dead+Alive) ~ (Trt + 0)/x + (x | Rep),
                family=binomial(link="cloglog"),
                inits="0",
                data=X))

This might work if we added some kind of priors (or group priors, i.e. shrinkage) on the fixed effects.
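
For example (a sketch, not run here; the normal(0, 3) scale and the adapt_delta value are arbitrary choices, not recommendations):

t9 <- system.time(
    fit9 <- brm(Dead | trials(Dead+Alive) ~ (Trt + 0)/x + (x | Rep),
                family=binomial(link="cloglog"),
                prior=set_prior("normal(0, 3)", class="b"),
                control=list(adapt_delta=0.95),
                inits="0",
                data=X))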

results summary
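
One simple summary is a side-by-side table of fixed-effect estimates (a sketch, assuming the fits above):

## fixed-effect estimates from the glmer and glmmTMB fits
## (fixef() for glmmTMB returns a list; the conditional-model
## component corresponds to the glmer estimates)
fixef_tab <- cbind(glmer_bobyqa = fixef(fit7),
                   glmer_nAGQ0  = fixef(fit5),
                   glmmTMB      = fixef(fit6)$cond)
round(fixef_tab, 3)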

Or perhaps more clearly:
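
A horizontal coefficient plot is one way to do this; a minimal sketch using the ggstance package loaded above, where coef_df is a hypothetical tidied data frame (e.g. stacked broom::tidy() output) with columns model, term, estimate, and std.error:

## coef_df is hypothetical: one row per (model, term), with columns
## model, term, estimate, std.error
ggplot(coef_df, aes(x = estimate, y = term, colour = model)) +
    geom_pointrangeh(aes(xmin = estimate - 2*std.error,
                         xmax = estimate + 2*std.error),
                     position = position_dodgev(height = 0.5))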