Responding to a query from Rolf Turner on r-sig-mixed-models@r-project.org.

tl;dr the error messages are spurious/due to a singular fit. Considerable speed-ups can be achieved by (1) switching optimizers, (2) using nAGQ=0, (3) using glmmTMB or Julia …

to do

• compare variance-covariance matrix estimates (they vary a lot, although it doesn’t seem to have much effect on the fixed effects?
• play with/fight with Julia more (native mode rather than rjulia package? better retrieval of results? try fast! (corresponding to nAGQ=0)?
• worth comparing with brms results … ??
• try estimation with treatment and treatment-by-x effects either as real random effects (i.e., with shrinkage) or with large fixed variance (for computational efficiency)

caveats

• Julia timing is still questionable; I have previously gotten results as fast as 7 seconds for the full model fit (vs. the 36 seconds reported below)
## fitting packages
library(lme4)
library(glmmTMB)
library(brms)
## post-analysis pkgs
library(broom)  ## need BMB version for glmmTMB
library(ggplot2); theme_set(theme_bw())
library(MASS) ## before dplyr; avoid masking dplyr::select()
library(dplyr)
library(tidyr)
library(ggstance)

To run the code in the Julia section below you need to install:

• Julia; tested on versions 0.5.2 and 0.6.0, I used 0.6.0 below. On Ubuntu you can

sudo add-apt-repository ppa:staticfloat/juliareleases
sudo apt-get update
sudo apt-get install julia
• Doug Bates’s MixedModels.jl package (in Julia, Pkg.add("MixedModels"))
• at present, an updated version of the GLM package for Julia that fixes an issue with the cloglog link (in Julia, Pkg.checkout("GLM"); Pkg.checkout("GLM", "db/clampcloglog")
• this example no longer uses the rjulia package for R. It did in previous versions; I needed to create a modified version (in R, devtools::install_github("bbolker/rjulia", ref="julia0.5")

set.seed(42)
X <- artSim()
write.csv(X,file="artSim.csv",row.names=FALSE)
t1 <- system.time(fit1 <- glmer(cbind(Dead,Alive) ~ (Trt + 0)/x + (x | Rep),
data=X))
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control ##$checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control ##$checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues

We should deal with the “failure to converge in 10000 evaluations” warning first: this comes directly from the optimizer (minqa::bobyqa), and hence means we don’t think we’ve even gotten to the optimum yet. We need to use the optCtrl component of the glmerControl() function to adjust the maximum number of iterations.

t2 <- system.time(fit2 <- update(fit1,control=glmerControl(optCtrl=list(maxfun=3e4))))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control ##$checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control ##$checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues

Now we have just the post-fitting convergence warnings. The first thing we should do is to check for singularity: if the fit is singular, i.e. some of the $$\theta$$ (variance-covariance) parameters are at their constrained values, then the warning about negative eigenvalues of the Hessian is largely irrelevant (it is a long-term wish-list item to evaluate the Karush-Kuhn-Tucker conditions properly in this case, i.e. allowing for the constraints.

is.singular <- function(x,tol=1e-6) any(abs(getME(x,"theta"))<tol)
is.singular(fit2)
## [1] TRUE

Furthermore, in this case we can more simply look a the variance-covariance matrix and see that it has a zero variance (and an undefined correlation), which more directly indicates singularity.

VarCorr(fit2)
##  Groups Name        Std.Dev. Corr
##  Rep    (Intercept) 0.000000
##         x           0.015276   NaN

I wanted to see if we could speed things up (without losing accuracy) by using the BOBYQA implementation from the nloptr package. It turns out that in order to do this, we have to skip the nAGQ=0 init step (this is sometimes needed for cloglog models, especially those with offsets):

t3 <- system.time(fit3 <- update(fit1,control=glmerControl(optimizer="nloptwrap",
nAGQ0initStep=FALSE)))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control ##$checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control ##$checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues

Unfortunately, skipping the init step gives a negative log-likelihood that is -74 log-likelihood units worse (!).

I tried nAGQ0initStep=FALSE with the default optimizer or the BOBYQA optimizer as well; it was very very slow ($$\approx 600$$ seconds with maxfun=5e4, which wasn’t even enough …)

t4 <- system.time(fit4 <- update(fit1,
control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=1e5),
nAGQ0initStep=FALSE)))

What about the opposite (nAGQ=0), as suggested by Tony Ives?

t5 <- system.time(fit5 <- update(fit1,nAGQ=0,control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=3e4))))

Can’t use nAGQ > 1 in this case because we have a random-slopes model … too bad. (We have a mean value of effective sample size = min(alive,dead) 3, which implies that the Gauss-Hermite quadrature corrections should be important, but …)

We can’t really compare the likelihoods with nAGQ=0 vs. nAGQ>0, so we’ll just compare fixed-effect estimates (below).

t6 <- system.time(fit6 <- glmmTMB(Dead/(Alive+Dead) ~ (Trt + 0)/x + (x | Rep),
family=binomial(link="cloglog"),data=X))
t7 <- system.time(fit7 <- update(fit1,control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=3e4))))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control ##$checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control ##$checkConv, : Model failed to converge: degenerate Hessian with 3 negative
## eigenvalues

julia-lang

This is actually almost working, with a few caveats:

• I’m having trouble getting Julia to evaluate properly in an Rmarkdown chunk (can probably do this but will have to fight with it a bit)
• the runr-based strategy implemented here is flaky
• I could hack together something simpler based on sending text to Julia for evaluation (rjulia -e? -E?) but not sure how well it would work
• rjulia works (if hacked as described above)
• I fell back on simply running the Julia code (below, and here) as a separate process and printing the results to a file
• requires fix to Julia GLM internals so cloglog doesn’t underflow
using DataFrames, GLM, MixedModels
const df = readtable("artSim.csv", makefactors = true);
## compute proportions and totals
## for CPU timing:
### Pkg.clone("https://github.com/schmrlng/CPUTime.jl.git")
## fit without timing, for compilation
## (Trt+0)/x formula notation doesn't work, need to expand it ourselves
g3 = fit!(glmm(@formula(prop ~ 0 + Trt + Trt&x + (1 + x|Rep)), df,
Binomial(), CloglogLink(), wt=Array(df[:Tot])), fast = true);
e1 = @elapsed fit!(glmm(@formula(prop ~ 0 + Trt + Trt&x + (1 + x|Rep)), df,
Binomial(), CloglogLink(), wt=Array(df[:Tot])), fast = true)
## non-fast (nAGQ>0)
e2 = @elapsed g4 = fit!(glmm(@formula(prop ~ 0 + Trt + Trt&x + (1 + x|Rep)), df,
## store timing info
open("julia_cloglog_timings.txt","w") do f
write(f,string(e1)," ",string(e2),"\n")
end
## store fixed-effect results
ct = coeftable(g4)
outdf = DataFrame(variable = ct.rownms,
Estimate = ct.cols[1],
StdError = ct.cols[2],
z_val = ct.cols[3])
writetable("julia_cloglog_coefs.csv",outdf)
ct = coeftable(g3)
outdf = DataFrame(variable = ct.rownms,
Estimate = ct.cols[1],
StdError = ct.cols[2],
z_val = ct.cols[3])
writetable("julia_cloglog_coefs_nagq0.csv",outdf)

brms

This brms fit works (sort of); it runs, but we get the following warnings:

1: The model has not converged (some Rhats are > 1.1). Do not analyse the results!
We recommend running more iterations and/or setting stronger priors.
2: There were 4000 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
t8 <- system.time(
fit8 <- brm(Dead | trials(Dead+Alive) ~ (Trt + 0)/x + (x | Rep),
data=X))