Responding to a query from Rolf Turner on r-sig-mixed-models@r-project.org.

tl;dr the error messages are spurious/due to a singular fit. Considerable speed-ups can be achieved by (1) switching optimizers, (2) using nAGQ=0, (3) using glmmTMB or Julia …

to do:

- rjulia package? better retrieval of results?
- try fast! (corresponding to nAGQ=0)?
- brms results … ??
- caveats
## fitting packages
library(lme4)
library(glmmTMB)
library(brms)
## post-analysis pkgs
library(broom) ## need BMB version for glmmTMB
library(ggplot2); theme_set(theme_bw())
library(MASS) ## before dplyr; avoid masking dplyr::select()
library(dplyr)
library(tidyr)
library(ggstance)
To run the code in the Julia section below you need to install:

- Julia (tested on versions 0.5.2 and 0.6.0; I used 0.6.0 below). On Ubuntu you can:

sudo add-apt-repository ppa:staticfloat/juliareleases
sudo apt-get update
sudo apt-get install julia

- the MixedModels package for Julia (in Julia, Pkg.add("MixedModels"))
- a branch of the GLM package for Julia that fixes an issue with the cloglog link (in Julia, Pkg.checkout("GLM"); Pkg.checkout("GLM", "db/clampcloglog"))

This example no longer uses the rjulia package for R. It did in previous versions, for which I needed to create a modified version (in R, devtools::install_github("bbolker/rjulia", ref="julia0.5")).
set.seed(42)
X <- artSim()  ## simulate data (artSim() is defined earlier)
write.csv(X, file="artSim.csv", row.names=FALSE)  ## save a copy for the Julia fits below
t1 <- system.time(fit1 <- glmer(cbind(Dead,Alive) ~ (Trt + 0)/x + (x | Rep),
family=binomial(link="cloglog"),
data=X))
## Warning in (function (fn, par, lower = rep.int(-Inf, n), upper =
## rep.int(Inf, : failure to converge in 10000 evaluations
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
We should deal with the “failure to converge in 10000 evaluations” warning first: this comes directly from the optimizer (minqa::bobyqa), and hence means that we probably haven’t even gotten to the optimum yet. We can use the optCtrl argument of glmerControl() to increase the maximum number of function evaluations.
t2 <- system.time(fit2 <- update(fit1,control=glmerControl(optCtrl=list(maxfun=3e4))))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
Now we have just the post-fitting convergence warnings. The first thing we should do is to check for singularity: if the fit is singular, i.e. some of the \(\theta\) (variance-covariance) parameters are at their constrained values, then the warning about negative eigenvalues of the Hessian is largely irrelevant. (It is a long-term wish-list item to evaluate the Karush-Kuhn-Tucker conditions properly in this case, i.e. allowing for the constraints.)
is.singular <- function(x,tol=1e-6) any(abs(getME(x,"theta"))<tol)
is.singular(fit2)
## [1] TRUE
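(Newer versions of lme4 provide a built-in isSingular() function that performs essentially this check.)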
Furthermore, in this case we can more simply look at the variance-covariance matrix and see that it has a zero variance (and an undefined correlation), which indicates singularity more directly.
VarCorr(fit2)
## Groups Name Std.Dev. Corr
## Rep (Intercept) 0.000000
## x 0.015276 NaN
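Given the zero intercept variance, a natural follow-up (a sketch only, not part of the original analysis) would be to drop the random intercept and check whether the convergence warnings disappear:

fit2b <- update(fit1, . ~ (Trt + 0)/x + (0 + x | Rep),
                control=glmerControl(optCtrl=list(maxfun=3e4)))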
I wanted to see if we could speed things up (without losing accuracy) by using the BOBYQA implementation from the nloptr package. It turns out that in order to do this, we have to skip the nAGQ=0 init step (this is sometimes needed for cloglog models, especially those with offsets):
t3 <- system.time(fit3 <- update(fit1,control=glmerControl(optimizer="nloptwrap",
nAGQ0initStep=FALSE)))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 1 negative
## eigenvalues
Unfortunately, skipping the init step gives a fit whose log-likelihood is 74 log-likelihood units worse (!).
I tried nAGQ0initStep=FALSE with the default BOBYQA optimizer as well; it was very, very slow (\(\approx 600\) seconds with maxfun=5e4, which wasn’t even enough …):
t4 <- system.time(fit4 <- update(fit1,
control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=1e5),
nAGQ0initStep=FALSE)))
What about the opposite (nAGQ=0), as suggested by Tony Ives?
t5 <- system.time(fit5 <- update(fit1,nAGQ=0,control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=3e4))))
We can’t use nAGQ > 1 in this case because we have a random-slopes model … too bad. (The mean effective sample size, min(Alive, Dead), is about 3, which implies that the Gauss-Hermite quadrature corrections should be important, but …)
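A quick check of that effective-sample-size claim, using the simulated data X from above:

mean(pmin(X$Alive, X$Dead))  ## mean effective sample size per observation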
We can’t really compare the likelihoods with nAGQ=0 vs. nAGQ>0, so we’ll just compare fixed-effect estimates (below).
t6 <- system.time(fit6 <- glmmTMB(Dead/(Alive+Dead) ~ (Trt + 0)/x + (x | Rep),
weights = Alive+Dead,
family=binomial(link="cloglog"),data=X))
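(Note that the glmmTMB fit specifies the binomial response as a proportion plus a weights argument giving the total; this is equivalent to the cbind(Dead, Alive) form used with glmer above.)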
t7 <- system.time(fit7 <- update(fit1,control=glmerControl(optimizer="bobyqa",
optCtrl=list(maxfun=3e4))))
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control
## $checkConv, : Model failed to converge: degenerate Hessian with 3 negative
## eigenvalues
This is actually almost working, with a few caveats:

- the runr-based strategy implemented here is flaky
- could hack together something based on command-line evaluation (julia -e? -E?), but not sure how well it would work
- rjulia works (if hacked as described above)

using DataFrames, GLM, MixedModels
const df = readtable("artSim.csv", makefactors = true);
## compute proportions and totals
df[:Tot] = float.(df[:Dead]+df[:Alive]);
df[:prop] = df[:Dead] ./ df[:Tot];
## for CPU timing:
### Pkg.clone("https://github.com/schmrlng/CPUTime.jl.git")
## fit without timing, for compilation
## (Trt+0)/x formula notation doesn't work, need to expand it ourselves
g3 = fit!(glmm(@formula(prop ~ 0 + Trt + Trt&x + (1 + x|Rep)), df,
Binomial(), CloglogLink(), wt=Array(df[:Tot])), fast = true);
e1 = @elapsed fit!(glmm(@formula(prop ~ 0 + Trt + Trt&x + (1 + x|Rep)), df,
Binomial(), CloglogLink(), wt=Array(df[:Tot])), fast = true)
## non-fast (nAGQ>0)
e2 = @elapsed g4 = fit!(glmm(@formula(prop ~ 0 + Trt + Trt&x + (1 + x|Rep)), df,
Binomial(), CloglogLink(), wt=Array(df[:Tot])))
## store timing info
open("julia_cloglog_timings.txt","w") do f
write(f,string(e1)," ",string(e2),"\n")
end
## store fixed-effect results
ct = coeftable(g4)
outdf = DataFrame(variable = ct.rownms,
Estimate = ct.cols[1],
StdError = ct.cols[2],
z_val = ct.cols[3])
writetable("julia_cloglog_coefs.csv",outdf)
ct = coeftable(g3)
outdf = DataFrame(variable = ct.rownms,
Estimate = ct.cols[1],
StdError = ct.cols[2],
z_val = ct.cols[3])
writetable("julia_cloglog_coefs_nagq0.csv",outdf)
This brms fit works (sort of); it runs, but we get the following warnings:
1: The model has not converged (some Rhats are > 1.1). Do not analyse the results!
We recommend running more iterations and/or setting stronger priors.
2: There were 4000 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
t8 <- system.time(
fit8 <- brm(Dead | trials(Dead+Alive) ~ (Trt + 0)/x + (x | Rep),
family=binomial(link="cloglog"),
inits="0",
data=X))
Might work if we added some kind of priors (or group priors, i.e. shrinkage) on the fixed effects?
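A sketch of what that might look like (the prior and adapt_delta choices here are hypothetical, not part of the original analysis):

fit8p <- brm(Dead | trials(Dead+Alive) ~ (Trt + 0)/x + (x | Rep),
             family=binomial(link="cloglog"),
             prior=set_prior("normal(0, 2)", class="b"), ## weakly informative prior on fixed effects
             control=list(adapt_delta=0.95),             ## address divergent-transition warning
             inits="0",
             data=X)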
(Plots comparing the fixed-effect estimates across fits are omitted here.)
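As a rough tabular substitute, a minimal sketch comparing fixed-effect estimates across three of the fits above (glmmTMB stores its conditional-model coefficients in the $cond component):

cbind(glmer_std   = fixef(fit2),
      glmer_nAGQ0 = fixef(fit5),
      glmmTMB     = fixef(fit6)$cond)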
Check timings (elapsed time, in seconds):

| model | elapsed |
|---|---|
| glmer_nAGQ0 | 0.763 |
| glmmTMB | 17.418 |
| glmer_bobyqa | 38.037 |
| glmer_nlopt_no0init | 82.309 |
| glmer_std | 97.393 |
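In other words, the nAGQ=0 fit is more than 100 times faster than the default glmer fit on this example.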
As noted above, the runr-based strategy for running Julia code from R sort of works, but is flaky; one could hack together something based on command-line evaluation instead. For reference, here is the old rjulia-based code (no longer used; it requires the hacked rjulia version described above):
jDo("using DataFrames")
jDo("using MixedModels")
jDo("using GLM") ## for testing
jDo('df = readtable("artSim.csv");')
## compute proportions and totals
jDo('df[:tot] = df[:Dead]+df[:Alive];')
jDo('df[:prop] = df[:Dead] ./ df[:tot];')
## convert Treatment to categorical
## https://stackoverflow.com/questions/43708635/indicator-matrix-for-categorical-data-in-glm-jl-with-dataframes-jl
jDo('pool!(df, [:Trt])')
## working up to it ....
## g1 = glm(@formula(prop ~ Trt+0), df, Binomial(), CloglogLink(), wt= Array(df[:tot]));
## g2 = glmm(@formula(prop ~ Trt+0 + (1 | Rep)), df, Binomial(), wt = float.(Array(df[:tot])));
## g3 = glmm(@formula(prop ~ Trt+0 + (x | Rep)), df, Binomial(), wt = float.(Array(df[:tot])));
## can't handle (a+b)/x syntax ... expand ???
## g4 = glmm(@formula(prop ~ (Trt+0)/x + (x | Rep)), df, Binomial(), wt = float.(Array(df[:tot])));
## glmm(@formula(prop ~ 0+Trt+Trt&x + (x | Rep)), df, Binomial(),
## CloglogLink(), wt = float.(Array(df[:tot])))
## ERROR: AssertionError: isfinite(crit)
##
jDo("@time g1 = glmm(@formula(prop ~ 0+Trt+Trt&x + (x | Rep)), df, Binomial(),
LogitLink(), wt = float.(Array(df[:tot])));")
ff <- j2r("fixef(g1)")
rr <- j2r("ranef(g1)") ## NULL ???