library(RTMB)
library(reformulas) ## for setting up Z matrix (mkReTrms and *bars() functions)
library(Matrix) ## for t(sparse_matrix)

# GLMMs in RTMB

An illustration of a moderately general/customizable framework for building your own GLMMs in RTMB.
data("sleepstudy", package = "lme4")
form <- round(Reaction) ~ Days + (Days | Subject)

I'm rounding the response variable because I want to illustrate the use of a non-Gaussian response (negative binomial) [although that doesn't necessarily make practical sense].
fr <- model.frame(subbars(form), sleepstudy)                  ## (1)
lf <- mkReTrms(findbars(form), fr = fr, calc.lambdat = FALSE) ## (2)
X <- model.matrix(nobars(form), sleepstudy)                   ## (3)
fam <- poisson()                                              ## (4)
us <- unstructured(2)                                         ## (5)

1. `reformulas::subbars()` replaces any bars (`|`) with `+` in the formula so that it can be interpreted by `model.frame()`; all variables that appear in the formula will be included in the model frame, including expansion of data-dependent terms like splines.
2. `reformulas::mkReTrms()` generates the required \(Z\) matrix (actually its transpose), as the `Zt` element of a list. If there are multiple random-effect terms, `Zt` will include them all concatenated into a single matrix, while `Ztlist` will include them as separate elements in a list. The returned value includes other information (see the docs), but `Zt` is the only component we need in this example. `calc.lambdat = FALSE` specifies that `mkReTrms` should not return the Cholesky factor of the covariance matrix of the (combined) random effects (we don't need it and it is sometimes a large object).
3. This constructs the model matrix for the fixed-effects component, in the usual way (`reformulas::nobars()` drops the random-effect terms).
4. This may be useful (but turns out not to be in this example, see below) for getting terms such as the link function, inverse link function, (scaled) variance function, etc. for the conditional distribution.
5. This returns a list that includes a function `corr` for computing an unstructured correlation matrix from a vector of parameters (see the glmmTMB or TMB documentation for more info). For a vector-valued random effect of length only two (i.e. {intercept, slope}) this approach is overkill because there is only a single correlation parameter, but it will generalize to larger correlation matrices.
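It can be worth poking at these pieces before going on. A quick, optional sketch; the expected dimensions assume the sleepstudy data (180 observations, 18 subjects):

subbars(form)  ## round(Reaction) ~ Days + (Days + Subject)
nobars(form)   ## round(Reaction) ~ Days
dim(lf$Zt)     ## 36 x 180: (intercept, slope) x 18 subjects, transposed
us$corr(0)     ## 2x2 correlation matrix; the off-diagonal element should be
               ## zero at a parameter value of 0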
Now the negative log-likelihood function itself:
nll <- function(pars) {
    getAll(fr, lf, pars)                                            ## (1)
    eta <- drop(X %*% beta + t(Zt) %*% c(t(b)))                     ## (2)
    mu <- exp(eta)                                                  ## (3)
    var <- mu*(1 + mu/exp(lognbdisp))                               ## (4)
    nll <- -sum(dnbinom2(model.response(fr), mu, var, log = TRUE))  ## (5)
    sdvec_b <- exp(logsd_b)                                         ## (6)
    Sigma <- outer(sdvec_b, sdvec_b, "*") * us$corr(corpars)        ## (7)
    nll <- nll - sum(dmvnorm(b, Sigma = Sigma, log = TRUE))         ## (8)
    return(nll)
}

1. `getAll()` is the RTMB version of `attach()` or `with()`; it extracts the elements of all of these lists for use within the function.
2. This is the standard \(X \beta + Z b\) framework for calculating the linear predictor of a mixed model. We have to (1) transpose `Zt` (because `mkReTrms` returns the transposed \(Z\) matrix, for Reasons); (2) transpose `b` and use `c()` to collapse it to a vector; `b` is stored as a matrix with each row corresponding to one subject, and we need to unpack it rowwise because that corresponds to how `mkReTrms` sets up `Zt`; (3) use `drop()` to convert the resulting one-column matrix to a vector. The dimension doesn't get dropped automatically because `Zt` is a sparse matrix, so the usual R auto-drop rules don't apply.
3. Applying the inverse-link function (we're using the standard log link for the negative binomial). It would be nice to use `poisson()$linkinv`, but that has a `pmax()` statement in it (to prevent underflow) that breaks automatic differentiation. (The better solution here would be to use `dnbinom_robust` from (R)TMB, which lets us specify the negative binomial via `(log_mu, log_var_minus_mu)` without ever explicitly applying the inverse-link function and risking underflow; a sketch follows this list.)
4. Computing the negative binomial variance using the standard \(V = \mu (1 + \mu/\theta)\) formula. (We use a log link for the dispersion parameter for general robustness and to avoid having to do constrained optimization.)
5. This line computes the conditional (negative) log-likelihood. Since we know that the response variable is `round(Reaction)` we could have used that instead (breaking differentiability is OK as long as the computation doesn't depend in any way on model parameters), but `model.response(fr)` is a little more general. (Don't ask me how this works with binomial responses specified as a two-column matrix, please.) TMB uses `dnbinom` for the default base-R probability/size parameterization (which we rarely want) and `dnbinom2` for the alternative, and more generally useful, mean/variance parameterization. If you want some other parameterization, such as `nbinom1` from `glmmTMB`, you have to code the parameter transformation yourself.
6. Convert the vector of random-effect log-standard deviations (length-2 in this case) to standard deviations (we could do this on the fly in the next line, but this is more readable).
7. Compute the random-effects covariance matrix from standard deviations and correlation parameters. We could equivalently use `diag(sdvec) %*% cormat %*% diag(sdvec)` to do this transformation.
8. Compute the log-likelihood of \(b\) conditional on \(\Sigma\) (given a matrix-valued first argument, `dmvnorm()` returns a vector of (log-)likelihoods, one for each row/subject); subtract it from the (negative) conditional log-likelihood we already computed.
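As promised in note (3), here is what a `dnbinom_robust()` version could look like. This is a minimal sketch, assuming RTMB's `dnbinom_robust()` follows TMB's (x, log_mu, log_var_minus_mu) parameterization; since \(V - \mu = \mu^2/\theta\), we have \(\log(V - \mu) = 2\eta - \log\theta\), so we never need to exponentiate \(\eta\):

nll_robust <- function(pars) {
    getAll(fr, lf, pars)
    eta <- drop(X %*% beta + t(Zt) %*% c(t(b)))
    ## log(V - mu) = log(mu^2/theta) = 2*eta - lognbdisp
    nll <- -sum(dnbinom_robust(model.response(fr),
                               eta,                  ## log mean
                               2 * eta - lognbdisp,  ## log(var - mean)
                               log = TRUE))
    sdvec_b <- exp(logsd_b)
    Sigma <- outer(sdvec_b, sdvec_b, "*") * us$corr(corpars)
    nll - sum(dmvnorm(b, Sigma = Sigma, log = TRUE))
}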
Set up the parameters. These are ‘reasonable’ values for this problem.
nsubj <- length(unique(sleepstudy$Subject))
pars <- list(beta = c(5, 0.01),
             b = matrix(0, ncol = 2, nrow = nsubj),   ## (1)
             logsd_b = c(0, 0),
             lognbdisp = 0,
             corpars = 0)

1. As mentioned previously, the random-effect values (BLUPs/conditional modes) are specified as a matrix with each row corresponding to the intercept and slope offsets for a particular subject.
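A cheap consistency check: the rowwise-flattened `b` matrix must line up with the rows of `Zt` (a sketch, using the objects defined above):

stopifnot(length(c(t(pars$b))) == nrow(lf$Zt))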
Test the R function (when writing the code I did a bunch of debugging at this stage to get everything working …)
nll(pars)

class='advector'
[1] 1286.507
This is reported as `class='advector'` because we used RTMB-specific functions (`dnbinom2`, `dmvnorm`) in the function. Don't try to figure out what's inside this object; it will hurt your brain.
Convert the R function to an RTMB object and test it. The RTMB object is a list that contains (among other things) an objective function (`fn`), a gradient function (`gr`), and the starting parameters we specified (`par`). By default the functions use the last parameter values evaluated, or the starting parameters the first time we call the functions. (When doing more complicated stuff with RTMB objects, you have to be aware that they have a lot of internal mutable state; your results may change based on the results of previous function calls. When in doubt, re-run `MakeADFun()` to get a fresh copy.)

Since we haven't specified any of the parameters as being random variables, this should give the same answer as the pure-R function. (The only problem I had to fix at this point was replacing the non-differentiable `poisson()$linkinv()` function with `exp()`.)
rnll <- MakeADFun(nll, pars)
rnll$fn()

[1] 1286.507
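Besides the objective function, the object carries the flattened starting parameters and the gradient function mentioned above (results not shown):

rnll$par   ## named, flattened version of the 'pars' list
rnll$gr()  ## gradient of the negative log-likelihood at the current parameters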
Now specify that `b` should be treated as a random effect. `silent = TRUE` turns off a lot of messages about intermediate steps of the Laplace approximation calculation. The negative log-likelihood should now be lower (better goodness of fit), because the `b` parameters are automatically optimized as part of the Laplace approximation step.
rnll2 <- MakeADFun(nll, pars, random = "b", silent = TRUE)
rnll2$fn()

[1] 1269.132
attr(,"logarithm")
[1] TRUE
Now feed these to `nlminb` (or any optimizer of your choice); note, though, that an optimizer that doesn't take advantage of gradient information would miss out on some of the main benefits of using (R)TMB.
fit <- with(rnll2, nlminb(start = par, objective = fn, gradient = gr))

Warning in nlminb(start = par, objective = fn, gradient = gr): NA/NaN function
evaluation
print(fit)

$par
       beta        beta     logsd_b     logsd_b   lognbdisp     corpars
 5.53260603  0.03385786 -2.30321937 -4.05760181  5.65569151 -0.01471712

$objective
[1] 868.0835

$convergence
[1] 0

$iterations
[1] 30

$evaluations
function gradient
      51       31

$message
[1] "relative convergence (4)"
We use `with()` to make the code slightly nicer (we don't have to repeat `rnll2$` in front of `par`, `fn`, and `gr`). The warning is (mostly) harmless; it means that a `NaN` value was encountered somewhere during the fitting process. It would be nice to dig in and robustify the code so this didn't happen, but at the moment that's more trouble than it's worth. The fit results only give us information about the 'top-level' parameters (fixed effects and dispersion parameters); to get information on the conditional modes etc. we have to do more work (one approach is sketched below).
Note that, as promised, calling the objective function with no arguments now gives us the last value evaluated, which is the same as the optimum returned by `nlminb`.
rnll2$fn()

[1] 868.0835
attr(,"logarithm")
[1] TRUE
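Returning to the 'more work' mentioned above: `sdreport()` (re-exported by RTMB from TMB) computes standard errors for both the top-level parameters and the conditional modes. A minimal sketch:

ss <- sdreport(rnll2)
summary(ss, select = "fixed")   ## top-level parameters with standard errors
summary(ss, select = "random")  ## conditional modes (BLUPs) with standard errors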
There are lots more details and tricks involving predictions, reported derived quantities and their standard deviations, etc., but that’s probably enough for now. Some of the other things we could do:
- specify multiple random-effects terms; while `lme4` and `glmmTMB` concatenate the random-effects terms to get one big \(Z\) matrix and one big \(\Sigma\) matrix, for generality, it's probably easier for bespoke models to handle them one at a time
- specify different covariance structures (see the RTMB documentation). RTMB has:
  - `dmvnorm()`, with which you can specify whatever covariance matrix you want, parameterized however you want
  - `dgmrf()` (Gauss-Markov random field), if you want to specify an inverse covariance (precision) matrix. (This works well for some problems where this matrix, often denoted as \(Q\), can be computed directly as a sparse matrix. If you have the covariance matrix already, there may not be much advantage to inverting it in R to use `dgmrf()` instead of `dmvnorm()`; RTMB will automatically cache the results of any computations that don't depend on the values of the model parameters.)
  - `dseparable()`, for specifying separable covariance structures (i.e. the covariance is the Kronecker product of any number of component covariances)
- specify other conditional distributions. In principle you can use any distribution defined in R; in practice everything will work better if you choose from one of the many distributions defined in TMB.
- use any nonlinear functions you like in your model definition (as long as they're differentiable)
- export smooth terms from the `mgcv` package and use them in your model
- add zero-inflation terms; you have to set up the zero-inflated log-likelihood yourself, e.g. see `emdbook::dzinbinom` or Bolker (2008); again, the `if (x == 0)` component in this calculation isn't a problem because the conditional depends only on data, not on parameter values
- add fixed and random effects for zero-inflation (usually with a logit link) or the dispersion (usually with a log link)
- in principle you could use factor-analytic terms (i.e., as in `rr()` in `glmmTMB`), although in practice these involve additional complexity for (e.g.) picking adequate starting values
- compute downstream quantities like the Hessian (`optimHess(fit$par, rnll2$fn, rnll2$gr)`), the covariance matrix (use `solve()` to invert the Hessian; see the sketch after this list), likelihood profiles (`TMB::tmbprofile()`), etc. Use `OBS()` to enable simulations and quantile residuals, and `ADREPORT()` to compute standard deviations of derived quantities. The RTMB introduction is a good place to start.
- add priors/regularizing terms (by computing log-priors and subtracting them from the `nll` value)
- instead of using link functions to transform (e.g.) standard deviations to an unconstrained scale, use box-constrained optimization to set bounds on this type of parameter (RE standard deviations, dispersion parameters, correlation parameters, zero-inflation parameters, etc.). This may behave better in the case where the parameters converge to the boundaries (e.g. parameterize the negative binomial as \(V = \mu(1+\psi \mu)\) and use the `lower` argument to `nlminb` to enforce the constraint \(\psi \ge 0\))
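As one concrete example from the list above, the downstream Hessian/covariance computation might look like this (a sketch using the fitted objects from above; `optimHess()` and `solve()` are base R):

H <- with(rnll2, optimHess(fit$par, fn, gr))  ## Hessian of the negative log-likelihood
V <- solve(H)                                 ## approximate (Wald) covariance matrix
sqrt(diag(V))                                 ## approximate standard errors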