library(RTMB)
library(reformulas) ## for setting up Z matrix (mkReTrms and *bars() functions)
library(Matrix) ## for t(sparse_matrix)

# GLMMs in RTMB

An illustration of a moderately general/customizable framework for building your own GLMMs in RTMB.
data("sleepstudy", package = "lme4")
form <- round(Reaction) ~ Days + (Days | Subject)

I'm rounding the response variable because I want to illustrate the use of a non-Gaussian response (negative binomial) [although that doesn't necessarily make practical sense].
fr <- model.frame(subbars(form), sleepstudy)                  ## (1)
lf <- mkReTrms(findbars(form), fr = fr, calc.lambdat = FALSE) ## (2)
X <- model.matrix(nobars(form), sleepstudy)                   ## (3)
fam <- poisson()                                              ## (4)
us <- unstructured(2)                                         ## (5)

1. `reformulas::subbars()` replaces any bars (`|`) with `+` in the formula so that it can be interpreted by `model.frame()`; all variables that appear in the formula will be included in the model frame, including expansion of data-dependent terms like splines.
2. `reformulas::mkReTrms()` generates the required \(Z\) matrix (actually its transpose), as the `Zt` element of a list. If there are multiple random-effect terms, `Zt` will include them all concatenated into a single matrix, while `Ztlist` will include them as separate elements in a list. The returned value includes other information (see the docs), but `Zt` is the only component we need in this example. `calc.lambdat = FALSE` specifies that `mkReTrms` should not return the Cholesky factor of the covariance matrix of the (combined) random effects (we don't need it and it is sometimes a large object).
3. This constructs the model matrix for the fixed-effects component, in the usual way (`reformulas::nobars()` drops the random-effect terms).
4. This may be useful (but turns out not to be in this example, see below) for getting terms such as the link function, inverse link function, (scaled) variance function, etc. for the conditional distribution.
5. This returns a list that includes a function `corr` for computing an unstructured correlation matrix from a vector of parameters (see the glmmTMB or TMB documentation for more info). For a vector-valued random effect of length only two (i.e. {intercept, slope}) this approach is overkill because there is only a single correlation parameter, but it will generalize to larger correlation matrices.
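It can be worth poking at these pieces before going on. A quick, optional sketch; the expected dimensions assume the sleepstudy data (180 observations, 18 subjects):

subbars(form)  ## round(Reaction) ~ Days + (Days + Subject)
nobars(form)   ## round(Reaction) ~ Days
dim(lf$Zt)     ## 36 x 180: (intercept, slope) x 18 subjects, transposed
us$corr(0)     ## 2x2 correlation matrix; the off-diagonal element should be
               ## zero at a parameter value of 0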
Now the negative log-likelihood function itself:
nll <- function(pars) {
    getAll(fr, lf, pars)                                            ## (1)
    eta <- drop(X %*% beta + t(Zt) %*% c(t(b)))                     ## (2)
    mu <- exp(eta)                                                  ## (3)
    var <- mu*(1 + mu/exp(lognbdisp))                               ## (4)
    nll <- -sum(dnbinom2(model.response(fr), mu, var, log = TRUE))  ## (5)
    sdvec_b <- exp(logsd_b)                                         ## (6)
    Sigma <- outer(sdvec_b, sdvec_b, "*") * us$corr(corpars)        ## (7)
    nll <- nll - sum(dmvnorm(b, Sigma = Sigma, log = TRUE))         ## (8)
    return(nll)
}

1. `getAll()` is the RTMB version of `attach()` or `with()`; it extracts the elements of all of these lists for use within the function.
2. This is the standard \(X \beta + Z b\) framework for calculating the linear predictor of a mixed model. We have to (1) transpose `Zt` (because `mkReTrms` returns the transposed \(Z\) matrix, for Reasons); (2) transpose `b` and use `c()` to collapse it to a vector; `b` is stored as a matrix with each row corresponding to one subject, and we need to unpack it rowwise because that corresponds to how `mkReTrms` sets up `Zt`; (3) use `drop()` to convert the resulting one-column matrix to a vector. The dimension doesn't get dropped automatically because `Zt` is a sparse matrix, so the usual R auto-drop rules don't apply.
3. Applying the inverse-link function (we're using the standard log link for the negative binomial). It would be nice to use `poisson()$linkinv`, but that has a `pmax()` statement in it (to prevent underflow) that breaks automatic differentiation. (The better solution here would be to use `dnbinom_robust` from (R)TMB, which lets us specify the negative binomial via `(log_mu, log_var_minus_mu)` without ever explicitly applying the inverse-link function and risking underflow; a sketch follows this list.)
4. Computing the negative binomial variance using the standard \(V = \mu (1 + \mu/\theta)\) formula. (We use a log link for the dispersion parameter for general robustness and to avoid having to do constrained optimization.)
5. This line computes the conditional (negative) log-likelihood. Since we know that the response variable is `round(Reaction)` we could have used that instead (breaking differentiability is OK as long as the computation doesn't depend in any way on model parameters), but `model.response(fr)` is a little more general. (Don't ask me how this works with binomial responses specified as a two-column matrix, please.) TMB uses `dnbinom` for the default base-R probability/size parameterization (which we rarely want) and `dnbinom2` for the alternative, and more generally useful, mean/variance parameterization. If you want some other parameterization, such as `nbinom1` from `glmmTMB`, you have to code the parameter transformation yourself.
6. Convert the vector of random-effect log-standard deviations (length-2 in this case) to standard deviations (we could do this on the fly in the next line, but this is more readable).
7. Compute the random-effects covariance matrix from standard deviations and correlation parameters. We could equivalently use `diag(sdvec) %*% cormat %*% diag(sdvec)` to do this transformation.
8. Compute the log-likelihood of \(b\) conditional on \(\Sigma\) (given a matrix-valued first argument, `dmvnorm()` returns a vector of (log-)likelihoods, one for each row/subject); subtract it from the (negative) conditional log-likelihood we already computed.
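As promised in note (3), here is what a `dnbinom_robust()` version could look like. This is a minimal sketch, assuming RTMB's `dnbinom_robust()` follows TMB's (x, log_mu, log_var_minus_mu) parameterization; since \(V - \mu = \mu^2/\theta\), we have \(\log(V - \mu) = 2\eta - \log\theta\), so we never need to exponentiate \(\eta\):

nll_robust <- function(pars) {
    getAll(fr, lf, pars)
    eta <- drop(X %*% beta + t(Zt) %*% c(t(b)))
    ## log(V - mu) = log(mu^2/theta) = 2*eta - lognbdisp
    nll <- -sum(dnbinom_robust(model.response(fr),
                               eta,                  ## log mean
                               2 * eta - lognbdisp,  ## log(var - mean)
                               log = TRUE))
    sdvec_b <- exp(logsd_b)
    Sigma <- outer(sdvec_b, sdvec_b, "*") * us$corr(corpars)
    nll - sum(dmvnorm(b, Sigma = Sigma, log = TRUE))
}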
Set up the parameters. These are ‘reasonable’ values for this problem.
nsubj <- length(unique(sleepstudy$Subject))
pars <- list(beta = c(5, 0.01),
             b = matrix(0, ncol = 2, nrow = nsubj),   ## (1)
             logsd_b = c(0, 0),
             lognbdisp = 0,
             corpars = 0)

1. As mentioned previously, the random-effect values (BLUPs/conditional modes) are specified as a matrix with each row corresponding to the intercept and slope offsets for a particular subject.
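A cheap consistency check: the rowwise-flattened `b` matrix must line up with the rows of `Zt` (a sketch, using the objects defined above):

stopifnot(length(c(t(pars$b))) == nrow(lf$Zt))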
Test the R function (when writing the code I did a bunch of debugging at this stage to get everything working …)
nll(pars)

class='advector'
[1] 1286.507
This is reported as `class='advector'` because we used RTMB-specific functions (`dnbinom2`, `dmvnorm`) in the function. Don't try to figure out what's inside this object; it will hurt your brain.
Convert the R function to an RTMB object and test it. The RTMB object is a list that contains (among other things) an objective function (`fn`), a gradient function (`gr`), and the starting parameters we specified (`par`). By default the functions use the last parameter values evaluated, or the starting parameters the first time we call the functions. (When doing more complicated stuff with RTMB objects, you have to be aware that they have a lot of internal mutable state; your results may change based on the results of previous function calls. When in doubt, re-run `MakeADFun()` to get a fresh copy.)

Since we haven't specified any of the parameters as being random variables, this should give the same answer as the pure-R function. (The only problem I had to fix at this point was replacing the non-differentiable `poisson()$linkinv()` function with `exp()`.)
rnll <- MakeADFun(nll, pars)
rnll$fn()

[1] 1286.507
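Besides the objective function, the object carries the flattened starting parameters and the gradient function mentioned above (results not shown):

rnll$par   ## named, flattened version of the 'pars' list
rnll$gr()  ## gradient of the negative log-likelihood at the current parameters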
Now specify that `b` should be treated as a random effect. `silent = TRUE` turns off a lot of messages about intermediate steps of the Laplace approximation calculation. The negative log-likelihood should now be lower (better goodness of fit), because the `b` parameters are automatically optimized as part of the Laplace approximation step.
rnll2 <- MakeADFun(nll, pars, random = "b", silent = TRUE)
rnll2$fn()

[1] 1269.132
attr(,"logarithm")
[1] TRUE
Now feed these to `nlminb` (or any optimizer of your choice); note, though, that an optimizer that doesn't take advantage of gradient information would miss out on some of the main benefits of using (R)TMB.
fit <- with(rnll2, nlminb(start = par, objective = fn, gradient = gr))

Warning in nlminb(start = par, objective = fn, gradient = gr): NA/NaN function
evaluation
print(fit)

$par
       beta        beta     logsd_b     logsd_b   lognbdisp     corpars
 5.53260603  0.03385786 -2.30321937 -4.05760181  5.65569151 -0.01471712

$objective
[1] 868.0835

$convergence
[1] 0

$iterations
[1] 30

$evaluations
function gradient
      51       31

$message
[1] "relative convergence (4)"
We use `with()` to make the code slightly nicer (we don't have to repeat `rnll2$` in front of `par`, `fn`, and `gr`). The warning is (mostly) harmless; it means that a `NaN` value was encountered somewhere during the fitting process. It would be nice to dig in and robustify the code so this didn't happen, but at the moment that's more trouble than it's worth. The fit results only give us information about the 'top-level' parameters (fixed effects and dispersion parameters); to get information on the conditional modes etc. we have to do more work (one approach is sketched below).
Note that, as promised, calling the objective function with no arguments now gives us the last value evaluated, which is the same as the optimum returned by `nlminb`.
rnll2$fn()

[1] 868.0835
attr(,"logarithm")
[1] TRUE
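Returning to the 'more work' mentioned above: `sdreport()` (re-exported by RTMB from TMB) computes standard errors for both the top-level parameters and the conditional modes. A minimal sketch:

ss <- sdreport(rnll2)
summary(ss, select = "fixed")   ## top-level parameters with standard errors
summary(ss, select = "random")  ## conditional modes (BLUPs) with standard errors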
There are lots more details and tricks involving predictions, reported derived quantities and their standard deviations, etc., but that’s probably enough for now. Some of the other things we could do:
- specify multiple random-effects terms; while `lme4` and `glmmTMB` concatenate the random-effects terms to get one big \(Z\) matrix and one big \(\Sigma\) matrix, for generality, it's probably easier for bespoke models to handle them one at a time
- specify different covariance structures (see the RTMB documentation). RTMB has:
  - `dmvnorm()`, with which you can specify whatever covariance matrix you want, parameterized however you want
  - `dgmrf()` (Gauss-Markov random field), if you want to specify an inverse covariance (precision) matrix. (This works well for some problems where this matrix, often denoted as \(Q\), can be computed directly as a sparse matrix. If you have the covariance matrix already, there may not be much advantage to inverting it in R to use `dgmrf()` instead of `dmvnorm()`; RTMB will automatically cache the results of any computations that don't depend on the values of the model parameters.)
  - `dseparable()`, for specifying separable covariance structures (i.e. the covariance is the Kronecker product of any number of component covariances)
- specify other conditional distributions. In principle you can use any distribution defined in R; in practice everything will work better if you choose from one of the many distributions defined in TMB.
- use any nonlinear functions you like in your model definition (as long as they're differentiable)
- export smooth terms from the `mgcv` package and use them in your model
- add zero-inflation terms; you have to set up the zero-inflated log-likelihood yourself, e.g. see `emdbook::dzinbinom` or Bolker (2008); again, the `if (x == 0)` component in this calculation isn't a problem because the conditional depends only on data, not on parameter values
- add fixed and random effects for zero-inflation (usually with a logit link) or the dispersion (usually with a log link)
- in principle you could use factor-analytic terms (i.e., as in `rr()` in `glmmTMB`), although in practice these involve additional complexity for (e.g.) picking adequate starting values
- compute downstream quantities like the Hessian (`optimHess(fit$par, rnll2$fn, rnll2$gr)`), the covariance matrix (use `solve()` to invert the Hessian; see the sketch after this list), likelihood profiles (`TMB::tmbprofile()`), etc. Use `OBS()` to enable simulations and quantile residuals, and `ADREPORT()` to compute standard deviations of derived quantities. The RTMB introduction is a good place to start.
- add priors/regularizing terms (by computing log-priors and subtracting them from the `nll` value)
- instead of using link functions to transform (e.g.) standard deviations to an unconstrained scale, use box-constrained optimization to set bounds on this type of parameter (RE standard deviations, dispersion parameters, correlation parameters, zero-inflation parameters, etc.). This may behave better in the case where the parameters converge to the boundaries (e.g. parameterize the negative binomial as \(V = \mu(1+\psi \mu)\) and use the `lower` argument to `nlminb` to enforce the constraint \(\psi \ge 0\))
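As one concrete example from the list above, the downstream Hessian/covariance computation might look like this (a sketch using the fitted objects from above; `optimHess()` and `solve()` are base R):

H <- with(rnll2, optimHess(fit$par, fn, gr))  ## Hessian of the negative log-likelihood
V <- solve(H)                                 ## approximate (Wald) covariance matrix
sqrt(diag(V))                                 ## approximate standard errors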