```r
library(RTMB)
library(reformulas) ## for setting up Z matrix (mkReTrms and *bars() functions)
library(Matrix)     ## for t(sparse_matrix)
```
GLMMs in RTMB

An illustration of a moderately general/customizable framework for building your own GLMMs in RTMB.
data("sleepstudy", package = "lme4")
<- round(Reaction) ~ Days + (Days | Subject) form
I’m rounding the response variable because I want to illustrate the use of a non-Gaussian response (negative binomial) [although that doesn’t necessarily make practical sense].
1<- model.frame(subbars(form), sleepstudy)
fr 2<- mkReTrms(findbars(form), fr = fr, calc.lambdat = FALSE)
lf 3<- model.matrix(nobars(form), sleepstudy)
X 4<- poisson()
fam 5<- unstructured(2) us
1. `reformulas::subbars()` replaces any bars (`|`) with `+` in the formula so that it can be interpreted by `model.frame()`; all variables that appear in the formula will be included in the model frame, including expansion of data-dependent terms like splines.
2. `reformulas::mkReTrms()` generates the required \(Z\) matrix (actually its transpose), as the `Zt` element of a list. If there are multiple random-effect terms, `Zt` will include them all concatenated into a single matrix, while `Ztlist` will include them as separate elements in a list. The returned value includes other information (see the docs), but `Zt` is the only component we need in this example (see the sanity checks after this list). `calc.lambdat = FALSE` specifies that `mkReTrms` should not return the Cholesky factor of the covariance matrix of the (combined) random effects (we don't need it and it is sometimes a large object).
3. This constructs the model matrix for the fixed-effects component, in the usual way (`reformulas::nobars` drops the random-effect terms).
4. This may be useful (but turns out not to be in this example, see below) for getting components such as the link function, inverse link function, (scaled) variance function, etc. for the conditional distribution.
5. This returns a list that includes a function `corr` for computing an unstructured correlation matrix from a vector of parameters (see the glmmTMB or TMB documentation for more info). For a vector-valued random effect of length two (i.e. {intercept, slope}) this approach is overkill because there is only a single correlation parameter, but it will generalize to larger correlation matrices.
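Before going on, it can be worth poking at these objects. Here are a few optional sanity checks (not part of the original workflow); the shapes noted in the comments are what I'd expect for the sleepstudy data (180 observations, 18 subjects), and `us$corr` is called with a plain numeric vector, just as it will be inside the likelihood function below:

```r
dim(X)             ## 180 x 2: intercept and Days columns
dim(lf$Zt)         ## 36 x 180: transposed Z, two rows (intercept, slope) per subject
length(lf$Ztlist)  ## 1: only one random-effects term in this formula
us$corr(0)         ## the 2 x 2 correlation matrix implied by a zero parameter
```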
Now the negative log-likelihood function itself:
```r
nll <- function(pars) {
    getAll(fr, lf, pars)                                            ## <1>
    eta <- drop(X %*% beta + t(Zt) %*% c(t(b)))                     ## <2>
    mu <- exp(eta)                                                  ## <3>
    var <- mu * (1 + mu/exp(lognbdisp))                             ## <4>
    nll <- -sum(dnbinom2(model.response(fr), mu, var, log = TRUE))  ## <5>
    sdvec_b <- exp(logsd_b)                                         ## <6>
    Sigma <- outer(sdvec_b, sdvec_b, "*") * us$corr(corpars)        ## <7>
    nll <- nll - sum(dmvnorm(b, Sigma = Sigma, log = TRUE))         ## <8>
    return(nll)
}
```
1. `getAll()` is the `RTMB` version of `attach()` or `with()`; it extracts the elements of all of these lists for use within the function.
2. This is the standard \(X \beta + Z b\) framework for calculating the linear predictor of a mixed model. We have to (1) transpose `Zt` (because `mkReTrms` returns the transposed \(Z\) matrix, for Reasons); (2) transpose `b` and use `c()` to collapse it to a vector; `b` is stored as a matrix with each row corresponding to one subject, and we need to unpack it rowwise because that corresponds to how `mkReTrms` sets up `Zt`; (3) use `drop()` to convert the resulting one-column matrix to a vector. The dimension doesn't get automatically dropped because `Zt` is a sparse matrix, so the usual R auto-drop rules don't apply.
3. Applying the inverse-link function (we're using the standard log link for the negative binomial). It would be nice to use `poisson()$linkinv`, but that has a `pmax()` statement in it (to prevent underflow) that breaks automatic differentiation. (The better solution here would be to use `dnbinom_robust` from (R)TMB, which lets us specify the negative binomial via `(log_mu, log_var_minus_mu)` without ever explicitly applying the inverse-link function and risking underflow; see the sketch after this list.)
4. Computing the negative binomial variance using the standard \(V = \mu (1 + \mu/\theta)\) formula. (We use a log link for the dispersion parameter for general robustness and to avoid having to do constrained optimization.)
5. This line computes the conditional negative log-likelihood. Since we know that the response variable is `round(Reaction)` we could have used that instead (breaking differentiability is OK as long as the computation doesn't depend in any way on model parameters), but `model.response(fr)` is a little bit more general. (Don't ask me how this works with binomial responses specified as a two-column matrix, please.) TMB uses `dnbinom` for the default base-R probability/size parameterization (which we rarely want) and `dnbinom2` for the alternative, and more generally useful, mean/variance parameterization. If you want some other parameterization, such as `nbinom1` from `glmmTMB`, you have to code the parameter transformation yourself.
6. Convert the vector of random-effect log-standard-deviations (length-2 in this case) to standard deviations (we could do this on the fly in the next line, but this is more readable).
7. Compute the random-effects covariance matrix from standard deviations and correlation parameters. We could equivalently use `diag(sdvec) %*% cormat %*% diag(sdvec)` to do this transformation.
8. Compute the log-likelihood of \(b\) conditional on \(\Sigma\) (given a matrix-valued first argument, `dmvnorm()` returns a vector of (log-)likelihoods, one for each row/subject); subtract it from the conditional negative log-likelihood we already computed.
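Here is a minimal sketch of the underflow-robust variant mentioned in note 3. It assumes `dnbinom_robust()` is exposed by your (R)TMB version with the TMB argument order `(x, log_mu, log_var_minus_mu)`; `nll_robust` is just a hypothetical name for this alternative.

```r
nll_robust <- function(pars) {
    getAll(fr, lf, pars)
    eta <- drop(X %*% beta + t(Zt) %*% c(t(b)))
    ## since var - mu = mu^2/theta, log(var - mu) = 2*eta - lognbdisp,
    ## so we never have to exponentiate eta directly
    nll <- -sum(dnbinom_robust(model.response(fr),
                               eta,                  ## log_mu
                               2 * eta - lognbdisp,  ## log(var - mu)
                               log = TRUE))
    sdvec_b <- exp(logsd_b)
    Sigma <- outer(sdvec_b, sdvec_b, "*") * us$corr(corpars)
    nll - sum(dmvnorm(b, Sigma = Sigma, log = TRUE))
}
```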
Set up the parameters. These are ‘reasonable’ values for this problem.
```r
nsubj <- length(unique(sleepstudy$Subject))
pars <- list(beta = c(5, 0.01),
             b = matrix(0, ncol = 2, nrow = nsubj),  ## <1>
             logsd_b = c(0, 0),
             lognbdisp = 0,
             corpars = 0)
```
1. As mentioned previously, the random-effect values (BLUPs/conditional modes) are specified as a matrix with each row corresponding to the intercept and slope offsets for a particular subject (the rowwise unpacking is illustrated below).
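To make the `c(t(b))` unpacking from the linear-predictor step concrete, here's a small plain-R illustration (the names are made up for the example; nothing here touches AD types):

```r
## a fake 3-subject version of b, one row per subject
bdemo <- matrix(1:6, nrow = 3, ncol = 2,
                dimnames = list(paste0("subj", 1:3), c("int", "slope")))
c(t(bdemo))  ## 1 4 2 5 3 6: (intercept, slope) pairs, one subject at a
             ## time, matching the row order mkReTrms uses for Zt
```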
Test the R function (when writing the code I did a bunch of debugging at this stage to get everything working …)

```r
nll(pars)
```

```
class='advector'
[1] 1286.507
```
This is reported as `class='advector'` because we used RTMB-specific functions (`dnbinom2`, `dmvnorm`) in the function. Don't try to figure out what's inside this object; it will hurt your brain.
Convert the R function to an RTMB object and test it. The RTMB object is a list that contains (among other things) an objective function (`fn`), a gradient function (`gr`), and the starting parameters we specified (`par`). By default the functions use the last parameter values evaluated, or the starting parameters the first time we call the functions. (When doing more complicated stuff with RTMB objects, you have to be aware that they have a lot of internal mutable state; your results may change based on the results of previous function calls. When in doubt, re-run `MakeADFun()` to get a fresh copy.)

Since we haven't specified any of the parameters as being random variables, this should give the same answer as the pure-R function. (The only problem I had to fix at this point was replacing the non-differentiable `poisson()$linkinv()` function with `exp()`.)
```r
rnll <- MakeADFun(nll, pars)
rnll$fn()
```

```
[1] 1286.507
```
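While we're at it, we can also poke at the automatically derived gradient function (a quick sanity check, not part of the original workflow). It returns one derivative per parameter, including every element of `b`, since nothing has been declared random yet:

```r
rnll$gr()          ## one row of derivatives at the current parameter values
length(rnll$gr())  ## 2 (beta) + 36 (b) + 2 (logsd_b) + 1 (lognbdisp) + 1 (corpars) = 42
```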
Now specify that `b` should be treated as a random effect. `silent = TRUE` turns off a lot of messages about intermediate steps of the Laplace approximation calculation. Now the negative log-likelihood should be lower (better goodness of fit), because the `b` parameters are automatically optimized as part of the Laplace approximation step.
```r
rnll2 <- MakeADFun(nll, pars, random = "b", silent = TRUE)
rnll2$fn()
```

```
[1] 1269.132
attr(,"logarithm")
[1] TRUE
```
Now feed these to `nlminb` (or any optimizer of your choice), although an optimizer that doesn't take advantage of gradient information would miss out on one of the main benefits of using (R)TMB.
```r
fit <- with(rnll2, nlminb(start = par, objective = fn, gradient = gr))
```

```
Warning in nlminb(start = par, objective = fn, gradient = gr): NA/NaN function
evaluation
```
```r
print(fit)
```

```
$par
       beta        beta     logsd_b     logsd_b   lognbdisp     corpars 
 5.53260603  0.03385786 -2.30321937 -4.05760181  5.65569151 -0.01471712 

$objective
[1] 868.0835

$convergence
[1] 0

$iterations
[1] 30

$evaluations
function gradient 
      51       31 

$message
[1] "relative convergence (4)"
```
We use `with()` to make the code slightly nicer (we don't have to repeat `rnll2$` in front of `par`, `fn`, and `gr`). The warning is (mostly) harmless; it means that an `NaN` value was encountered somewhere during the fitting process. It would be nice to dig in and robustify the code so this didn't happen, but at the moment that's more trouble than it's worth. The `fit` results only give us information about the 'top-level' parameters (fixed effects and dispersion parameters). To get information on the conditional modes, etc., we have to do more work (see the sketch below).
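One standard piece of that work is `TMB::sdreport()` (re-exported by RTMB), which computes estimates and standard errors for both the top-level parameters and the conditional modes; a minimal sketch:

```r
sdr <- sdreport(rnll2)
summary(sdr, select = "fixed")         ## top-level parameters with std. errors
head(summary(sdr, select = "random"))  ## conditional modes of b with std. errors
```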
Note that, as promised, calling the objective function with no arguments now gives us the last value evaluated, which is the same as the optimum returned by `nlminb`.

```r
rnll2$fn()
```

```
[1] 868.0835
attr(,"logarithm")
[1] TRUE
```
There are lots more details and tricks involving predictions, reported derived quantities and their standard deviations, etc., but that’s probably enough for now. Some of the other things we could do:
- specify multiple random-effects terms; while `lme4` and `glmmTMB` concatenate the random-effects terms to get one big \(Z\) matrix and one big \(\Sigma\) matrix, for generality, it's probably easier for bespoke models to handle them one at a time
- specify different covariance structures (see the RTMB documentation). `RTMB` has:
  - `dmvnorm()`, with which you can specify whatever covariance matrix you want, parameterized however you want;
  - `dgmrf()` (Gauss-Markov random field), if you want to specify an inverse covariance (precision) matrix. (This works well for some problems where this matrix, often denoted as \(Q\), can be computed directly as a sparse matrix. If you have the covariance matrix already, there may not be much advantage to inverting it in R to use `dgmrf()` instead of `dmvnorm()`; RTMB will automatically cache the results of any computations that don't depend on the values of the model parameters.)
  - `dseparable()`, for specifying separable covariance structures (i.e. the covariance is the Kronecker product of any number of component covariances)
- specify other conditional distributions. In principle you can use any distribution defined in R; in practice everything will work better if you choose from one of the many distributions defined in TMB.
- use any nonlinear functions you like in your model definition (as long as they're differentiable)
- export smooth terms from the `mgcv` package and use them in your model
- add zero-inflation terms; you have to set up the zero-inflated log-likelihood yourself, e.g. see `emdbook::dzinbinom` or Bolker (2008); again, the `if (x == 0)` component in this calculation isn't a problem because the conditional depends only on data, not on parameter values
- add fixed and random effects for zero-inflation (usually with a logit link) or the dispersion (usually with a log link)
- in principle you could use factor-analytic terms (i.e., as in `rr()` in `glmmTMB`), although in practice these involve additional complexity for (e.g.) picking adequate starting values
- compute downstream quantities like the Hessian (`optimHess(fit$par, rnll2$fn, rnll2$gr)`), covariance matrix (use `solve()` to invert the Hessian), likelihood profiles (`TMB::tmbprofile()`), etc.; use `OBS()` to enable simulations and quantile residuals, and `ADREPORT()` to compute standard deviations of derived quantities. The RTMB introduction is a good place to start.
- add priors/regularizing terms (by computing log-priors and subtracting them from the `nll` value)
- instead of using link functions to transform (e.g.) standard deviations to an unconstrained scale, use box-constrained optimization to set bounds on these parameters (RE standard deviations, dispersion parameters, correlation parameters, zero-inflation parameters, etc.). This may behave better in the case where the parameters converge to the boundaries (e.g. parameterize the negative binomial as \(V = \mu(1 + \psi \mu)\) and use the `lower` argument to `nlminb` to enforce the constraint \(\psi \ge 0\); see the sketch below)
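Here's a minimal sketch of that last idea. `nll_psi` and `pars_psi` are hypothetical variants of `nll` and `pars` (not from the original code) in which a dispersion parameter `psi` on the natural scale replaces `exp(lognbdisp)`:

```r
## variant of nll using V = mu*(1 + psi*mu), with psi >= 0 enforced by
## the optimizer rather than by a log link
nll_psi <- function(pars) {
    getAll(fr, lf, pars)
    eta <- drop(X %*% beta + t(Zt) %*% c(t(b)))
    mu <- exp(eta)
    var <- mu * (1 + psi * mu)
    nll <- -sum(dnbinom2(model.response(fr), mu, var, log = TRUE))
    sdvec_b <- exp(logsd_b)
    Sigma <- outer(sdvec_b, sdvec_b, "*") * us$corr(corpars)
    nll - sum(dmvnorm(b, Sigma = Sigma, log = TRUE))
}
## drop lognbdisp, add psi (starting value 1 corresponds to lognbdisp = 0)
pars_psi <- modifyList(pars, list(lognbdisp = NULL, psi = 1))
rnll3 <- MakeADFun(nll_psi, pars_psi, random = "b", silent = TRUE)
## lower bounds apply elementwise to the fixed-parameter vector;
## -Inf means "unconstrained"
lwr <- rep(-Inf, length(rnll3$par))
lwr[names(rnll3$par) == "psi"] <- 0
fit3 <- with(rnll3, nlminb(start = par, objective = fn,
                           gradient = gr, lower = lwr))
```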