Includes material from Ian Dworkin and Jonathan Dushoff, but they bear no responsibility for the contents.
MCMCglmm: old, stable
rstanarm: newer, pre-compiled, better (but harder to understand) priors
brms: newer, requires compilation, very flexible
rjags/JAGS: older, roll-your-own

From ?lme4::sleepstudy, “The average reaction time per day (in milliseconds) for subjects in a sleep deprivation study.” A standard example for mixed models.
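A quick look at the data (a minimal sketch; the data set ships with lme4, so it can be loaded without attaching the package):
data("sleepstudy", package = "lme4")
str(sleepstudy)  ## 180 obs. of 3 variables: Reaction (num), Days (num), Subject (factor, 18 levels)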
library(lme4)
library(brms)
options(brms.backend = "cmdstanr")
library(broom.mixed)
library(purrr)
library(dplyr)
library(tidybayes)
library(bayesplot)
library(bayestestR)
library(ggplot2); theme_set(theme_bw())
library(see)
options(ggplot2.discrete.colour= scale_color_okabeito)
library(ggrastr)
library(cowplot)
We’ll fit the model
Reaction ~ 1 + Days + (1 + Days|Subject)
(a linear regression with subject-level random variation in both intercept and slope, i.e. a random-slopes model)
\[ \newcommand{\XX}{\mathbf X} \newcommand{\ZZ}{\mathbf Z} \newcommand{\bbeta}{\boldsymbol \beta} \newcommand{\bb}{\mathbf b} \newcommand{\zero}{\boldsymbol 0} \begin{split} \textrm{Reaction} & \sim \textrm{Normal}(\XX \bbeta + \ZZ \bb, \sigma^2) \\ \bb & \sim \textrm{MVNorm}(\zero, \Sigma) \\ \Sigma & = \textrm{block-diag}\left( \begin{array}{cc} \sigma^2_i & \sigma_{is} \\ \sigma_{is} & \sigma^2_{s} \end{array} \right) \end{split} \]
form1 <- Reaction ~ 1 + Days + (1 + Days|Subject)
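To connect the formula to the X and Z matrices in the equation above, we can peek at the design matrices without fitting anything (a sketch using lme4::lFormula(), which builds the model structures; the object name lf is just illustrative):
lf <- lme4::lFormula(form1, data = sleepstudy)
dim(lf$X)            ## 180 x 2: fixed-effect design matrix (intercept, Days)
dim(t(lf$reTrms$Zt)) ## 180 x 36: random-effect design matrix (2 columns per Subject)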
brms
has a convenience function get_prior()
that displays the parameters/priors that need to be set, along with
their default values.
get_prior(form1, sleepstudy)
##                       prior     class      coef   group resp dpar nlpar lb ub       source
##                      (flat)         b                                              default
##                      (flat)         b      Days                               (vectorized)
##                      lkj(1)       cor                                              default
##                      lkj(1)       cor           Subject                       (vectorized)
##  student_t(3, 288.7, 59.3) Intercept                                               default
##      student_t(3, 0, 59.3)        sd                                    0          default
##      student_t(3, 0, 59.3)        sd           Subject                  0     (vectorized)
##      student_t(3, 0, 59.3)        sd      Days Subject                  0     (vectorized)
##      student_t(3, 0, 59.3)        sd Intercept Subject                  0     (vectorized)
##      student_t(3, 0, 59.3)     sigma                                    0          default
Info on brms default priors: the Intercept prior is Student \(t\) with 3 df, location equal to the observed median of the response variable, and scale equal to the (rescaled) median absolute deviation. The RE SDs get the same prior, but half-\(t\) (see the lb = 0 column), with mode at 0.
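We can check where those default numbers come from (a quick sketch; brms bases the defaults on the median and MAD of the response):
median(sleepstudy$Reaction)  ## ~ 288.7: location of the Intercept prior
mad(sleepstudy$Reaction)     ## ~ 59.3: prior scale (mad() already includes the 1.4826 rescaling)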
Constraining intercept and slope to ‘reasonable’ values:
b_prior <- c(set_prior("normal(200, 50)", "Intercept"),
set_prior("normal(0, 10)", "b")
)
Helper function for prior predictive simulations: run a short MCMC chain, plot results. (We’re going to ignore/suppress warnings for this stage …)
test_prior <- function(p) {
## https://discourse.mc-stan.org/t/suppress-all-output-from-brms-in-markdown-files/30117/3
capture.output(
b <- brm(form1, sleepstudy, prior = p,
seed = 101, ## reproducibility
sample_prior = 'only', ## for prior predictive sim
chains = 1, iter = 500, ## very short sample for convenience
silent = 2, refresh = 0 ## be vewy vewy quiet ...
)
)
p_df <- sleepstudy |> add_predicted_draws(b)
## 'spaghetti plot' of prior preds
gg0 <- ggplot(p_df,aes(Days, .prediction, group=interaction(Subject,.draw))) +
geom_line(alpha = 0.1)
print(gg0)
invisible(b) ## return without printing
}
test_prior(b_prior)
Decrease random-effects SDs:
b_prior2 <- c(set_prior("normal(200, 10)", "Intercept"),
set_prior("normal(0, 5)", "b"),
set_prior("student_t(3, 0, 0.1)", "sd")
)
test_prior(b_prior2)
Make all scales even smaller?
b_prior3 <- c(set_prior("normal(200, 5)", "Intercept"),
set_prior("normal(0, 2)", "b"),
set_prior("student_t(3, 0, 0.01)", "sd")
)
test_prior(b_prior3)
We forgot to constrain the prior for the residual standard deviation!
b_prior4 <- c(set_prior("normal(200, 5)", "Intercept"),
set_prior("normal(0, 2)", "b"),
set_prior("normal(0, 1)", "sd"),
set_prior("normal(0, 1)", "sigma")
)
test_prior(b_prior4)
Now relax a bit …
b_prior5 <- c(set_prior("normal(200, 10)", "Intercept"),
set_prior("normal(0, 8)", "b"),
set_prior("student_t(10, 0, 3)", "sd"),
set_prior("student_t(10, 0, 3)", "sigma")
)
test_prior(b_prior5)
In hindsight, we should relax more: set the intercept prior back to its default and widen the fixed-effect (slope) prior.
b_prior6 <- c(set_prior("normal(0, 20)", "b"),
set_prior("student_t(10, 0, 3)", "sd"),
set_prior("student_t(10, 0, 3)", "sigma")
)
test_prior(b_prior6)
There are still a few negative reaction times left, which is obviously unrealistic, but at least they’re rare …
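To quantify “rare”, we could compute the fraction of negative prior-predictive draws (a sketch reusing the helper above; test_prior() returns the fitted model invisibly, and will re-print the spaghetti plot as a side effect):
b6 <- test_prior(b_prior6)
prior_df6 <- sleepstudy |> add_predicted_draws(b6)
mean(prior_df6$.prediction < 0)  ## proportion of negative prior-predictive reaction times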
adapt_delta below is one knob to turn to make the MCMC ‘work harder’ to avoid geometry problems.
brms does a lot of fancy stuff under the hood to try to make things work well.
m_lmer <- lmer(form1, sleepstudy)  ## frequentist fit, for comparison below
b_reg <- brm(form1, sleepstudy, prior = b_prior5,
seed = 101,
control = list(adapt_delta = 0.95)
)
b_reg2 <- brm(form1, sleepstudy, prior = b_prior6,
seed = 101,
control = list(adapt_delta = 0.95)
)
## also try with default settings
b_default <- brm(form1, sleepstudy,
seed = 101
)
(These are actually run as a batch from run_examples.R
and loaded here …)
load("examples1.rda")
We should really diagnose before looking at the parameter values! MCSE is the Monte Carlo standard error, equal to (posterior SD)/sqrt(ESS).
print(bayestestR::diagnostic_posterior(b_reg),
digits = 4)
## Parameter Rhat ESS MCSE
## 1 b_Days 1.001 1185 0.08135
## 2 b_Intercept 1.002 1152 0.28076
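As a rough check of the MCSE formula (a sketch assuming the posterior package, which brms uses internally; bayestestR’s ESS estimate may differ slightly, so expect approximate agreement):
draws <- posterior::as_draws_df(b_reg)
sd(draws$b_Days) / sqrt(posterior::ess_basic(draws$b_Days))  ## should be close to the MCSE above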
\(\hat R\) (Rhat), ESS look OK. Vehtari et al. (2021) recommend an \(\hat R\) threshold of 1.01, ESS > 400, and MCSE (Monte Carlo standard error) ‘small enough’ for scientific purposes (“figure out what is the needed accuracy for the quantity of interest (for reporting usually 2 significant digits is enough)”).
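To flag any parameters that miss these thresholds automatically (a quick sketch based on the diagnostic_posterior() output shown above):
dp <- bayestestR::diagnostic_posterior(b_reg)
subset(dp, Rhat > 1.01 | ESS < 400)  ## empty if everything passes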
Trace plots should look like white noise (see tweet in previous set
of notes); no obvious trends, no ‘slow’ variation.
(regex_pars
specifies a regular expression
for deciding which parameters to show.)
mcmc_trace(b_reg, regex_pars= "b_|sd_")
Skewed posteriors can make trace plots look ‘spiky’; to avoid this we could look at transformed (e.g. log) posterior distributions. Vehtari et al. (2021) recommend rank histograms instead: chains should have similar rank distributions.
mcmc_rank_overlay(b_reg, regex_pars= "b_|sd_")
Everything looks OK, on to interpretation etc.
summary(b_reg)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy (Number of observations: 180)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~Subject (Number of levels: 18)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 20.69 7.76 6.17 37.31 1.00 865 577
## sd(Days) 10.89 2.68 6.51 16.76 1.00 1212 2037
## cor(Intercept,Days) 0.50 0.25 -0.04 0.91 1.00 703 786
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 233.85 9.53 213.16 250.81 1.00 1064 934
## Days -0.42 2.80 -5.97 4.76 1.00 1224 1832
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 25.69 1.63 22.87 29.21 1.00 2231 2618
##
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Plot results. broom.mixed::tidy() will get what you need; the complicated code below is just to get everything lined up nicely.
brms_modlist <- list(brms_default = b_default, brms_reg = b_reg, brms_reg2 = b_reg2)
res_bayes <- (brms_modlist
|> purrr::map_dfr(~ tidy(., conf.int = TRUE), .id = "model")
)
## need to do separately - different conf.method choices
res_lmer <- suppressMessages(m_lmer
|> tidy(conf.int = TRUE, conf.method = "profile")
|> mutate(model = "lmer", .before = 1)
)
res <- (bind_rows(res_lmer, res_bayes)
|> select(-c(std.error, statistic, component, group))
|> filter(term != "(Intercept)")
|> mutate(facet = ifelse(grepl("^cor", term), "cor",
ifelse(grepl("Days", term), "Days",
"int")))
|> mutate(across(model, ~ factor(., levels = c("lmer", names(brms_modlist)))))
)
## getting a mysterious `scale_name` deprecation warning ... ?
ggplot(res, aes(estimate, term, colour = model, shape = model)) +
geom_pointrange(aes(xmin = conf.low, xmax = conf.high),
position = position_dodge(width = 0.5)) +
facet_wrap(~ facet, scales = "free", ncol = 1) +
guides(colour = guide_legend(reverse=TRUE),
shape = guide_legend(reverse=TRUE))
post_df1 <- sleepstudy |> add_predicted_draws(b_reg)
gg1 <- ggplot(post_df1,
aes(Days, .prediction, group=interaction(Subject,.draw))) +
rasterise(geom_line(alpha = 0.1)) +
geom_point(aes(y=Reaction), col = "red") +
labs(y = "Reaction time")
print(gg1 + labs(title = "tighter priors (weird)"))
post_df2 <- sleepstudy |> add_predicted_draws(b_reg2)
print(gg1 %+% post_df2 + labs(title = "looser priors"))
Compare just Subject 1 results:
post_df1B <- filter(post_df1, Subject == levels(Subject)[1])
post_df2B <- filter(post_df2, Subject == levels(Subject)[1])
plot_grid(gg1 %+% post_df1B + labs(title = "tighter"),
gg1 %+% post_df2B + labs(title = "looser"))
Bottom line: b_prior5/b_reg give “wrong” answers (the priors are too tight); b_prior6/b_reg2 are OK. The predictions are nearly identical, though: this turns out to be because the random effects are taking up the slack left by the overly tight fixed-effect priors (compare the random-effect means below).
Plot random-effect draws:
Random effects (deviations from population-level/fixed-effect predictions):
brms_modlist <- list(brms_default = b_default, brms_reg = b_reg, brms_reg2 = b_reg2)
ranefs <- (brms_modlist
|> purrr::map_dfr(~ tidy(., effects = "ran_vals", conf.int = TRUE), .id = "model")
)
gg_r <- ggplot(ranefs, aes(estimate, level, colour = model)) +
geom_pointrange(aes(xmin = conf.low, xmax = conf.high), position = position_dodge(width = 0.5)) +
facet_wrap(~term, scale = "free", nrow = 1)
print(gg_r + geom_vline(lty = 2, xintercept = 0))
Group-level coefficients (predicted intercept/slope for each subject):
## this is way too ugly - needs to be incorporated into broom.mixed as an option
my_coefs <- function(x) {
    meltfun <- function(a) {
        as.data.frame(ftable(a)) |>
            setNames(c("level", "var", "term", "value")) |>
            tidyr::pivot_wider(names_from = var, values_from = value) |>
            rename(estimate = "Estimate",
                   std.error = "Est.Error",
                   ## FIXME: not robust to changing levels
                   conf.low = "Q2.5",
                   conf.high = "Q97.5")
    }
    purrr::map_dfr(coef(x), meltfun, .id = "group")
}
coefs <- (brms_modlist
|> purrr::map_dfr(my_coefs, .id = "model")
)
print(gg_r %+% coefs)
Compare means of random effects (should be zero!)
ranefs |>
    group_by(model, term) |>
    summarise(mean = mean(estimate),
              se = sd(estimate)/n(),  ## NB: should arguably be sd(estimate)/sqrt(n())
              .groups = "drop")
## # A tibble: 6 × 4
## model term mean se
## <chr> <chr> <dbl> <dbl>
## 1 brms_default (Intercept) 0.106 1.18
## 2 brms_default Days 0.0887 0.302
## 3 brms_reg (Intercept) 16.4 0.868
## 4 brms_reg Days 10.9 0.328
## 5 brms_reg2 (Intercept) -0.0339 0.731
## 6 brms_reg2 Days 0.0489 0.309
## to plot:
## |>
## ggplot(aes(mean, term, colour = model)) +
## geom_pointrange(aes(xmin=mean-se, xmax = mean + se), position = position_dodge(width = 0.5))
## these are some methods for testing sensitivity to the prior
## that I do **not** recommend ...
check_prior(b_reg)
try(check_prior(b_reg2)) ## ugh
debug(bayestestR:::.check_prior)
try(check_prior(b_reg2, method = "lakeland")) ## ugh
Last updated: 13 March 2024 20:41