Includes material from Ian Dworkin and Jonathan Dushoff, but they bear no responsibility for the contents.

Intro

Simplified Bayes workflow

  1. Decide on model specification; journal it, along with justifications
  2. Decide on priors, including prior predictive simulations
  3. Data exploration (graphical)
  4. Fit model
  5. MCMC diagnostics
  6. Model diagnostics, including comparing posterior predictive sims to data and checking sensitivity to priors (?)
  7. Interpret/predict/etc.

Packages

  • MCMCglmm: old, stable
  • rstanarm: newer, pre-compiled, better (but harder to understand) priors
  • brms: newer, requires compilation, very flexible
  • rjags/JAGS: older, roll-your-own

Example

‘sleepstudy’ data

From ?lme4::sleepstudy, “The average reaction time per day (in milliseconds) for subjects in a sleep deprivation study.” A standard example for mixed models.

library(lme4)        ## frequentist mixed models (and the sleepstudy data)
library(brms)        ## Bayesian regression models via Stan
options(brms.backend = "cmdstanr")
library(broom.mixed) ## tidy() methods for mixed/Bayesian models
library(purrr)       ## functional programming (map_dfr etc.)
library(dplyr)       ## data manipulation
library(tidybayes)   ## working with posterior draws (add_predicted_draws etc.)
library(bayesplot)   ## MCMC diagnostic plots (mcmc_trace etc.)
library(bayestestR)  ## posterior diagnostics (diagnostic_posterior etc.)
library(ggplot2); theme_set(theme_bw())
library(see)         ## provides scale_color_okabeito
options(ggplot2.discrete.colour= scale_color_okabeito)
library(ggrastr)     ## rasterise() for busy plot layers
library(cowplot)     ## plot_grid() for combining plots

We’ll fit the model Reaction ~ 1 + Days + (1 + Days|Subject) (a linear regression with correlated random intercepts and slopes across subjects):

\[ \newcommand{\XX}{\mathbf X} \newcommand{\ZZ}{\mathbf Z} \newcommand{\bbeta}{\boldsymbol \beta} \newcommand{\bb}{\mathbf b} \newcommand{\zero}{\boldsymbol 0} \begin{split} \textrm{Reaction} & \sim \textrm{Normal}(\XX \bbeta + \ZZ \bb, \sigma^2) \\ \bb & \sim \textrm{MVNorm}(\zero, \Sigma) \\ \Sigma & = \textrm{block-diag}\left( \begin{array}{cc} \sigma^2_i & \sigma_{is} \\ \sigma_{is} & \sigma^2_{s} \end{array} \right) \end{split} \]

form1 <- Reaction ~ 1 + Days + (1 + Days|Subject)
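
Before getting to priors, a quick graphical look at the data (workflow step 3), e.g. one panel per subject:

ggplot(sleepstudy, aes(Days, Reaction)) +
    geom_point() +
    geom_line() +
    facet_wrap(~ Subject)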

prior predictive simulation

brms has a convenience function get_prior() that displays the parameters/priors that need to be set, along with their default values.

get_prior(form1, sleepstudy)
##                      prior     class      coef   group resp dpar nlpar lb ub       source
##                     (flat)         b                                              default
##                     (flat)         b      Days                               (vectorized)
##                     lkj(1)       cor                                              default
##                     lkj(1)       cor           Subject                       (vectorized)
##  student_t(3, 288.7, 59.3) Intercept                                              default
##      student_t(3, 0, 59.3)        sd                                    0         default
##      student_t(3, 0, 59.3)        sd           Subject                  0    (vectorized)
##      student_t(3, 0, 59.3)        sd      Days Subject                  0    (vectorized)
##      student_t(3, 0, 59.3)        sd Intercept Subject                  0    (vectorized)
##      student_t(3, 0, 59.3)     sigma                                    0         default

Info on brms default priors: the Intercept prior is Student \(t\) with 3 df, location equal to the observed median of the response variable, and scale equal to its (rescaled) median absolute deviation (MAD). The random-effect SDs get the same prior but half-\(t\) (see the lb = 0 column), i.e. with the mode at 0.
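
Where do 288.7 and 59.3 come from? A quick check (assuming brms’s documented default of student_t(3, median(y), mad(y))):

median(sleepstudy$Reaction)   ## ~ 288.7
mad(sleepstudy$Reaction)      ## ~ 59.3 (scaled median absolute deviation)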

Constraining intercept and slope to ‘reasonable’ values:

b_prior <- c(set_prior("normal(200, 50)", "Intercept"),
             set_prior("normal(0, 10)", "b")
             )

Helper function for prior predictive simulations: run a short MCMC chain, plot results. (We’re going to ignore/suppress warnings for this stage …)

test_prior <- function(p) {
    ## https://discourse.mc-stan.org/t/suppress-all-output-from-brms-in-markdown-files/30117/3
    capture.output(
        b <- brm(form1, sleepstudy, prior = p,
                 seed = 101,              ## reproducibility
                 sample_prior = 'only',   ## for prior predictive sim
                 chains = 1, iter = 500,  ## very short sample for convenience
                 silent = 2, refresh = 0  ## be vewy vewy quiet ...
                 )
    )
    p_df <- sleepstudy |> add_predicted_draws(b)
    ## 'spaghetti plot' of prior preds
    gg0 <- ggplot(p_df,aes(Days, .prediction, group=interaction(Subject,.draw))) +
        geom_line(alpha = 0.1)
    print(gg0)
    invisible(b) ## return without printing
}
test_prior(b_prior)

Decrease random-effects SDs:

b_prior2 <- c(set_prior("normal(200, 10)", "Intercept"),
              set_prior("normal(0, 5)", "b"),
              set_prior("student_t(3, 0, 0.1)", "sd")
              )
test_prior(b_prior2)

Make all scales even smaller?

b_prior3 <- c(set_prior("normal(200, 5)", "Intercept"),
              set_prior("normal(0, 2)", "b"),
              set_prior("student_t(3, 0, 0.01)", "sd")
             )
test_prior(b_prior3)

We forgot to constrain the prior for the residual standard deviation!

b_prior4 <- c(set_prior("normal(200, 5)", "Intercept"),
              set_prior("normal(0, 2)", "b"),
              set_prior("normal(0, 1)", "sd"),
              set_prior("normal(0, 1)", "sigma")
             )
test_prior(b_prior4)

Now relax a bit …

b_prior5 <- c(set_prior("normal(200, 10)", "Intercept"),
              set_prior("normal(0, 8)", "b"),
              set_prior("student_t(10, 0, 3)", "sd"),
              set_prior("student_t(10, 0, 3)", "sigma")
             )
test_prior(b_prior5)

In hindsight, we should relax even more: set the intercept prior back to its default value and widen the fixed-effect (slope) prior.

b_prior6 <- c(set_prior("normal(0, 20)", "b"),
              set_prior("student_t(10, 0, 3)", "sd"),
              set_prior("student_t(10, 0, 3)", "sigma")
             )
test_prior(b_prior6)

There are still a few negative reaction times left, which is obviously unrealistic, but at least they’re rare …
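
To quantify how rare (a sketch: it re-runs the helper above, which returns the prior-only brmsfit invisibly, and stores it as b6):

b6 <- test_prior(b_prior6)                   ## prior-only fit, returned invisibly
prior_preds <- sleepstudy |> add_predicted_draws(b6)
mean(prior_preds$.prediction < 0)            ## proportion of negative prior-predictive draws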

fitting

  • Always set the seed!
  • Probably OK to use default chain lengths, warm-up, starting values, etc., until you see whether things are OK (Stan’s default starting values are drawn uniformly from (-2, 2) on the unconstrained parameter scale).
  • adapt_delta below (the target acceptance rate, default 0.8) is one knob to turn to make the MCMC ‘work harder’ (take smaller steps) to avoid geometry problems
  • in general, ‘well-behaved’ problems (all parameters on similar scales, parameters relatively independent, priors that keep the sampler away from extreme/flat regions, etc.) sample better (Bolker et al. 2013); brms does a lot of fancy stuff under the hood to try to make things work well
m_lmer <- lmer(form1, sleepstudy)
b_reg <- brm(form1, sleepstudy, prior = b_prior5,
             seed = 101,
             control = list(adapt_delta = 0.95)
             )
b_reg2 <- brm(form1, sleepstudy, prior = b_prior6,
              seed = 101,   
              control = list(adapt_delta = 0.95)
              )
## also try with default settings
b_default <- brm(form1, sleepstudy,
                 seed = 101
                 )

(These are actually run as a batch from run_examples.R and loaded here …)

load("examples1.rda")

diagnose

We should really run diagnostics before looking at the parameter values! MCSE is the Monte Carlo standard error of a posterior summary, equal to (posterior SD)/sqrt(ESS).

print(bayestestR::diagnostic_posterior(b_reg),
      digits = 4)
##     Parameter  Rhat  ESS    MCSE
## 1      b_Days 1.001 1185 0.08135
## 2 b_Intercept 1.002 1152 0.28076

\(\hat R\) (Rhat) and ESS look OK. Vehtari et al. (2021) recommend an \(\hat R\) threshold of 1.01, ESS > 400, and an MCSE that is ‘small enough’ for the scientific purpose at hand (“figure out what is the needed accuracy for the quantity of interest (for reporting usually 2 significant digits is enough)”).
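
As a rough consistency check (using the diagnostics above; the posterior SDs, labelled Est.Error, appear in the model summary below):

## MCSE should be roughly (posterior SD)/sqrt(ESS)
0.08135 * sqrt(1185)   ## ~ 2.80, the posterior SD of b_Days
0.28076 * sqrt(1152)   ## ~ 9.53, the posterior SD of b_Intercept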

trace plots

Trace plots should look like white noise (see tweet in previous set of notes); no obvious trends, no ‘slow’ variation. (regex_pars specifies a regular expression for deciding which parameters to show.)

mcmc_trace(b_reg, regex_pars= "b_|sd_")

Skewed posteriors can make trace plots look ‘spiky’; to avoid this we could look at transformed (e.g. log-scale) posterior distributions, as in the sketch below. Vehtari et al. (2021) recommend rank histograms instead: chains should have similar rank distributions.
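
For example (a sketch using bayesplot’s transformations argument), the group-level SDs and sigma on the log scale:

mcmc_trace(b_reg, regex_pars = "sd_|sigma", transformations = "log")

The rank-overlay version: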

mcmc_rank_overlay(b_reg, regex_pars= "b_|sd_")

Everything looks OK, on to interpretation etc.

look at results

summary(b_reg)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: Reaction ~ Days + (Days | Subject) 
##    Data: sleepstudy (Number of observations: 180) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Group-Level Effects: 
## ~Subject (Number of levels: 18) 
##                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)          20.69      7.76     6.17    37.31 1.00      865      577
## sd(Days)               10.89      2.68     6.51    16.76 1.00     1212     2037
## cor(Intercept,Days)     0.50      0.25    -0.04     0.91 1.00      703      786
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept   233.85      9.53   213.16   250.81 1.00     1064      934
## Days         -0.42      2.80    -5.97     4.76 1.00     1224     1832
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma    25.69      1.63    22.87    29.21 1.00     2231     2618
## 
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Plot results. broom.mixed::tidy() will get what you need; the complicated code below is just to get everything lined up nicely.

brms_modlist <- list(brms_default = b_default, brms_reg = b_reg, brms_reg2 = b_reg2)
res_bayes <- (brms_modlist
    |> purrr::map_dfr(~ tidy(., conf.int = TRUE), .id = "model")
)
## need to do separately - different conf.method choices
res_lmer <- suppressMessages(m_lmer
    |> tidy(conf.int = TRUE, conf.method = "profile")
    |> mutate(model = "lmer", .before = 1)
)
res <- (bind_rows(res_lmer, res_bayes)
    |> select(-c(std.error, statistic, component, group))
    |> filter(term != "(Intercept)")
    |> mutate(facet = ifelse(grepl("^cor", term), "cor",
                      ifelse(grepl("Days", term), "Days",
                             "int")))
    |> mutate(across(model, ~ factor(., levels = c("lmer", names(brms_modlist)))))
)
## getting a mysterious `scale_name` deprecation warning ... ?
ggplot(res, aes(estimate, term, colour = model, shape = model)) +
    geom_pointrange(aes(xmin = conf.low, xmax = conf.high),
                    position = position_dodge(width = 0.5)) +
    facet_wrap(~ facet, scales = "free", ncol = 1) +
    guides(colour = guide_legend(reverse=TRUE),
           shape = guide_legend(reverse=TRUE))

posterior predictive simulations, compare with data

post_df1 <- sleepstudy |> add_predicted_draws(b_reg)
gg1 <- ggplot(post_df1,
              aes(Days, .prediction, group=interaction(Subject,.draw))) +
    rasterise(geom_line(alpha = 0.1)) +
    geom_point(aes(y=Reaction), col = "red") +
    labs(y = "Reaction time")
print(gg1 + labs(title = "tighter priors (weird)"))

post_df2 <- sleepstudy |> add_predicted_draws(b_reg2)
print(gg1 %+% post_df2 + labs(title = "looser priors"))

Compare just Subject 1 results:

post_df1B <- filter(post_df1, Subject == levels(Subject)[1])
post_df2B <- filter(post_df2, Subject == levels(Subject)[1])
plot_grid(gg1 %+% post_df1B + labs(title = "tighter"),
          gg1 %+% post_df2B + labs(title = "looser"))

Bottom line: b_prior5/b_reg gives “wrong” answers (the priors are too tight) while b_prior6/b_reg2 is OK, but the predictions are nearly identical. This turns out to be because the random effects are taking up the slack in the population-level (fixed-effect) estimates; see the check below and the comparison of random-effect means at the end of this section.
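
A rough back-of-the-envelope check (numbers taken from the b_reg summary above and the random-effect means computed below):

## b_reg's population-level Days slope (shrunk toward 0 by the tight prior)
## plus the mean of its subject-level Days deviations
-0.42 + 10.9   ## ~ 10.5 ms/day, close to the slope the other models estimate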

Plot random-effect draws:

Random effects (deviations from population-level/fixed-effect predictions):

brms_modlist <- list(brms_default = b_default, brms_reg = b_reg, brms_reg2 = b_reg2)
ranefs <- (brms_modlist
    |> purrr::map_dfr(~ tidy(., effects = "ran_vals", conf.int = TRUE), .id = "model")
)
gg_r <- ggplot(ranefs, aes(estimate, level, colour = model)) +
    geom_pointrange(aes(xmin = conf.low, xmax = conf.high), position = position_dodge(width = 0.5)) +
    facet_wrap(~term, scale = "free", nrow = 1)
print(gg_r +  geom_vline(lty = 2, xintercept = 0))

Group-level coefficients (predicted intercept/slope for each subject):

## this is way too ugly - needs to be incorporated into broom.mixed as an option
my_coefs <- function(x) {
    ## reshape one element of coef(x) (a levels x stats x terms array) into a tidy tibble
    meltfun <- function(a) {
        as.data.frame(ftable(a)) |>
            setNames(c("level", "var", "term", "value")) |>
            tidyr::pivot_wider(names_from = var, values_from = value) |>
            rename(estimate = "Estimate",
                   std.error = "Est.Error",
                   ## FIXME: not robust to changing levels
                   conf.low = "Q2.5",
                   conf.high = "Q97.5")
    }
    purrr::map_dfr(coef(x), meltfun, .id = "group")
}
coefs <- (brms_modlist
    |> purrr::map_dfr(my_coefs, .id = "model")
)
print(gg_r %+% coefs)

Compare means of random effects (should be zero!)

ranefs |>
    group_by(model, term) |>
    summarise(mean = mean(estimate),
              se = sd(estimate)/n(),  ## n.b. sd/n, not the usual standard error sd/sqrt(n)
              .groups = "drop")
## # A tibble: 6 × 4
##   model        term           mean    se
##   <chr>        <chr>         <dbl> <dbl>
## 1 brms_default (Intercept)  0.106  1.18 
## 2 brms_default Days         0.0887 0.302
## 3 brms_reg     (Intercept) 16.4    0.868
## 4 brms_reg     Days        10.9    0.328
## 5 brms_reg2    (Intercept) -0.0339 0.731
## 6 brms_reg2    Days         0.0489 0.309
## to plot:
## |>
##     ggplot(aes(mean, term, colour = model)) +
##    geom_pointrange(aes(xmin=mean-se, xmax = mean + se), position = position_dodge(width = 0.5))
## these are some methods for testing sensitivity to the prior
## that I do **not** recommend ...
check_prior(b_reg)
try(check_prior(b_reg2)) ## ugh
## debug(bayestestR:::.check_prior)  ## interactive debugging only; not run
try(check_prior(b_reg2, method = "lakeland")) ## ugh

References

Bolker, Benjamin M., Beth Gardner, Mark Maunder, Casper W. Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, et al. 2013. “Strategies for Fitting Nonlinear Ecological Models in R, AD Model Builder, and BUGS.” Edited by Satu Ramula. Methods in Ecology and Evolution 4 (6): 501–12. https://doi.org/10.1111/2041-210X.12044.
Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2021. “Rank-Normalization, Folding, and Localization: An Improved R-hat for Assessing Convergence of MCMC (with Discussion).” Bayesian Analysis 16 (2): 667–718. https://doi.org/10.1214/20-BA1221.

Last updated: 13 March 2024 20:41