Includes material from Ian Dworkin and Jonathan Dushoff, but they bear no responsibility for the contents.
rstanarm, MCMCglmm, brms, INLA, BUGS et al. (JAGS), Stan/rethinking, TMB, greta, … (coda, bayestestR)

|  | Frequentist | Bayesian |
|---|---|---|
| Discrete hypothesis testing | null-hypothesis significance testing; AIC etc. (every stats textbook; Burnham and Anderson (2002)) | Bayes factors; Bayesian indices of significance (Makowski et al. 2019; Shi and Yin 2021) |
| Continuous/quantitative (estimation with uncertainty) | MLE etc. + confidence intervals (Bolker 2008) | posterior means/medians and credible intervals |
\[ \newcommand{\pr}{\textrm{Pr}} \pr(A_i|B) = \frac{\pr(B|A_i) \pr(A_i)}{\sum \pr(B|A_j) \pr(A_j)} \]
\(\pr(A_i)\) is the prior probability of \(A_i\)
\(\pr(A_i|B)\) is the posterior probability of \(A_i\), given event \(B\)
People argue about Bayesian inference, but nobody argues about Bayes' theorem.
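A tiny numeric illustration of the formula above (the numbers are made up purely for illustration): given priors \(\pr(A_j)\) and conditional probabilities \(\pr(B|A_j)\) for three mutually exclusive events, the posterior probability of each event is its numerator divided by the sum over all events.

```r
## made-up numbers, purely to illustrate Bayes' theorem
prior       <- c(A1 = 0.5, A2 = 0.3, A3 = 0.2)  ## Pr(A_j)
p_B_given_A <- c(0.10, 0.40, 0.80)              ## Pr(B | A_j)
numer       <- p_B_given_A * prior              ## Pr(B | A_j) * Pr(A_j)
posterior   <- numer / sum(numer)               ## Pr(A_j | B); the denominator is Pr(B)
round(posterior, 3)
```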
Now let’s change \(A_i\) to \(H_i\) (“hypothesis”, which can denote a model or a particular parameter value) and \(B\) to \(D\) (“data”); we get
\[ \begin{split} \pr(H_i|D) & = \frac{\pr(D|H_i) \pr(H_i)}{\sum \pr(D|H_j) \pr(H_j)} \\ & = \frac{\pr(D|H_i) \pr(H_i)}{\pr(D)} \end{split} \]
If \(D\) is the data, then \(\pr(H_i)\) is the prior probability of hypothesis \(H_i\) and \(\pr(D|H_i)\) is the likelihood of hypothesis \(H_i\).
The denominator is the probability of observing the data under any of the hypotheses. It looks scary, and it is computationally scary (when the \(H_i\) represent a set of continuous parameter values, the sum becomes an integral; when a model has lots of continuous parameters, it becomes a high-dimensional integral). However, most tools for Bayesian inference represent elegant ways to avoid ever having to compute the denominator explicitly, so in practice you won’t have to worry about it. You may sometimes see Bayes’ Rule written out as \(\textrm{posterior} \propto \textrm{likelihood} \times \textrm{prior}\), where \(\propto\) means “proportional to”, to emphasize that we can often avoid thinking about the denominator.
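To see how this works for a continuous parameter, here is a minimal grid-approximation sketch (with made-up binomial data, not from any example in these notes): the "hypotheses" are values of a probability \(\theta\) on a fine grid, the prior is flat, and normalizing the unnormalized posterior over the grid stands in for the integral in the denominator.

```r
## grid approximation: posterior is proportional to likelihood * prior
theta <- seq(0.001, 0.999, length.out = 1000)   ## grid of parameter values ("hypotheses")
prior <- rep(1, length(theta))                  ## flat (unnormalized) prior
lik   <- dbinom(7, size = 10, prob = theta)     ## likelihood: 7 successes out of 10 (made-up data)
post_unnorm <- lik * prior
posterior   <- post_unnorm / sum(post_unnorm)   ## normalizing sum plays the role of Pr(D)
sum(theta * posterior)                          ## posterior mean
```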
Bolker 2008 Figure 4.2: Decomposition of the unconditional probability of the observed data (\(\pr(D)\)) into the sum of the probabilities of the intersection of the data with each possible hypothesis (\(\sum_{j=1}^N \pr(D | H_j) \pr(H_j)\)). The entire gray ellipse in the middle represents \(\pr(D)\). Each wedge (e.g. the hashed area \(H_5\)) represents an alternative hypothesis; the area corresponds to \(\pr(H_5)\). The ellipse is divided into “pizza slices” (e.g. \(D \cap H_5\) , hashed and colored area). The area of each slice corresponds to \(D \cap H_j\) (\(\pr(D \cap H_j) = \pr(D|H_j) \pr(H_j)\)) , the joint probability of the data \(D\) (ellipse) and the particular hypothesis \(H_j\) (wedge). The posterior probability \(\pr(H_j|D)\) is the fraction of the ellipse taken by \(H_j\), i.e. the area of the pizza slice divided by the area of the ellipse.
Bolker 2008 Figure 6.11: "Bayesian 95% credible interval (gray), and 5% tail areas (hashed), for the tadpole predation data (weak prior: shape=(1,1))."
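For a binomial problem like the one in this figure, a Beta(1,1) ("weak"/flat) prior is conjugate, so a central credible interval can be read directly off the posterior Beta distribution. A sketch with made-up counts (not the actual tadpole predation data):

```r
## hypothetical counts, standing in for the real data
k <- 7; n <- 10              ## successes, trials
a <- 1; b <- 1               ## Beta(1, 1) prior shape parameters
## posterior is Beta(a + k, b + n - k); the central 95% credible interval
## cuts off 2.5% in each tail
qbeta(c(0.025, 0.975), a + k, b + n - k)
```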
brms, rstanarm, MCMCglmm, INLA, MCMCpack, JAGS (rjags/r2jags), Stan (wrapped by rethinking), greta, TMB (TMB/tmbstan), NIMBLE

Last updated: 14 March 2024 19:33