Likelihood

Definition

probability (or probability density) of the observed data given a model (structure & parameters), viewed as a function of the parameters
in R: distributions via d* functions (base, Distributions Task View)
usually: complex model for the location, simpler models for the scale and shape
- e.g. Gamma with fixed shape, varying mean
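A minimal sketch of the Gamma example follows. It is written in Python with the standard library rather than R; in R the per-observation log-densities would come from dgamma(..., log = TRUE). The log-likelihood is the sum of the log-densities of the observations. Several choices here are assumptions made for illustration: the shape-and-mean parameterization (scale = mean / shape), the log link for the mean, the function names `gamma_logpdf` and `gamma_nll`, and the data.

```python
import math

def gamma_logpdf(y, shape, mean):
    """Log-density of a Gamma distribution parameterized by shape and mean.

    With shape k and mean mu, the scale is theta = mu / k (assumed parameterization).
    """
    scale = mean / shape
    return ((shape - 1) * math.log(y) - y / scale
            - shape * math.log(scale) - math.lgamma(shape))

def gamma_nll(params, x, y, shape):
    """Negative log-likelihood: fixed shape, mean = exp(b0 + b1 * x) (hypothetical model)."""
    b0, b1 = params
    return -sum(gamma_logpdf(yi, shape, math.exp(b0 + b1 * xi))
                for xi, yi in zip(x, y))

# Hypothetical data, for illustration only
x = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [1.2, 1.9, 2.4, 4.1, 6.3]
print(gamma_nll((0.0, 0.8), x, y, shape=3.0))
```

An optimizer would then minimize `gamma_nll` over `(b0, b1)` with the shape held fixed. The mean model carries the complexity and the shape stays simple.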

consistent: estimates converge to the true values as the number of independent observations grows to infinity
asymptotically Normal: the sampling distribution of the estimates approaches a Normal distribution; this is the basis for the approximate (Wald) standard errors from summary()
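Both properties can be checked by simulation. The sketch below uses Python and an assumed exponential model with true rate 2; the names and settings are illustrative only. The MLE of an exponential rate is 1 / (sample mean). The inverse Fisher information gives a Wald standard error of rate_hat / sqrt(n).

```python
import math
import random

random.seed(1)
TRUE_RATE = 2.0  # assumed true value for the simulation

def rate_mle(sample):
    """MLE of an exponential rate: 1 / sample mean."""
    return len(sample) / sum(sample)

# Consistency: the estimate approaches TRUE_RATE as n grows
for n in (10, 1000, 100000):
    sample = [random.expovariate(TRUE_RATE) for _ in range(n)]
    print(n, rate_mle(sample))

# Asymptotic Normality: Wald intervals rate_hat +/- 1.96 * SE
# should cover the true value about 95% of the time
n, reps, covered = 200, 2000, 0
for _ in range(reps):
    sample = [random.expovariate(TRUE_RATE) for _ in range(n)]
    est = rate_mle(sample)
    se = est / math.sqrt(n)
    covered += (est - 1.96 * se) <= TRUE_RATE <= (est + 1.96 * se)
coverage = covered / reps
print("Wald coverage:", coverage)
```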

asymptotically efficient: important but a bit delicate.
As the number of independent observations $n$ increases, the standard error of each parameter shrinks in proportion to $1/\sqrt{n}$, i.e. approximately $C/\sqrt{n}$ for some constant $C$.
Asymptotically efficient means that no (asymptotically) unbiased estimator has asymptotically smaller standard errors (e.g., a smaller value of $C$, or a higher power of $n$ in the denominator).
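For Normal data the sample mean is the MLE of the mean. The sample median is also consistent and its standard error also shrinks like $1/\sqrt{n}$, but with a larger constant: asymptotically about $\sqrt{\pi/2} \approx 1.25$ times larger. The Python simulation sketch below illustrates the difference in $C$; the sample size and number of replicates are arbitrary choices.

```python
import math
import random
import statistics

random.seed(2)
n, reps = 101, 4000
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))     # MLE of the Normal mean
    medians.append(statistics.median(sample))  # consistent but less efficient

sd_mean = statistics.stdev(means)
sd_median = statistics.stdev(medians)
ratio = sd_median / sd_mean
print(sd_mean * math.sqrt(n))          # estimate of C for the mean (sigma = 1)
print(ratio, math.sqrt(math.pi / 2))   # ratio of the two C values
```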

MLEs make sense
lots of justifying theory
when MLE can do the job, it is rarely the best tool for the job, but it is rarely much worse than the best (at least for large samples)
many standard estimation methods (least squares, GLM fitting) are special cases of MLE; e.g. least squares is MLE under Normal errors
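A small Python check of the least-squares case follows. With Normal errors the negative log-likelihood is $\frac{n}{2}\log(2\pi\sigma^2) + \mathrm{RSS}/(2\sigma^2)$. For any fixed $\sigma$ it is therefore minimized by the coefficients that minimize the residual sum of squares. The data, the fixed $\sigma = 1$ and the function names are illustrative assumptions.

```python
import math

def ols(x, y):
    """Closed-form least-squares fit of y = a + b * x."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b

def normal_nll(a, b, sigma, x, y):
    """Negative log-likelihood of y ~ Normal(a + b * x, sigma)."""
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    n = len(y)
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + rss / (2 * sigma ** 2)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
a_hat, b_hat = ols(x, y)
best = normal_nll(a_hat, b_hat, 1.0, x, y)
# Nudging either coefficient away from the least-squares fit increases the NLL
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert normal_nll(a_hat + da, b_hat + db, 1.0, x, y) > best
print(a_hat, b_hat)
```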

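Wald approximation: quadratic approximation to the log-likelihood surface (parabolas in 1D, ellipses in 2D)
- p-values: $Z$-scores ($\hat \beta/\sigma_{\hat \beta} \sim N(0,1)$ under $H_0: \beta = 0$)
- confidence intervals: based on $N(\hat \beta, \sigma_{\hat \beta})$, e.g. $\hat \beta \pm 1.96\,\sigma_{\hat \beta}$ for 95%
- strongly asymptotic (relies on large-sample approximations)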
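A Wald test and interval for a one-parameter case, sketched in Python. The model is an assumed i.i.d. Poisson with $\beta = \log\lambda$, for which $\hat\beta = \log\bar y$. The inverse Fisher information gives $\sigma_{\hat\beta} \approx 1/\sqrt{n\bar y}$. The two-sided Normal p-value is computed as erfc$(|Z|/\sqrt{2})$. The counts and function name are hypothetical.

```python
import math

def wald_poisson_log_mean(y, z_crit=1.96):
    """Wald Z-test and CI for beta = log(lambda) in an i.i.d. Poisson model."""
    n = len(y)
    ybar = sum(y) / n
    beta_hat = math.log(ybar)              # MLE of log(lambda)
    se = 1.0 / math.sqrt(n * ybar)         # from the inverse Fisher information
    z = beta_hat / se                      # tests H0: beta = 0
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided Normal p-value
    ci = (beta_hat - z_crit * se, beta_hat + z_crit * se)
    return beta_hat, se, z, p, ci

y = [3, 5, 2, 4, 6, 4, 3, 5]  # hypothetical counts
beta_hat, se, z, p, ci = wald_poisson_log_mean(y)
print(z, p, [math.exp(b) for b in ci])  # CI back-transformed to the lambda scale
```

The interval is symmetric on the $\beta$ scale by construction. That symmetry is exactly the quadratic approximation at work.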
likelihood:
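- p-values: likelihood ratio test ($-2 \Delta \log L \sim \chi^2_k$, where $\Delta \log L = \log L_{\text{reduced}} - \log L_{\text{full}}$ and $k$ is the number of parameters fixed under the null)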
- CIs: likelihood profiles
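The same Poisson setting can be treated with likelihood-based tools; this is a Python sketch with hypothetical names and data. The likelihood ratio statistic has 1 df here, and the upper tail of $\chi^2_1$ is erfc$(\sqrt{x/2})$. With a single parameter, the profile is simply the log-likelihood curve; with nuisance parameters, each point would instead maximize over them. The 95% interval collects all $\lambda$ with $2(\log L(\hat\lambda) - \log L(\lambda)) \le 3.841$, found here by bisection (this assumes sum(y) > 0).

```python
import math

def pois_loglik(lam, y):
    """Poisson log-likelihood, dropping the constant -sum(log(y!))."""
    return sum(y) * math.log(lam) - len(y) * lam

def lrt_pvalue(y, lam0):
    """LRT of H0: lambda = lam0 against a free lambda (1 df)."""
    lam_hat = sum(y) / len(y)
    dev = 2 * (pois_loglik(lam_hat, y) - pois_loglik(lam0, y))
    return dev, math.erfc(math.sqrt(dev / 2))  # chi^2_1 upper tail

def profile_ci(y, crit=3.841, tol=1e-10):
    """Likelihood-based CI: lambdas with 2 * (max logL - logL) <= crit."""
    lam_hat = sum(y) / len(y)
    max_ll = pois_loglik(lam_hat, y)

    def excess(lam):
        return 2 * (max_ll - pois_loglik(lam, y)) - crit

    def root_between(lo, hi):
        # Bisection; assumes excess(lo) and excess(hi) have opposite signs
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if (excess(mid) > 0) == (excess(lo) > 0):
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return root_between(lam_hat * 1e-6, lam_hat), root_between(lam_hat, lam_hat * 10)

y = [3, 5, 2, 4, 6, 4, 3, 5]  # hypothetical counts, lambda_hat = 4
print(lrt_pvalue(y, lam0=3.0))
print(profile_ci(y))
```

Unlike the Wald interval, the profile interval is not forced to be symmetric around $\hat\lambda$. For these counts it extends further above the estimate than below it.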
bootstrapping
- nonparametric (slow, requires independence)
- parametric (slow, model-dependent)
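Both bootstrap flavours, sketched in Python for the standard error of an exponential rate MLE. The "data" are simulated as a stand-in, and B = 2000 is an arbitrary choice. The nonparametric version resamples observations, so it relies on their independence. The parametric version simulates from the fitted model, so it relies on that model being right.

```python
import math
import random
import statistics

random.seed(3)

def rate_mle(sample):
    """MLE of an exponential rate: 1 / sample mean."""
    return len(sample) / sum(sample)

data = [random.expovariate(2.0) for _ in range(100)]  # simulated stand-in data
est = rate_mle(data)
B = 2000

# Nonparametric: resample the observations with replacement
np_boot = [rate_mle(random.choices(data, k=len(data))) for _ in range(B)]

# Parametric: simulate new data sets from the fitted exponential model
par_boot = [rate_mle([random.expovariate(est) for _ in range(len(data))])
            for _ in range(B)]

wald_se = est / math.sqrt(len(data))
print(est, wald_se, statistics.stdev(np_boot), statistics.stdev(par_boot))
```

Both loops refit the model B times, which is why the notes flag bootstrapping as slow.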
Bayesian
- requires priors
- strongly model-dependent
- often slow
- … but solves many problems
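A one-parameter Bayesian sketch in Python shows where the prior enters. It assumes a Poisson rate with a Gamma(shape a = 2, rate b = 0.5) prior; these prior values, the grid settings and the counts are arbitrary illustrations. The posterior is approximated on a grid and checked against the conjugate closed form $(a + \sum y)/(b + n)$. Grids like this become impractical beyond a few parameters, where MCMC or similar samplers are typically used instead.

```python
import math

def grid_posterior_mean(y, a=2.0, b=0.5, lo=1e-3, hi=20.0, m=20000):
    """Posterior mean of a Poisson rate under a Gamma(shape=a, rate=b) prior,
    approximated on an evenly spaced grid of lambda values."""
    step = (hi - lo) / (m - 1)
    grid = [lo + i * step for i in range(m)]
    total, n = sum(y), len(y)
    # log prior + log likelihood, up to additive constants
    logpost = [(a - 1 + total) * math.log(lam) - (b + n) * lam for lam in grid]
    top = max(logpost)
    w = [math.exp(lp - top) for lp in logpost]  # subtract max to avoid overflow
    return sum(lam * wi for lam, wi in zip(grid, w)) / sum(w)

y = [3, 5, 2, 4, 6, 4, 3, 5]  # hypothetical counts
a, b = 2.0, 0.5
print(grid_posterior_mean(y, a, b))   # grid approximation
print((a + sum(y)) / (b + len(y)))    # conjugate closed form
```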

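For Normal errors, exact finite-size corrections exist (e.g. $t$ and $F$ tests in place of $Z$ and $\chi^2$); beyond Normal errors, finite-size corrections are tricky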
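The simplest finite-size correction for Normal errors concerns the variance. The MLE divides the sum of squares by $n$ and is biased downward by a factor $(n-1)/n$; dividing by $n-1$ removes the bias. In Python's statistics module, pvariance divides by $n$ and variance by $n-1$. The sample size and replicate count in this sketch are arbitrary.

```python
import random
import statistics

# Deterministic check: pvariance divides by n (the MLE), variance by n - 1
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pvariance(data), statistics.variance(data))  # 4 and 32/7

# Simulation: with n = 5 draws from N(0, 1), the MLE averages about (n - 1)/n = 0.8
random.seed(4)
n, reps = 5, 20000
mle_vals, unbiased_vals = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    mle_vals.append(statistics.pvariance(sample))
    unbiased_vals.append(statistics.variance(sample))
print(statistics.fmean(mle_vals), statistics.fmean(unbiased_vals))
```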