Licensed under the Creative Commons attribution-noncommercial license. Please share & remix noncommercially, mentioning its origin.
library(ggplot2); theme_set(theme_bw())
library(ggExtra)
library(cowplot)
library(dotwhisker)
assumptions of linear models do break down sometimes
count data: discrete, non-negative
proportion data: discrete counts, \(0 \le x \le N\)
hard to transform to Normal
linear model doesn’t make sense
"gaussian"
), Gamma)glm
is Gaussianlink="log"
rather than inverse (default)qlogis()
function (plogis()
is logistic/inverse-link)cbind(k,N-k)
not cbind(k,N)
]weights=N
glm(p~...,data,weights=rep(N,nrow(data)))
plot
is still somewhat usefularm::binnedplot
)cor(observed,predict(model, type="response"))
)type="response"
to get data-scale predictions)DHARMa
packageaods3::gof()
like LMs, but:
confint.glm()
computes likelihood profile CIsglm()
problemsfamily
(default Gaussian: \(\to\) linear model); using glm()
for linear models (unnecessary)family=binomial
aids <- read.csv("../../data/aids.csv")
## set up time variable
aids <- transform(aids, date=year+(quarter-1)/4)
print(gg0 <- ggplot(aids,aes(date,cases))+geom_point())
print(gg1 <- gg0 + geom_smooth(method="glm",colour="red",
method.args=list(family="quasipoisson")))
## `geom_smooth()` using formula = 'y ~ x'
g1 <- glm(cases~date,aids,family=quasipoisson(link="log"))
summary(g1)
##
## Call:
## glm(formula = cases ~ date, family = quasipoisson(link = "log"),
## data = aids)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.023e+03 6.806e+01 -15.03 1.25e-11 ***
## date 5.168e-01 3.425e-02 15.09 1.16e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 2.354647)
##
## Null deviance: 677.26 on 19 degrees of freedom
## Residual deviance: 53.02 on 18 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
Note NA
value for AIC …
plot(g1)
)acf(residuals(g1)) ## check autocorrelation
print(gg2 <- gg1+geom_smooth(method="glm",formula=y~poly(x,2),
method.args=list(family="quasipoisson")))
## `geom_smooth()` using formula = 'y ~ x'
print(gg2+scale_y_log10())
## `geom_smooth()` using formula = 'y ~ x'
g2 <- update(g1,.~poly(date,2))
summary(g2)
##
## Call:
## glm(formula = cases ~ poly(date, 2), family = quasipoisson(link = "log"),
## data = aids)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.86859 0.05004 77.311 < 2e-16 ***
## poly(date, 2)1 3.82934 0.25162 15.219 2.46e-11 ***
## poly(date, 2)2 -0.68335 0.19716 -3.466 0.00295 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 1.657309)
##
## Null deviance: 677.264 on 19 degrees of freedom
## Residual deviance: 31.992 on 17 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
anova(g1,g2,test="F") ## for quasi-models specifically
## Analysis of Deviance Table
##
## Model 1: cases ~ date
## Model 2: cases ~ poly(date, 2)
## Resid. Df Resid. Dev Df Deviance F Pr(>F)
## 1 18 53.020
## 2 17 31.992 1 21.028 12.688 0.002399 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
acf(residuals(g2)) ## check autocorrelation
Also see nlme::ACF
for ACFs of grouped data …
What did you learn? Where did you get stuck?