require(Sleuth3)
require(ggplot2)
elephants <- case2201
We can run Quasi-Poisson regression by using family=quasipoisson
. This is the same as Poisson regression, but we also estimate the overdispersion
fit <- glm(Matings ~ Age, family= "poisson", data= elephants)
summary(fit)
##
## Call:
## glm(formula = Matings ~ Age, family = "poisson", data = elephants)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.80798 -0.86137 -0.08629 0.60087 2.17777
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.58201 0.54462 -2.905 0.00368 **
## Age 0.06869 0.01375 4.997 5.81e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 75.372 on 40 degrees of freedom
## Residual deviance: 51.012 on 39 degrees of freedom
## AIC: 156.46
##
## Number of Fisher Scoring iterations: 5
fit <- glm(Matings ~ Age, family= "quasipoisson", data= elephants)
summary(fit)
##
## Call:
## glm(formula = Matings ~ Age, family = "quasipoisson", data = elephants)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.80798 -0.86137 -0.08629 0.60087 2.17777
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.58201 0.58590 -2.700 0.0102 *
## Age 0.06869 0.01479 4.645 3.81e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 1.157334)
##
## Null deviance: 75.372 on 40 degrees of freedom
## Residual deviance: 51.012 on 39 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
As you can see, the standard errors are inflated by \(\sqrt{1.15}=1.07\).
One possibility is that the distribution simply isn’t Poisson. Let’s generate a distribution with a lot more zeros than you’d see in a Poisson distribution.
ind <- rbinom(100, size=1, prob=.5)
y <- ind*rpois(100, lambda=4)
qplot(y)
summary(glm(y~1, family="quasipoisson"))
##
## Call:
## glm(formula = y ~ 1, family = "quasipoisson")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1071 -2.1071 -0.9192 1.0725 2.9918
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.7975 0.1106 7.209 1.14e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 2.717245)
##
## Null deviance: 312.41 on 99 degrees of freedom
## Residual deviance: 312.41 on 99 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
(Note: in this case, it’s more appropriate to use what’s called Zero-Inflated Poisson, but we won’t cover it here)
Another possibility is that we have unobserved covariates, so that what we see is actually a mixture of two Poisson distributions
ind <- rbinom(100, size=1, prob=.5)
y <- ind*rpois(100, lambda=4)+(1-ind)*rpois(100, lambda=6)
qplot(y)
summary(glm(y~1, family="quasipoisson"))
##
## Call:
## glm(formula = y ~ 1, family = "quasipoisson")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.0984 -0.8832 -0.1427 0.9392 2.0687
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.56862 0.04549 34.48 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 0.9932799)
##
## Null deviance: 105.37 on 99 degrees of freedom
## Residual deviance: 105.37 on 99 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 4
krunnit <- case2101
In the Krunnit data, we have the total number of species found in 1958, and the total number of species found in 1968. When using glm
with the Binomial family, we want to give the number of successes and failures, so we use
fit <- glm(cbind(Extinct, AtRisk-Extinct)~log(Area), family=binomial(), data=krunnit)
Aside: Bernoulli(pi) means Prob(success) = pi Prob(failure) = 1-pi
We can switch the variables around and the following instead.
fit1 <- glm(cbind(AtRisk-Extinct, Extinct)~log(Area), family=binomial(), data=krunnit)
This would cause the interpretation of coefficients to change.
Previously, we interpreted AtRisk as the number of species that survived, in which case
fit <- glm(cbind(Extinct, AtRisk)~log(Area), family=binomial(), data=krunnit)
would be appropriate.