---
title: "Overdispersed Poisson and Binomial GLM review"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Overdispersed Poisson Regression (Quasi-Poisson Regression)

```{r message=FALSE}
library(Sleuth3)
library(ggplot2)
elephants <- case2201
```

We can run quasi-Poisson regression by using `family = "quasipoisson"`. The coefficient estimates are the same as in Poisson regression, but we also estimate the overdispersion parameter and use it to scale the standard errors.

```{r}
# Ordinary Poisson fit: dispersion is fixed at 1
fit <- glm(Matings ~ Age, family = "poisson", data = elephants)
summary(fit)
```

```{r}
# Quasi-Poisson fit: same coefficients, dispersion estimated from the data
fit <- glm(Matings ~ Age, family = "quasipoisson", data = elephants)
summary(fit)
```

As you can see, the standard errors are inflated by $\sqrt{1.15} \approx 1.07$ relative to the Poisson fit, where 1.15 is the estimated dispersion parameter.

## Causes of Overdispersion

One possibility is that the distribution simply isn't Poisson. Let's generate a distribution with a lot more zeros than you'd see in a Poisson distribution.

```{r message=FALSE}
# A Bernoulli(0.5) indicator times a Poisson(4) draw: about half the values are forced to zero
y <- rbinom(100, size = 1, prob = .5) * rpois(100, lambda = 4)
qplot(y)
summary(glm(y ~ 1, family = "quasipoisson"))
```

(Note: in this case, it's more appropriate to use what's called a Zero-Inflated Poisson model, but we won't cover it here.)

Another possibility is that we have unobserved covariates, so that what we see is actually a mixture of two Poisson distributions.

```{r message=FALSE}
# ind plays the role of an unobserved covariate that switches the mean between 4 and 6
ind <- rbinom(100, size = 1, prob = .5)
y <- ind * rpois(100, lambda = 4) + (1 - ind) * rpois(100, lambda = 6)
qplot(y)
summary(glm(y ~ 1, family = "quasipoisson"))
```

## Binomial family regression

```{r}
krunnit <- case2101
```

In the Krunnit data, `AtRisk` is the number of species found on each island in 1949, and `Extinct` is the number of those species no longer found in 1959. When using `glm` with the binomial family, we want to give the number of successes and the number of failures as a two-column matrix. Here a "success" is an extinction, so the failures are the species that survived, `AtRisk - Extinct`, and we use

```{r}
fit <- glm(cbind(Extinct, AtRisk - Extinct) ~ log(Area), family = binomial(), data = krunnit)
```

Previously, we interpreted `AtRisk` as the number of species that *survived*, in which case

```{r}
fit <- glm(cbind(Extinct, AtRisk) ~ log(Area), family = binomial(), data = krunnit)
```

would be appropriate.
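
The overdispersion check from the Poisson section carries over to the binomial fit. The chunk below is a minimal sketch, assuming `AtRisk` counts the species at risk (so the failures are `AtRisk - Extinct`); the names `fit_binom`, `fit_quasi`, and `phi_hat` are introduced here for illustration only. It computes the dispersion estimate by hand, as the sum of squared Pearson residuals divided by the residual degrees of freedom, and compares it with the value reported by a `quasibinomial` fit.

```{r}
library(Sleuth3)
krunnit <- case2101

# Binomial fit: successes are extinctions, failures are surviving species
fit_binom <- glm(cbind(Extinct, AtRisk - Extinct) ~ log(Area),
                 family = binomial(), data = krunnit)

# Dispersion estimate by hand: Pearson chi-square over residual degrees of freedom
phi_hat <- sum(residuals(fit_binom, type = "pearson")^2) / df.residual(fit_binom)
phi_hat

# Quasi-binomial fit: same coefficients, dispersion estimated from the data
fit_quasi <- glm(cbind(Extinct, AtRisk - Extinct) ~ log(Area),
                 family = quasibinomial(), data = krunnit)
summary(fit_quasi)$dispersion

# Standard errors in the quasi fit are the binomial ones scaled by sqrt(phi_hat)
sqrt(phi_hat)
```

A value of `phi_hat` close to 1 suggests the plain binomial model is adequate; a value well above 1 suggests overdispersion, in which case the quasi-binomial standard errors are the safer ones to report.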