knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(cache=TRUE)

require(ggplot2)
require(lme4)

Problem 5a

Let’s read in the data

lines <- 
  "Game   Scored  N.Attempts
1   4   5
2   5   11
3   5   14
4   5   12
5   2   7
6   7   10
7   6   14
8   9   15
9   4   12
10  1   4
11  13  27
12  5   17
13  6   12
14  9   9
15  7   12
16  3   10
17  8   12
18  1   6
19  18  39
20  3   13
21  10  17
22  1   6
23  3   12"
con <- textConnection(lines)
shaq <- read.csv(con, sep="")

Let’s now fit two models: one model where there is just one intercept, and one model

shaq$Game <- as.factor(shaq$Game)

fit <- glm(cbind(Scored, N.Attempts-Scored)~1, family=binomial(), data=shaq)
fit2 <- glm(cbind(Scored, N.Attempts-Scored)~Game, family=binomial(), data=shaq)

Now, run a Likelihood-Ratio Test:

anova(fit2, fit, test="LRT")
## Analysis of Deviance Table
## 
## Model 1: cbind(Scored, N.Attempts - Scored) ~ Game
## Model 2: cbind(Scored, N.Attempts - Scored) ~ 1
##   Resid. Df Resid. Dev  Df Deviance Pr(>Chi)  
## 1         0      0.000                        
## 2        22     40.021 -22  -40.021  0.01075 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Null Hypothesis: The probability of Shaq successfully scoring from a free throw does not differ across game days; i.e., the intercept-only model is equivalent to the model adjusting for game day.

Based on the p-value being <0.05 (p=0.01), we reject the null hypothesis. There is evidence to suggest that the probability of Shaq successfully scoring from a free throw differs significantly across game days; i.e., he has both good and bad days.

PROBLEM 5b

First, let’s compute the no-pooling estimates. (Note: there are other ways of doing this, and the error bars would look different in that case.)

fit.nopool <- glm(cbind(Scored, N.Attempts-Scored)~Game, family=binomial(link="logit"), data=shaq)
predict.nopool <- predict(fit.nopool, type="response", se=T)

probs.nopool <- data.frame( game = seq(1,23),
                            probs = predict.nopool$fit,
                            LB = pmax(0, predict.nopool$fit-predict.nopool$se.fit),
                            UB = pmin(1, predict.nopool$fit+predict.nopool$se.fit))

ggplot(probs.nopool, aes(x = game, y = probs)) +
  geom_point() +
  geom_errorbar(aes(ymin=LB, ymax=UB)) 

Now, let’s compute the complete pooling estimate. Complete pooling means we are estimating just one number – the probability of scoring in any game. Again, other options exist here.

total.scored <- sum(shaq$Scored)
total.attempted <- sum(shaq$N.Attempts)
prob <- total.scored/total.attempted
prob.se <- prob*(1-prob)/sqrt(dim(shaq)[1])

Let’s display this in the same way as before.

probs.completepool <- data.frame( game = seq(1,23),
                                  probs = rep(prob, 23),
                                  LB = prob-prob.se,
                                  UB = prob+prob.se)


ggplot(probs.completepool, aes(x = game, y = probs)) +
  geom_point() +
  geom_errorbar(aes(ymin=LB, ymax=UB)) +
  ylim(0, 1)

Partial pooling

Finally, let’s do partial pooling. Note that glmer doesn’t provide standard errors for the coefficients.

fit.partial <- glmer(cbind(Scored, N.Attempts-Scored)~(1|Game), family=binomial(), data=shaq)
partial.probs <- plogis(ranef(fit.partial)$Game$`(Intercept)`)

probs.partial <- data.frame( game = seq(1,23),
                            probs = partial.probs)
                            

ggplot(probs.partial, aes(x = game, y = probs)) +
  geom_point() +
  ylim(0, 1)

To make things clearer, let’s plot the error bars for the no-pooling estimates together with the partial pooling estimates.

ggplot(probs.nopool, aes(x = game, y = probs)) +
  geom_errorbar(aes(ymin=LB, ymax=UB)) +
  geom_point(data=probs.partial, aes(x = game, y = probs))

PROBLEM 5c

The no-pooling method independently provided estimates for Shaq’s free throw success rate for each day; we can see that one day (Game 14), Shaq scored every time, and the no-pooling estimate of the probability of scoring was 100%, which is implausible. Partial pooling moved the estimates closer to the mean probability (0.45) and were less variable as compared to the non-pooled estimates. For example, we can easily see that the estimate corresponding to Shaq’s particually successful shooting day (Game 14) became closer to the other estimates; all estimates were substantially moved close to 46%. Complete pooling resulted in one estimated probability, 46%.

PROBLEM 6a

I (Michael) used my ROSI transcript from undergrad to come up with an example. The measurements are my grades in STA, MAT, and CSC classes. The model is

\[grade_i\sim N(\alpha_{j[i]}, \sigma_{grade})\] \[\alpha_j\sim N(\mu_\alpha, \sigma_\alpha)\]

Here, \(grade_i\) is the grade I received in course i. \(\alpha_1\) is the (pooled estimate of the) mean for MAT, \(\alpha_2\) for STA, and \(\alpha_3\) for CSC.

Since I have all the data, I can simply compute the \(\sigma_\alpha\) (5.0, mostly due to the Math Specialist courses…), \(\sigma_{grade}\) (8.1), and \(\mu_\alpha\) (left to your imagination :-) ).