First, let’s figure out what the peculiarities of the Fish dataset might be. Here is the boxplot again:

library(ggplot2)
library(lattice)
library(Sleuth3)
fish <- case0602
bwplot(Percentage ~ Pair, data = fish)

The thing that jumps out at us is that the variances don't really look equal. Let's check whether there is evidence that they are unequal. (Note that you weren't required to do this part.)
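
A quick formal check (also not required, and it assumes normality within each group) would be Bartlett's test of homogeneity of variances; leveneTest() from the car package is a more robust alternative if that package is installed:

# null hypothesis: all six pairs have the same variance
bartlett.test(Percentage ~ Pair, data = fish)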

Here are the actual standard deviations for the different pairs:

library(plyr)
sd.actual <- ddply(fish, c("Pair"), summarise, sd=sd(Percentage))
ggplot(sd.actual, aes(x=Pair, y=sd)) + geom_bar(stat="identity")
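
If you prefer dplyr to plyr, the same per-pair summary can be computed like this (loading dplyr after plyr masks a few plyr functions, but nothing below uses plyr again):

library(dplyr)
fish %>%
  group_by(Pair) %>%
  summarise(sd = sd(Percentage))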

Let’s compute the estimated SD:

sqrt(anova(lm(Percentage~Pair, data=fish))$`Mean Sq`[2])
## [1] 15.45742
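
Equivalently, this is just the residual standard error of the fitted one-way ANOVA model, which we can read off directly:

summary(lm(Percentage ~ Pair, data = fish))$sigma
## [1] 15.45742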

Now, let's see how often we'd expect to see a group SD below 10 or above 20 if the variances for the different groups actually were equal.

get_pair_sd <- function(){
  # SD of one simulated group: 14 draws from a normal with the pooled SD
  sd(rnorm(n = 14, mean = 60, sd = 15.46))
}

n.sims <- 10000
sim.sd <- replicate(n.sims, get_pair_sd())
qplot(sim.sd, bins = 30)

(sum(sim.sd>20)+sum(sim.sd<10))/n.sims
## [1] 0.098

In about 10% of the cases, we'd expect the group SD to be smaller than 10 or greater than 20. In the actual data (i.e., in sd.actual), half of the pairs (3 out of 6) have an SD that is either smaller than 10 or greater than 20. If the variances really were equal, the chance of three or more of the six SDs falling outside that range would only be \(1-(.9^6+6\times.9^5\times .1+15\times.9^4\times .1^2) \approx 1.6\%\), so the observed spread of the SDs is larger than we would expect. Let's therefore try proceeding by assuming that the variances actually are different.
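
We can let R do the binomial arithmetic, using the simulated per-group probability of roughly 0.1:

# probability that 3 or more of the 6 group SDs fall outside [10, 20]
pbinom(2, size = 6, prob = 0.1, lower.tail = FALSE)
## [1] 0.01585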

Let's generate a lot of fake datasets with the variances as actually observed in the fish dataset, but with equal means, and see how often the F-test produces a false positive (i.e., a conclusion that the means are different, even though they aren't).

sim_sig <- function(sd.actual){
  n_pairs <- 6
  n_obs <- 14
  grand_mean <- 60
  # six groups with a common mean but the SDs observed in the fish data
  Pair <- paste('Pair', rep(1:n_pairs, each = n_obs), sep = '')
  Percentage <- unlist(lapply(1:n_pairs, function(i) {
    rnorm(n_obs, mean = grand_mean, sd = sd.actual[i, "sd"])
  }))
  dat <- data.frame(Pair = Pair,
                    Percentage = Percentage)
  
  # TRUE if the F-test rejects the (true) null of equal means at the 5% level
  sig <- anova(lm(Percentage ~ Pair, data = dat))$`Pr(>F)`[1] < .05
  return(sig)
}

n_sims <- 50000
set.seed(0)
res <- replicate(n_sims, sim_sig(sd.actual))
mean(res)
## [1] 0.06648

The violation of the equal-variance assumption seems to have led to a false positive rate of about 7%, more than the advertised 5%. The F-test is less conservative than we would expect.
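
As an extra check (again not required), we can confirm that this excess over 5% is not just Monte Carlo noise, for example with a binomial test against the nominal rate:

# rejections observed in 50,000 simulations vs. the 2,500 expected at a 5% rate
binom.test(sum(res), n_sims, p = 0.05)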