knitr::opts_chunk$set(echo = TRUE)
require(Sleuth3)
require(ggplot2)
require(tigerstats)
require(xtable)
fish <- case0602
First, let’s plot the data:
bwplot(Pair~Percentage, data=fish)
densityplot(~Percentage, groups = Pair, auto.key = TRUE, data=fish)
We might be wondering whether the length matters, or whether there is a lot of variation between the different male pairs. Let’s plot the data:
plot(fish$Length, fish$Percentage)
It certainly doesn’t look like there is a linear trend there… Still, recalling STA302, let’s fit a simple linear regression and check:
summary(lm(Percentage~Length, data=fish))
##
## Call:
## lm(formula = Percentage ~ Length, data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.063 -8.354 -0.909 9.116 30.760
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.0222 26.3567 2.884 0.00501 **
## Length -0.4230 0.8008 -0.528 0.59878
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.42 on 82 degrees of freedom
## Multiple R-squared: 0.003391, Adjusted R-squared: -0.008763
## F-statistic: 0.279 on 1 and 82 DF, p-value: 0.5988
It doesn’t look like there is evidence that the length matters for the proportion of time spent with the yellow-swordtailed male. (Which isn’t to say that there isn’t any nonlinear relationship; but it doesn’t really look like that either.)
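As an informal check on that last parenthetical, we could overlay a loess smoother on the scatterplot (a quick sketch, not part of the original analysis):

```r
# Scatterplot with a loess smooth overlaid, as an informal check
# for any nonlinear trend between Length and Percentage
scatter.smooth(fish$Length, fish$Percentage,
               xlab = "Length", ylab = "Percentage",
               lpars = list(col = "red", lwd = 2))
```

If the smooth stays roughly flat, there is little visual support for a nonlinear relationship either.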
Now, let’s compute the average percentage of time spent with the yellow-swordtailed male for each pair. (We run the same regression twice: with an intercept, the coefficients are each pair’s difference from Pair 1; with the intercept forced to 0, the coefficients are the per-pair means.)
summary(lm(Percentage~Pair, data=fish))
##
## Call:
## lm(formula = Percentage ~ Pair, data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.429 -8.414 0.247 10.859 28.871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56.406 3.864 14.597 <2e-16 ***
## PairPair2 4.479 5.657 0.792 0.4308
## PairPair3 6.023 5.384 1.119 0.2667
## PairPair4 10.594 5.657 1.873 0.0649 .
## PairPair5 7.805 6.441 1.212 0.2292
## PairPair6 6.929 5.657 1.225 0.2243
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.46 on 78 degrees of freedom
## Multiple R-squared: 0.04796, Adjusted R-squared: -0.01307
## F-statistic: 0.7858 on 5 and 78 DF, p-value: 0.563
summary(lm(Percentage~0+Pair, data=fish))
##
## Call:
## lm(formula = Percentage ~ 0 + Pair, data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.429 -8.414 0.247 10.859 28.871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## PairPair1 56.406 3.864 14.60 <2e-16 ***
## PairPair2 60.886 4.131 14.74 <2e-16 ***
## PairPair3 62.429 3.749 16.65 <2e-16 ***
## PairPair4 67.000 4.131 16.22 <2e-16 ***
## PairPair5 64.211 5.152 12.46 <2e-16 ***
## PairPair6 63.336 4.131 15.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.46 on 78 degrees of freedom
## Multiple R-squared: 0.9458, Adjusted R-squared: 0.9416
## F-statistic: 226.8 on 6 and 78 DF, p-value: < 2.2e-16
Note that what we are really wondering about is the differences between the pairs, since the two males within each pair are matched by length. We can convert the lengths into factors and run the regression again in the same way:
summary(lm(Percentage~0+factor(Length), data=fish))
##
## Call:
## lm(formula = Percentage ~ 0 + factor(Length), data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.429 -8.372 -0.118 10.859 28.871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## factor(Length)28 64.211 5.133 12.51 <2e-16 ***
## factor(Length)31 60.886 4.115 14.79 <2e-16 ***
## factor(Length)33 62.429 3.735 16.72 <2e-16 ***
## factor(Length)34 65.168 2.910 22.39 <2e-16 ***
## factor(Length)35 56.406 3.849 14.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.4 on 79 degrees of freedom
## Multiple R-squared: 0.9455, Adjusted R-squared: 0.9421
## F-statistic: 274.2 on 5 and 79 DF, p-value: < 2.2e-16
Now the lengths were treated as categorical variables (i.e., each distinct length gets its own indicator variable). Note that two of the pairs share a length, so there are only 5 coefficients instead of 6.
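If you want to see the indicator coding explicitly, you can peek at the design matrix that lm builds (a quick sketch):

```r
# Each distinct Length becomes its own 0/1 indicator column
# in the no-intercept design matrix
head(model.matrix(~ 0 + factor(Length), data = fish))
```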
Let’s compute a couple of values to make sure that we have things right:
coefs <- summary(lm(Percentage~Pair, data=fish))$coefficients[, "Estimate"]
c(mean(subset(fish, Pair=="Pair1")$Percentage), coefs["(Intercept)"])
## (Intercept)
## 56.40625 56.40625
c(mean(subset(fish, Pair=="Pair2")$Percentage)-
mean(subset(fish, Pair=="Pair1")$Percentage), coefs["PairPair2"])
## PairPair2
## 4.479464 4.479464
c(mean(subset(fish, Pair=="Pair3")$Percentage)-
mean(subset(fish, Pair=="Pair1")$Percentage), coefs["PairPair3"])
## PairPair3
## 6.023162 6.023162
Let’s go back to the boxplot:
bwplot(Pair~Percentage, data=fish)
Do the variances look equal here? I’d say they look a bit fishy, especially with Pair 3. I’m kind of wary of running an F-test here. On the bright side, the density plot indicates that the distributions are mostly normal.
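To put numbers on that impression, we could compare the per-pair standard deviations directly; Bartlett’s test is one common formal check of equal variances, though it is itself sensitive to non-normality (a sketch, not part of the original analysis):

```r
# Per-pair standard deviations: a rough check of the equal-variance assumption
tapply(fish$Percentage, fish$Pair, sd)

# Bartlett's test of homogeneity of variances
# (note: sensitive to departures from normality)
bartlett.test(Percentage ~ Pair, data = fish)
```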
Let’s look at some diagnostic plots:
fit <- lm(Percentage~Pair, data=fish)
plot(fit, 1)
plot(fit, 2)
There are definitely outliers there – maybe it’s love!
Still, let’s forge on – it didn’t stop one of the textbooks from trying:
anova(lm(Percentage~Pair, data=fish))
## Analysis of Variance Table
##
## Response: Percentage
## Df Sum Sq Mean Sq F value Pr(>F)
## Pair 5 938.7 187.75 0.7858 0.563
## Residuals 78 18636.7 238.93
The F-test (which you may or may not trust here) finds no evidence that the pair means differ.
Let’s run the unadjusted pairwise t-test:
with(fish, pairwise.t.test(Percentage, Pair, p.adj = "none"))
##
## Pairwise comparisons using t tests with pooled SD
##
## data: Percentage and Pair
##
## Pair1 Pair2 Pair3 Pair4 Pair5
## Pair2 0.431 - - - -
## Pair3 0.267 0.783 - - -
## Pair4 0.065 0.299 0.415 - -
## Pair5 0.229 0.616 0.781 0.674 -
## Pair6 0.224 0.676 0.871 0.532 0.895
##
## P value adjustment method: none
No difference!
Let’s try the Bonferroni correction (since the unadjusted tests found nothing, the more conservative adjusted tests certainly won’t either…):
with(fish, pairwise.t.test(Percentage, Pair, p.adj = "bonf"))
##
## Pairwise comparisons using t tests with pooled SD
##
## data: Percentage and Pair
##
## Pair1 Pair2 Pair3 Pair4 Pair5
## Pair2 1.00 - - - -
## Pair3 1.00 1.00 - - -
## Pair4 0.97 1.00 1.00 - -
## Pair5 1.00 1.00 1.00 1.00 -
## Pair6 1.00 1.00 1.00 1.00 1.00
##
## P value adjustment method: bonferroni
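To see what the adjustment is doing, note that R’s Bonferroni correction simply multiplies each unadjusted p-value by the number of comparisons, choose(6, 2) = 15, and caps the result at 1. A quick hand-check:

```r
# Bonferroni by hand: multiply the raw p-values by the number of
# pairwise comparisons (choose(6, 2) = 15) and cap at 1
p_raw <- with(fish, pairwise.t.test(Percentage, Pair, p.adj = "none"))$p.value
pmin(1, p_raw * choose(6, 2))
```

This reproduces the adjusted table above (e.g. 0.065 × 15 ≈ 0.97 for Pair4 vs Pair1).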
Finally, let’s use Tukey’s HSD:
TukeyHSD(aov(lm(Percentage ~ Pair, data= fish)), conf.level = 0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = lm(Percentage ~ Pair, data = fish))
##
## $Pair
## diff lwr upr p adj
## Pair2-Pair1 4.4794643 -12.048468 21.00740 0.9681236
## Pair3-Pair1 6.0231618 -9.707770 21.75409 0.8722095
## Pair4-Pair1 10.5937500 -5.934183 27.12168 0.4264998
## Pair5-Pair1 7.8048611 -11.013018 26.62274 0.8298365
## Pair6-Pair1 6.9294643 -9.598468 23.45740 0.8233512
## Pair3-Pair2 1.5436975 -14.755803 17.84320 0.9997723
## Pair4-Pair2 6.1142857 -10.955690 23.18426 0.9005094
## Pair5-Pair2 3.3253968 -15.970304 22.62110 0.9959026
## Pair6-Pair2 2.4500000 -14.619976 19.51998 0.9982828
## Pair4-Pair3 4.5705882 -11.728912 20.87009 0.9631262
## Pair5-Pair3 1.7816993 -16.835866 20.39926 0.9997604
## Pair6-Pair3 0.9063025 -15.393198 17.20580 0.9999836
## Pair5-Pair4 -2.7888889 -22.084590 16.50681 0.9982242
## Pair6-Pair4 -3.6642857 -20.734261 13.40569 0.9886626
## Pair6-Pair5 -0.8753968 -20.171098 18.42030 0.9999941
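The Tukey intervals are easier to digest visually; base R provides a plot method for TukeyHSD objects:

```r
# Plot the Tukey HSD intervals: any interval crossing zero
# corresponds to a non-significant pairwise difference
plot(TukeyHSD(aov(Percentage ~ Pair, data = fish)), las = 1)
```

Here every interval crosses zero, matching the table above.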
Maybe we’re interested in whether any of the pairs are significantly different from the baseline, Pair 1 (the intercept in this parameterization). We can’t just look at each 95% interval individually (though we could use Tukey’s HSD). If we want to use Bonferroni, we can do:
fit <- lm(Percentage~Pair, data=fish)
confint(fit, level = 1 - 0.05/nlevels(fish$Pair))
## 0.417 % 99.583 %
## (Intercept) 45.944627 66.86787
## PairPair2 -10.834784 19.79371
## PairPair3 -8.552611 20.59893
## PairPair4 -4.720498 25.90800
## PairPair5 -9.631178 25.24090
## PairPair6 -8.384784 22.24371
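As a sanity check, one of these intervals can be reproduced by hand: estimate ± t-quantile × standard error, with the 5% error rate split across the 6 intervals (a sketch using the PairPair4 row):

```r
# Hand-check the Bonferroni interval for PairPair4:
# estimate +/- qt(1 - 0.05/(2*6), df) * SE, with 78 residual df
est <- coef(summary(fit))["PairPair4", ]
est["Estimate"] + c(-1, 1) * qt(1 - 0.05/(2 * 6), df = 78) * est["Std. Error"]
```

This should match the PairPair4 row above (about −4.72 to 25.91).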
Let’s check for normality one last time, and run the test. Percentage is on a 0–100 scale, so the no-preference null is a mean of 50 (not 0.5):
hist(fish$Percentage)
t.test(fish$Percentage, mu=50)
##
## One Sample t-test
##
## data: fish$Percentage
## t = 7.2383, df = 83, p-value = 2.2e-10
## alternative hypothesis: true mean is not equal to 50
## 95 percent confidence interval:
## 58.79582 65.46132
## sample estimates:
## mean of x
## 62.12857
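Since Percentage is measured on a 0–100 scale, the no-preference null is a mean of 50; the t statistic for that test can be reproduced by hand (a quick sketch):

```r
# One-sample t statistic against the no-preference null of 50
# (Percentage is measured on a 0-100 scale)
n <- nrow(fish)
(mean(fish$Percentage) - 50) / (sd(fish$Percentage) / sqrt(n))
```

Either way, the fish clearly spend well over half their time with the yellow-swordtailed male.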