knitr::opts_chunk$set(echo = TRUE)
require(Sleuth3)
require(ggplot2)
require(tigerstats)
require(xtable)
fish <- case0602
First, let’s plot the data:
bwplot(Pair~Percentage, data=fish)
densityplot(~Percentage, groups = Pair, auto.key = TRUE, data=fish)
We might be wondering whether the length matters, or whether there is a lot of variation between the different male pairs. Let’s plot the data:
plot(fish$Length, fish$Percentage)
It certainly doesn’t look like there is a linear trend there… Still, recalling STA302, let’s fit a simple linear regression and check:
summary(lm(Percentage~Length, data=fish))
##
## Call:
## lm(formula = Percentage ~ Length, data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.063 -8.354 -0.909 9.116 30.760
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.0222 26.3567 2.884 0.00501 **
## Length -0.4230 0.8008 -0.528 0.59878
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.42 on 82 degrees of freedom
## Multiple R-squared: 0.003391, Adjusted R-squared: -0.008763
## F-statistic: 0.279 on 1 and 82 DF, p-value: 0.5988
It doesn’t look like there is evidence that the length matters for the proportion of time spent with the yellow-swordtailed male. (Which isn’t to say that there isn’t any nonlinear relationship; but it doesn’t really look like that either.)
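As an informal check on that last parenthetical, we could overlay a loess smoother on the scatterplot (a quick sketch, not part of the original analysis):

```r
# Scatterplot with a loess smooth overlaid, as an informal check
# for any nonlinear trend between Length and Percentage
scatter.smooth(fish$Length, fish$Percentage,
               xlab = "Length", ylab = "Percentage",
               lpars = list(col = "red", lwd = 2))
```

If the smooth stays roughly flat, there is little visual support for a nonlinear relationship either.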
Now, let’s compute the average percentage of time spent with the yellow-swordtailed male for each pair. (We run the same regression twice: with an intercept, the coefficients are each pair’s difference from Pair 1; with the intercept forced to 0, the coefficients are the per-pair means.)
summary(lm(Percentage~Pair, data=fish))
##
## Call:
## lm(formula = Percentage ~ Pair, data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.429 -8.414 0.247 10.859 28.871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56.406 3.864 14.597 <2e-16 ***
## PairPair2 4.479 5.657 0.792 0.4308
## PairPair3 6.023 5.384 1.119 0.2667
## PairPair4 10.594 5.657 1.873 0.0649 .
## PairPair5 7.805 6.441 1.212 0.2292
## PairPair6 6.929 5.657 1.225 0.2243
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.46 on 78 degrees of freedom
## Multiple R-squared: 0.04796, Adjusted R-squared: -0.01307
## F-statistic: 0.7858 on 5 and 78 DF, p-value: 0.563
summary(lm(Percentage~0+Pair, data=fish))
##
## Call:
## lm(formula = Percentage ~ 0 + Pair, data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.429 -8.414 0.247 10.859 28.871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## PairPair1 56.406 3.864 14.60 <2e-16 ***
## PairPair2 60.886 4.131 14.74 <2e-16 ***
## PairPair3 62.429 3.749 16.65 <2e-16 ***
## PairPair4 67.000 4.131 16.22 <2e-16 ***
## PairPair5 64.211 5.152 12.46 <2e-16 ***
## PairPair6 63.336 4.131 15.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.46 on 78 degrees of freedom
## Multiple R-squared: 0.9458, Adjusted R-squared: 0.9416
## F-statistic: 226.8 on 6 and 78 DF, p-value: < 2.2e-16
Note that what we are really wondering about is the differences between the pairs, since the two males within each pair are matched by length. We can convert the lengths into factors and run the regression again in the same way:
summary(lm(Percentage~0+factor(Length), data=fish))
##
## Call:
## lm(formula = Percentage ~ 0 + factor(Length), data = fish)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.429 -8.372 -0.118 10.859 28.871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## factor(Length)28 64.211 5.133 12.51 <2e-16 ***
## factor(Length)31 60.886 4.115 14.79 <2e-16 ***
## factor(Length)33 62.429 3.735 16.72 <2e-16 ***
## factor(Length)34 65.168 2.910 22.39 <2e-16 ***
## factor(Length)35 56.406 3.849 14.65 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.4 on 79 degrees of freedom
## Multiple R-squared: 0.9455, Adjusted R-squared: 0.9421
## F-statistic: 274.2 on 5 and 79 DF, p-value: < 2.2e-16
Now the lengths were treated as categorical variables (i.e., each distinct length gets its own indicator variable). Note that two of the pairs share a length, so there are only 5 coefficients instead of 6.
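If you want to see the indicator coding explicitly, you can peek at the design matrix that lm builds (a quick sketch):

```r
# Each distinct Length becomes its own 0/1 indicator column
# in the no-intercept design matrix
head(model.matrix(~ 0 + factor(Length), data = fish))
```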
Let’s compute a couple of values to make sure that we have things right:
coefs <- summary(lm(Percentage~Pair, data=fish))$coefficients[, "Estimate"]
c(mean(subset(fish, Pair=="Pair1")$Percentage), coefs["(Intercept)"])
## (Intercept)
## 56.40625 56.40625
c(mean(subset(fish, Pair=="Pair2")$Percentage)-
mean(subset(fish, Pair=="Pair1")$Percentage), coefs["PairPair2"])
## PairPair2
## 4.479464 4.479464
c(mean(subset(fish, Pair=="Pair3")$Percentage)-
mean(subset(fish, Pair=="Pair1")$Percentage), coefs["PairPair3"])
## PairPair3
## 6.023162 6.023162
Let’s go back to the boxplot:
bwplot(Pair~Percentage, data=fish)
Do the variances look equal here? I’d say they look a bit fishy, especially with Pair 3. I’m kind of wary of running an F-test here. On the bright side, the density plot indicates that the distributions are mostly normal.
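To put numbers on that impression, we could compare the per-pair standard deviations directly; Bartlett’s test is one common formal check of equal variances, though it is itself sensitive to non-normality (a sketch, not part of the original analysis):

```r
# Per-pair standard deviations: a rough check of the equal-variance assumption
tapply(fish$Percentage, fish$Pair, sd)

# Bartlett's test of homogeneity of variances
# (note: sensitive to departures from normality)
bartlett.test(Percentage ~ Pair, data = fish)
```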
Let’s look at some diagnostic plots:
fit <- lm(Percentage~Pair, data=fish)
plot(fit, 1)
plot(fit, 2)
There are definitely outliers there – maybe it’s love!
Still, let’s forge on – it didn’t stop one of the textbooks from trying:
anova(lm(Percentage~Pair, data=fish))
## Analysis of Variance Table
##
## Response: Percentage
## Df Sum Sq Mean Sq F value Pr(>F)
## Pair 5 938.7 187.75 0.7858 0.563
## Residuals 78 18636.7 238.93
The F-test (which you may or may not trust here) finds no evidence that the pair means differ.
Let’s run the unadjusted pairwise t-test:
with(fish, pairwise.t.test(Percentage, Pair, p.adj = "none"))
##
## Pairwise comparisons using t tests with pooled SD
##
## data: Percentage and Pair
##
## Pair1 Pair2 Pair3 Pair4 Pair5
## Pair2 0.431 - - - -
## Pair3 0.267 0.783 - - -
## Pair4 0.065 0.299 0.415 - -
## Pair5 0.229 0.616 0.781 0.674 -
## Pair6 0.224 0.676 0.871 0.532 0.895
##
## P value adjustment method: none
No difference!
Let’s try the Bonferroni correction (since the unadjusted tests found nothing, the more conservative adjusted tests certainly won’t either…):
with(fish, pairwise.t.test(Percentage, Pair, p.adj = "bonf"))
##
## Pairwise comparisons using t tests with pooled SD
##
## data: Percentage and Pair
##
## Pair1 Pair2 Pair3 Pair4 Pair5
## Pair2 1.00 - - - -
## Pair3 1.00 1.00 - - -
## Pair4 0.97 1.00 1.00 - -
## Pair5 1.00 1.00 1.00 1.00 -
## Pair6 1.00 1.00 1.00 1.00 1.00
##
## P value adjustment method: bonferroni
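To see what the adjustment is doing, note that R’s Bonferroni correction simply multiplies each unadjusted p-value by the number of comparisons, choose(6, 2) = 15, and caps the result at 1. A quick hand-check:

```r
# Bonferroni by hand: multiply the raw p-values by the number of
# pairwise comparisons (choose(6, 2) = 15) and cap at 1
p_raw <- with(fish, pairwise.t.test(Percentage, Pair, p.adj = "none"))$p.value
pmin(1, p_raw * choose(6, 2))
```

This reproduces the adjusted table above (e.g. 0.065 × 15 ≈ 0.97 for Pair4 vs Pair1).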
Finally, let’s use Tukey’s HSD:
TukeyHSD(aov(lm(Percentage ~ Pair, data= fish)), conf.level = 0.95)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = lm(Percentage ~ Pair, data = fish))
##
## $Pair
## diff lwr upr p adj
## Pair2-Pair1 4.4794643 -12.048468 21.00740 0.9681236
## Pair3-Pair1 6.0231618 -9.707770 21.75409 0.8722095
## Pair4-Pair1 10.5937500 -5.934183 27.12168 0.4264998
## Pair5-Pair1 7.8048611 -11.013018 26.62274 0.8298365
## Pair6-Pair1 6.9294643 -9.598468 23.45740 0.8233512
## Pair3-Pair2 1.5436975 -14.755803 17.84320 0.9997723
## Pair4-Pair2 6.1142857 -10.955690 23.18426 0.9005094
## Pair5-Pair2 3.3253968 -15.970304 22.62110 0.9959026
## Pair6-Pair2 2.4500000 -14.619976 19.51998 0.9982828
## Pair4-Pair3 4.5705882 -11.728912 20.87009 0.9631262
## Pair5-Pair3 1.7816993 -16.835866 20.39926 0.9997604
## Pair6-Pair3 0.9063025 -15.393198 17.20580 0.9999836
## Pair5-Pair4 -2.7888889 -22.084590 16.50681 0.9982242
## Pair6-Pair4 -3.6642857 -20.734261 13.40569 0.9886626
## Pair6-Pair5 -0.8753968 -20.171098 18.42030 0.9999941
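The Tukey intervals are easier to digest visually; base R provides a plot method for TukeyHSD objects:

```r
# Plot the Tukey HSD intervals: any interval crossing zero
# corresponds to a non-significant pairwise difference
plot(TukeyHSD(aov(Percentage ~ Pair, data = fish)), las = 1)
```

Here every interval crosses zero, matching the table above.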
Maybe we’re interested in whether any of the pairs are significantly different from the baseline, Pair 1 (the intercept in this parameterization). We can’t just look at each 95% interval individually (though we could use Tukey’s HSD). If we want to use Bonferroni, we can do:
fit <- lm(Percentage~Pair, data=fish)
confint(fit, level = 1 - 0.05/nlevels(fish$Pair))
## 0.417 % 99.583 %
## (Intercept) 45.944627 66.86787
## PairPair2 -10.834784 19.79371
## PairPair3 -8.552611 20.59893
## PairPair4 -4.720498 25.90800
## PairPair5 -9.631178 25.24090
## PairPair6 -8.384784 22.24371
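As a sanity check, one of these intervals can be reproduced by hand: estimate ± t-quantile × standard error, with the 5% error rate split across the 6 intervals (a sketch using the PairPair4 row):

```r
# Hand-check the Bonferroni interval for PairPair4:
# estimate +/- qt(1 - 0.05/(2*6), df) * SE, with 78 residual df
est <- coef(summary(fit))["PairPair4", ]
est["Estimate"] + c(-1, 1) * qt(1 - 0.05/(2 * 6), df = 78) * est["Std. Error"]
```

This should match the PairPair4 row above (about −4.72 to 25.91).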
Let’s check for normality one last time, and run the test. Percentage is on a 0–100 scale, so the no-preference null is a mean of 50 (not 0.5):
hist(fish$Percentage)
t.test(fish$Percentage, mu=50)
##
## One Sample t-test
##
## data: fish$Percentage
## t = 7.2383, df = 83, p-value = 2.2e-10
## alternative hypothesis: true mean is not equal to 50
## 95 percent confidence interval:
## 58.79582 65.46132
## sample estimates:
## mean of x
## 62.12857
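Since Percentage is measured on a 0–100 scale, the no-preference null is a mean of 50; the t statistic for that test can be reproduced by hand (a quick sketch):

```r
# One-sample t statistic against the no-preference null of 50
# (Percentage is measured on a 0-100 scale)
n <- nrow(fish)
(mean(fish$Percentage) - 50) / (sd(fish$Percentage) / sqrt(n))
```

Either way, the fish clearly spend well over half their time with the yellow-swordtailed male.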