knitr::opts_chunk$set(echo = TRUE)
# library() (not require()) for hard dependencies: it errors immediately if a
# package is missing, instead of returning FALSE and failing mysteriously later.
library(Sleuth3)    # case0201 (finch beak depths), case0301 (cloud seeding)
library(ggplot2)
library(tigerstats) # densityplot()
library(xtable)

# Darwin finch beak-depth data: 1976 (pre-drought) vs 1978 (post-drought).
finches <- case0201

The two-sample t-Test

We can now compute the pooled SD estimate from the pre-drought (1976) and post-drought (1978) samples, which we need to estimate the standard error of the difference in means.

# Beak depths by year: X = 1976 (pre-drought), Y = 1978 (post-drought).
X <- subset(finches, Year == 1976)$Depth
Y <- subset(finches, Year == 1978)$Depth

# Pooled SD: combine the two within-group sums of squares, then divide by
# the pooled degrees of freedom, n_X + n_Y - 2.
ss_x <- sum((X - mean(X))^2)
ss_y <- sum((Y - mean(Y))^2)
pooled_sd <- sqrt((ss_x + ss_y) / (length(X) + length(Y) - 2))

Let’s compute some p-values for the Null Hypothesis that \(\mu_X = \mu_Y\). We have that

\[\frac{\bar{X}-\bar{Y}}{s_p\sqrt{1/N_X+1/N_Y}}\]

is t-distributed with \(N_X+N_Y-2\) d.f.

# Ingredients of the two-sample t statistic (use <- for assignment).
Nx <- length(X)
Ny <- length(Y)
est <- mean(X) - mean(Y)                  # estimated difference in means
# Pooled SD, as above (recomputed so this chunk stands alone).
pooled_sd <- sqrt((sum((X - mean(X))^2) + sum((Y - mean(Y))^2)) / (Nx + Ny - 2))
diff_sd <- pooled_sd * sqrt(1/Nx + 1/Ny)  # standard error of the difference
df <- Nx + Ny - 2                         # pooled degrees of freedom

Let’s compute the two-sided and one-sided p-values. One-sided:

# Degrees of freedom (already computed above; repeated so the chunk stands alone).
df <- Nx + Ny - 2
# One-sided p-value, P(T <= t): the lower tail is appropriate here because
# est = mean(X) - mean(Y) is negative.
pt(est/diff_sd, df = df)
## [1] 4.324758e-06

Two-sided:

# Two-sided p-value: double the tail probability.  Using -abs() makes this
# correct regardless of the sign of the test statistic; the original
# 2*pt(est/diff_sd, df = df) only works because est < 0 here.
2*pt(-abs(est/diff_sd), df = df)
## [1] 8.649515e-06

This was easy: in this case (and in most cases we’ll see), the two-sided p-value is just twice the one-sided p-value.

It’s possible to use R’s t-test function to compute the same values:

# Equal-variance ("pooled") two-sample t-test: with var.equal=TRUE this
# reproduces the hand computation above (t = est/diff_sd, df = Nx + Ny - 2,
# two-sided p = 8.65e-06).
t.test(X, Y, alternative="two.sided", var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  c(6.2, 6.8, 7.1, 7.1, 7.4, 7.8, 7.9, 8, 8.2, 8.4, 8.4, 8.4, 8.5,  ... and c(7.1, 7.9, 8, 8.4, 8.4, 8.7, 8.7, 8.8, 9, 9, 9.1, 9.1, 9.1,  ...
## t = -4.5833, df = 176, p-value = 8.65e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9564088 -0.3806698
## sample estimates:
## mean of x mean of y 
##  9.469663 10.138202
# One-sided version ("less" = H_a: mu_X < mu_Y): p-value is half the
# two-sided one, matching pt(est/diff_sd, df = df) = 4.325e-06.
t.test(X, Y, alternative="less", var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  c(6.2, 6.8, 7.1, 7.1, 7.4, 7.8, 7.9, 8, 8.2, 8.4, 8.4, 8.4, 8.5,  ... and c(7.1, 7.9, 8, 8.4, 8.4, 8.7, 8.7, 8.8, 9, 9, 9.1, 9.1, 9.1,  ...
## t = -4.5833, df = 176, p-value = 4.325e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf -0.4273433
## sample estimates:
## mean of x mean of y 
##  9.469663 10.138202

Before, we were assuming equal variances. We didn’t have to: `t.test` uses the Welch–Satterthwaite approximation to the degrees of freedom when the two variances are not assumed to be equal.

# Welch test: var.equal=FALSE approximates the degrees of freedom
# (172.98 instead of 176); with these balanced, similar-spread samples the
# p-values barely change.
t.test(X, Y, alternative="two.sided", var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  c(6.2, 6.8, 7.1, 7.1, 7.4, 7.8, 7.9, 8, 8.2, 8.4, 8.4, 8.4, 8.5,  ... and c(7.1, 7.9, 8, 8.4, 8.4, 8.7, 8.7, 8.8, 9, 9, 9.1, 9.1, 9.1,  ...
## t = -4.5833, df = 172.98, p-value = 8.739e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.9564436 -0.3806350
## sample estimates:
## mean of x mean of y 
##  9.469663 10.138202
# One-sided Welch test, for comparison with the pooled version above.
t.test(X, Y, alternative="less", var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  c(6.2, 6.8, 7.1, 7.1, 7.4, 7.8, 7.9, 8, 8.2, 8.4, 8.4, 8.4, 8.5,  ... and c(7.1, 7.9, 8, 8.4, 8.4, 8.7, 8.7, 8.8, 9, 9, 9.1, 9.1, 9.1,  ...
## t = -4.5833, df = 172.98, p-value = 4.37e-06
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf -0.427321
## sample estimates:
## mean of x mean of y 
##  9.469663 10.138202

Robustness of the t-test to deviations from assumptions

Long tails

Generally, thanks to the CLT, the t-tests are pretty robust to deviations from assumptions. However, long-tailed distributions will present problems.

Here’s an example of a long-tailed distribution. In this case, it looks basically like a normal distribution, except occasionally it has outliers.

# Contaminated normal: each point comes, with probability .02, from a wide
# N(0, 100^2) "outlier" component instead of the narrow N(0, 5^2) one.
# (Restructuring this would change the order of RNG draws, so it is kept as-is.)
ind1 <- rbinom(100, 1, .02)
X1 <- (1-ind1)*rnorm(100, sd=5) + ind1*rnorm(100, sd=100)
hist(X1)

Let’s simulate the t-test for this kind of case

# Simulated size of the Welch test under a long-tailed (contaminated normal)
# null: both samples share one distribution, so ideally ~5% of p-values fall
# below .05.  Note the outlier probability here is .01, not the .02 used in
# the illustration above.
p <- numeric(1000)            # preallocate instead of growing with c()
for (i in seq_len(1000)) {
  ind1 <- rbinom(100, 1, .01)
  X1 <- (1-ind1)*rnorm(100, sd=5) + ind1*rnorm(100, sd=100)

  ind2 <- rbinom(100, 1, .01)
  X2 <- (1-ind2)*rnorm(100, sd=5) + ind2*rnorm(100, sd=100)

  p[i] <- t.test(X1, X2, alternative="two.sided", var.equal=FALSE)$p.value
}

mean(p<.05)
## [1] 0.026  (one random realization; no seed is set)

As you can see, we rejected the null hypothesis less often than expected – not a problem if we’re only worried about Type I errors, but a big problem if we’re worried about Type II errors!

Non-independence

Let’s now make our samples non-independent (What do you think the effect would be?)

# Non-independent samples: each "sample" of 100 contains only 5 independent
# draws (5 cluster means, each repeated 20 times), but the t-test assumes
# 100 independent observations -- so it is wildly overconfident.
p <- numeric(1000)            # preallocate instead of growing with c()
for (i in seq_len(1000)) {
  # rep(rnorm(5), each = 20) draws the same 5 values, in the same RNG order,
  # as five consecutive rep(rnorm(1), 20) calls.
  X1 <- rep(rnorm(5), each = 20)
  X2 <- rep(rnorm(5), each = 20)

  p[i] <- t.test(X1, X2, alternative="two.sided", var.equal=FALSE)$p.value
}

mean(p<.05)
## [1] 0.698  (one random realization; far above the nominal .05)

A disaster! Of course, the data looks anything but normal if you plot it:

# X1 is the last clustered sample from the loop above: only 5 distinct
# values, each repeated 20 times -- clearly not 100 independent normals.
hist(X1)

Multiplicative effects

Rainfall was measured on days when cumulus clouds were seeded and unseeded with silver iodide.

# Rainfall from the cloud-seeding experiment (Sleuth3 case0301).
clouds <- case0301
# Overlaid semi-transparent histograms: unseeded days in red, seeded in blue.
ggplot(clouds, aes(x=Rainfall)) + 
  geom_histogram(data=subset(clouds, Treatment == "Unseeded"), fill = "red", alpha = 0.2) + 
  geom_histogram(data=subset(clouds, Treatment == "Seeded"), fill = "blue", alpha = 0.2)

There’s clearly a difference there, but the distributions are very skewed. Can we do better if we apply the log transformation to the data?

# Log-transform to tame the right skew.  log1p(x) computes log(1 + x)
# accurately even when Rainfall is near zero (the +1 shift keeps
# zero-rainfall days finite on the log scale).
clouds[, "lograin"] <- log1p(clouds$Rainfall)
ggplot(clouds, aes(x=lograin)) + 
  geom_histogram(data=subset(clouds, Treatment == "Unseeded"), fill = "red", alpha = 0.2) + 
  geom_histogram(data=subset(clouds, Treatment == "Seeded"), fill = "blue", alpha = 0.2)

Looks better, but in this case a density plot would be better

# lattice-style density plot (via tigerstats): one smoothed curve per
# treatment group, easier to compare than overlaid histograms.
densityplot(~lograin, groups = Treatment, auto.key = TRUE, data = clouds)

Close enough!

# Welch t-test on the log scale, using the formula interface
# (response ~ group) rather than two separate vectors.  The CI below is a
# difference of log-means, i.e. a log of a multiplicative effect.
t.test(lograin~Treatment, alternative="two.sided", data=clouds)
## 
##  Welch Two Sample t-test
## 
## data:  lograin by Treatment
## t = 2.5602, df = 49.995, p-value = 0.01353
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2380266 1.9714413
## sample estimates:
##   mean in group Seeded mean in group Unseeded 
##               5.155938               4.051205

Interpretation: seeding clouds increases rainfall by a multiplicative factor of somewhere between about 1.3 and 7.2 (back-transforming the interval endpoints: exp(0.238) ≈ 1.27 and exp(1.971) ≈ 7.18; strictly, the factor applies to 1 + rainfall because of the log(1 + x) transform).

Let’s try the wrong t-test, just for fun

# X1 and X2 here are the clustered (non-independent) samples left over from
# the simulation loop above, so this t-test's independence assumption is
# badly violated -- the printed p-value is not trustworthy.
t.test(X1, X2, alternative="two.sided", var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  c(-2.45012668684029, -2.45012668684029, -2.45012668684029, -2.45012668684029,  ... and c(-0.931271415781827, -0.931271415781827, -0.931271415781827,  ...
## t = 0.20852, df = 140.49, p-value = 0.8351
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2531970  0.3129058
## sample estimates:
##  mean of x  mean of y 
## -0.8328534 -0.8627078

The p-value in this case is meaningless, but let’s point and laugh: it’s larger than 0.05.

The Sampling Distribution of the Sample Variance

Let’s now return to guessing at an appropriate standard deviation.

# 90% confidence interval for the population variance sigma^2, based on
# (n-1) * s^2 / sigma^2 ~ chi-squared with n-1 d.f.
# Here n = 2*89 pooled observations (89 finches per year), so d.f. = 2*89 - 2
# and s^2 = pooled_sd^2.
# NOTE(review): the original computed df/(s^2 * qchisq(...)), which bounds the
# reciprocal variance 1/sigma^2, not sigma^2 itself; the corrected interval is
# df * s^2 / qchisq(...).  Since pooled_sd is close to 1, the numbers are
# similar and the conclusion below is unchanged.
df_var <- 2*89 - 2
lower <- df_var * pooled_sd^2 / qchisq(.95, df = df_var)
upper <- df_var * pooled_sd^2 / qchisq(.05, df = df_var)
c(lower, upper)
## roughly [1] 0.80 1.14 for sigma^2 (sqrt gives about 0.90 to 1.07 for the SD)

So a guess of 1.0 is completely appropriate.