knitr::opts_chunk$set(echo = TRUE)
Let’s say that we have some data distributed as \(N(0, 1.2^2)\). Let’s generate the data:
N = 140
sigma <- 1.2
mu <- 0
dat <- rnorm(n=N, mean=mu, sd=sigma)
Suppose the Null Hypothesis is that \(\sigma = 1.0\). How can we test it? Let’s construct an interval in which \(\sum_i \left( \frac{X_i-\bar{X}}{1.0}\right)^2\) will fall 95% of the time assuming the Null Hypothesis is true. That statistic is distributed according to \(\chi^2(N-1)\) (i.e., \(\chi^2(139)\)) if the Null Hypothesis is true.
Compute the statistic:
sum_res <- sum( ((dat-mean(dat))/1.0)^2 )
sum_res
## [1] 240.7838
Now, compute the interval:
c(qchisq(.025, 139), qchisq(0.975, 139))
## [1] 108.2543 173.5303
Our statistic, 240.78, falls outside this interval, so we can reject the Null Hypothesis.
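The whole procedure can be packaged into a small helper function. This is just a sketch; the name var_chisq_test is made up for this example (base R has no built-in test for a hypothesized \(\sigma\)):

```r
# Sketch of a two-sided chi-square test for H0: sigma = sigma0.
# The function name var_chisq_test is invented for illustration.
var_chisq_test <- function(x, sigma0 = 1.0, alpha = 0.05) {
  n <- length(x)
  stat <- sum(((x - mean(x)) / sigma0)^2)   # ~ chi^2(n - 1) under H0
  lower <- qchisq(alpha / 2, df = n - 1)
  upper <- qchisq(1 - alpha / 2, df = n - 1)
  list(statistic = stat,
       interval = c(lower, upper),
       reject = (stat < lower | stat > upper))
}

var_chisq_test(dat)  # dat from above; with sd = 1.2 we usually reject
```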
Note that, assuming we decide ahead of time which interval to compute, we could go with a different interval as well. For example, here’s another legitimate interval.
c(qchisq(.01, 139), qchisq(0.96, 139))
## [1] 103.1737 169.5193
Why can’t we pick and choose which interval to use after the fact?
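One way to see the problem: simulate data under the Null Hypothesis and “reject” whenever the statistic falls outside either of the two intervals above. Each interval alone gives a 5% false-positive rate, but picking whichever interval rejects combines their rejection regions (here, \(s < \chi^2_{.025}\) or \(s > \chi^2_{.96}\), which has probability \(.025 + .04 = .065\)). A sketch:

```r
# Simulate under H0 (sigma = 1.0) and cherry-pick between two intervals.
set.seed(1)  # assumed seed, just for reproducibility
n_sims <- 10000
n <- 140
int1 <- c(qchisq(.025, n - 1), qchisq(.975, n - 1))
int2 <- c(qchisq(.01,  n - 1), qchisq(.96,  n - 1))
cherry_picked <- replicate(n_sims, {
  x <- rnorm(n, mean = 0, sd = 1.0)          # Null Hypothesis is true
  s <- sum(((x - mean(x)) / 1.0)^2)
  (s < int1[1] | s > int1[2]) | (s < int2[1] | s > int2[2])
})
mean(cherry_picked)  # about 0.065, not the nominal 0.05
```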
How do we compute stuff like qchisq, qt, or qnorm anyway? Let’s say we need qnorm(.95). Here’s how to compute it. First, generate lots of samples from the normal distribution:
N = 10000
samples <- rnorm(n=N)
Now, we need to find a number \(t\) such that samples from the normal are lower than \(t\) 95% of the time. We can do it by sorting our samples and then taking sample number \(round(.95N)\) in the sorted vector.
sorted_samples <- sort(samples)
sorted_samples[round(.95*N)]
## [1] 1.638972
Did we get it right?
qnorm(.95)
## [1] 1.644854
Looks like it.
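Base R’s quantile function does essentially this for us; by default it interpolates between the two nearest order statistics rather than rounding to one index, so its answer differs slightly from ours:

```r
set.seed(2)              # assumed seed, just for reproducibility
samples <- rnorm(10000)  # same setup as above
quantile(samples, probs = 0.95)  # close to qnorm(.95), about 1.645
```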
Now, let’s try to do the same thing with \(\chi^2(10)\). I’ll use a loop even though it’s a little inefficient:
chisq10 <- c()
for (i in 1:N){
  chisq10 <- c(chisq10, sum(rnorm(10)^2))
}
sorted_samples <- sort(chisq10)
sorted_samples[round(.95*N)]
## [1] 18.24429
Did we get it right?
qchisq(.95, df=10)
## [1] 18.30704
Looks like it!
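The loop above grows chisq10 one element at a time, which is slow in R. The same simulation can be vectorized by drawing all the normals at once into a matrix and summing across rows (a sketch):

```r
N <- 10000                                  # as above
draws <- matrix(rnorm(N * 10), ncol = 10)   # each row: 10 standard normals
chisq10_vec <- rowSums(draws^2)             # N chi^2(10) draws, no loop
sort(chisq10_vec)[round(.95 * N)]           # same estimate, much faster
```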