knitr::opts_chunk$set(echo = TRUE)
Let’s say that we have some data distributed as \(N(0, 1.2^2)\). Let’s generate the data:
N = 140
sigma <- 1.2
mu <- 0
dat <- rnorm(n=N, mean=mu, sd=sigma)
Suppose the Null Hypothesis is that \(\sigma = 1.0\). How can we test it? Let’s construct an interval in which \(\sum_i \left( \frac{X_i-\bar{X}}{1.0}\right)^2\) will fall 95% of the time assuming the Null Hypothesis is true. That statistic is distributed according to \(\chi^2(N-1)\) (i.e., \(\chi^2(139)\)) if the Null Hypothesis is true.
Compute the statistic:
sum_res <- sum( ((dat-mean(dat))/1.0)^2 )
sum_res
## [1] 240.7838
Now, compute the interval:
c(qchisq(.025, 139), qchisq(0.975, 139))
## [1] 108.2543 173.5303
Our statistic, 240.78, falls outside this interval, so we can reject the Null Hypothesis.
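The whole procedure can be packaged into a small helper function. This is just a sketch; the name var_chisq_test is made up for this example (base R has no built-in test for a hypothesized \(\sigma\)):

```r
# Sketch of a two-sided chi-square test for H0: sigma = sigma0.
# The function name var_chisq_test is invented for illustration.
var_chisq_test <- function(x, sigma0 = 1.0, alpha = 0.05) {
  n <- length(x)
  stat <- sum(((x - mean(x)) / sigma0)^2)   # ~ chi^2(n - 1) under H0
  lower <- qchisq(alpha / 2, df = n - 1)
  upper <- qchisq(1 - alpha / 2, df = n - 1)
  list(statistic = stat,
       interval = c(lower, upper),
       reject = (stat < lower | stat > upper))
}

var_chisq_test(dat)  # dat from above; with sd = 1.2 we usually reject
```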
Note that, assuming we decide ahead of time which interval to compute, we could go with a different interval as well. For example, here’s another legitimate interval.
c(qchisq(.01, 139), qchisq(0.96, 139))
## [1] 103.1737 169.5193
Why can’t we pick and choose which interval to use after the fact?
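One way to see the problem: simulate data under the Null Hypothesis and “reject” whenever the statistic falls outside either of the two intervals above. Each interval alone gives a 5% false-positive rate, but picking whichever interval rejects combines their rejection regions (here, \(s < \chi^2_{.025}\) or \(s > \chi^2_{.96}\), which has probability \(.025 + .04 = .065\)). A sketch:

```r
# Simulate under H0 (sigma = 1.0) and cherry-pick between two intervals.
set.seed(1)  # assumed seed, just for reproducibility
n_sims <- 10000
n <- 140
int1 <- c(qchisq(.025, n - 1), qchisq(.975, n - 1))
int2 <- c(qchisq(.01,  n - 1), qchisq(.96,  n - 1))
cherry_picked <- replicate(n_sims, {
  x <- rnorm(n, mean = 0, sd = 1.0)          # Null Hypothesis is true
  s <- sum(((x - mean(x)) / 1.0)^2)
  (s < int1[1] | s > int1[2]) | (s < int2[1] | s > int2[2])
})
mean(cherry_picked)  # about 0.065, not the nominal 0.05
```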
How do we compute stuff like qchisq, qt, or qnorm anyway? Let’s say we need qnorm(.95). Here’s how to compute it. First, generate lots of samples from the normal distribution:
N = 10000
samples <- rnorm(n=N)
Now, we need to find a number \(t\) such that samples from the normal are lower than \(t\) 95% of the time. We can do it by sorting our samples and then taking sample number \(round(.95N)\) in the sorted vector.
sorted_samples <- sort(samples)
sorted_samples[round(.95*N)]
## [1] 1.638972
Did we get it right?
qnorm(.95)
## [1] 1.644854
Looks like it.
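Base R’s quantile function does essentially this for us; by default it interpolates between the two nearest order statistics rather than rounding to one index, so its answer differs slightly from ours:

```r
set.seed(2)              # assumed seed, just for reproducibility
samples <- rnorm(10000)  # same setup as above
quantile(samples, probs = 0.95)  # close to qnorm(.95), about 1.645
```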
Now, let’s try to do the same thing with \(\chi^2(10)\). I’ll use a loop even though it’s a little inefficient:
chisq10 <- c()
for (i in 1:N){
  chisq10 <- c(chisq10, sum(rnorm(10)^2))
}
sorted_samples <- sort(chisq10)
sorted_samples[round(.95*N)]
## [1] 18.24429
Did we get it right?
qchisq(.95, df=10)
## [1] 18.30704
Looks like it!
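The loop above grows chisq10 one element at a time, which is slow in R. The same simulation can be vectorized by drawing all the normals at once into a matrix and summing across rows (a sketch):

```r
N <- 10000                                  # as above
draws <- matrix(rnorm(N * 10), ncol = 10)   # each row: 10 standard normals
chisq10_vec <- rowSums(draws^2)             # N chi^2(10) draws, no loop
sort(chisq10_vec)[round(.95 * N)]           # same estimate, much faster
```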