% 260s20Assignment.tex Confidence intervals for normal random sampling
\documentclass[11pt]{article}
%\usepackage{amsbsy} % for \boldsymbol and \pmb
%\usepackage{graphicx} % To include pdf files!
\usepackage{amsmath}
\usepackage{amsbsy}
\usepackage{amsfonts}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} % For links
\usepackage{comment}
%\usepackage{fullpage}
\oddsidemargin=0in % Good for US Letter paper
\evensidemargin=0in
\textwidth=6.3in
\topmargin=-1in
\headheight=0.2in
\headsep=0.5in
\textheight=9.4in
%\pagestyle{empty} % No page numbers

\begin{document}
%\enlargethispage*{1000 pt}
\begin{center}
{\Large \textbf{STA 260s20 Assignment Four: Confidence Intervals Part Two}}\footnote{Copyright information is at the end of the last page.}
%\vspace{1 mm}
\end{center}

\noindent These homework problems are not to be handed in. They are preparation for Quiz 4 (Week of Feb.~10) and Term Test 2. \textbf{Please try each question before looking at the solution}.
%\vspace{5mm}

\begin{enumerate}
\item Here are some distribution facts that are helpful to know without looking at a formula sheet. You are responsible for the proofs of these facts too, but here you are just supposed to write down the answers.
\begin{enumerate}
\item Let $X\sim N(\mu,\sigma^2)$ and $Y=aX+b$, where $a$ and $b$ are constants. What is the distribution of $Y$?
\item Let $X\sim N(\mu,\sigma^2)$ and $Z = \frac{X-\mu}{\sigma}$. What is the distribution of $Z$?
\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. What is the distribution of $Y = \sum_{i=1}^nX_i$?
\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. What is the distribution of the sample mean $\overline{X}$?
\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. What is the distribution of $Z = \frac{\sqrt{n}(\overline{X}-\mu)}{\sigma}$?
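These distribution facts can be checked by simulation. Below is a minimal Python sketch (standard library only; the values of $\mu$, $\sigma$, and $n$ are arbitrary illustrative choices, not from the assignment) confirming that $Z = \frac{\sqrt{n}(\overline{X}-\mu)}{\sigma}$ behaves like a standard normal:

```python
# Simulation check: Z = sqrt(n) * (Xbar - mu) / sigma should be N(0,1).
# mu, sigma, n are arbitrary illustrative choices, not from the assignment.
import math
import random

random.seed(1)
mu, sigma, n, reps = 5.0, 2.0, 8, 20000

zs = []
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    zs.append(math.sqrt(n) * (xbar - mu) / sigma)

z_mean = sum(zs) / reps
z_var = sum((z - z_mean) ** 2 for z in zs) / (reps - 1)
# Both should be close to the N(0,1) values of 0 and 1.
print(round(z_mean, 2), round(z_var, 2))
```

If the simulated mean and variance were far from 0 and 1, that would signal an error in the standardization.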
\item \label{combo} Let $X_1, \ldots, X_n$ be independent random variables, with $X_i \sim N(\mu_i,\sigma_i^2)$. Let $a_1, \ldots, a_n$ be constants. What is the distribution of $Y = \sum_{i=1}^n a_iX_i$?
\end{enumerate}

\item Use the formula sheet as necessary for this question.
\begin{enumerate}
\item Let $X_1, \ldots, X_n$ be independent random variables with $X_i \sim \chi^2(\nu_i)$ for $i=1, \ldots, n$. Find the distribution of $Y = \sum_{i=1}^n X_i$. Show your work. Your answer includes a statement of the parameter(s).
\item Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution. Using your answers to earlier questions in this assignment, find the distribution of $Y = \frac{1}{\sigma^2} \sum_{i=1}^n\left(X_i-\mu \right)^2$. Your answer includes a statement of the parameter value(s).
\item Let $Y=Y_1+Y_2$, where $Y_1$ and $Y_2$ are independent, $Y_1\sim\chi^2(\nu_1)$ and $Y\sim\chi^2(\nu_1+\nu_2)$, where $\nu_1$ and $\nu_2$ are both positive. Derive the distribution of $Y_2$. Your answer includes a statement of the parameter value(s).
\end{enumerate}

\item \label{normalsample} Let $X_1, \ldots, X_n \stackrel{i.i.d.}{\sim} N(\mu,\sigma^2)$. The sample variance is $S^2 = \frac{\sum_{i=1}^n\left(X_i-\overline{X} \right)^2 }{n-1}$.
\begin{enumerate}
\item The difference $X_j-\overline{X}$ is a linear combination of the form $\sum_{i=1}^n a_iX_i$, as in Question~(\ref{combo}). What are the coefficients $a_1, \ldots, a_n$?
\item Show $Cov(\overline{X},X_j-\overline{X})=0$ for every $j=1, \ldots, n$.
\item How do you know that $\overline{X}$ and $S^2$ are independent? You may use without proof the fact that for the normal distribution, zero covariance implies independence, and that functions of independent random variables are also independent.
\item Show that $\sum_{i=1}^n\left(X_i-\mu \right)^2 = \sum_{i=1}^n\left(X_i-\overline{X}\right)^2 + n\left(\overline{X}-\mu \right)^2$.
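The algebraic identity in the last part can be sanity-checked numerically before you prove it. Here is a small Python sketch; the data values and $\mu$ are arbitrary made-up numbers:

```python
# Numeric check of: sum (x_i - mu)^2 = sum (x_i - xbar)^2 + n * (xbar - mu)^2
# The data x and the value of mu are arbitrary made-up numbers.
mu = 2.0
x = [1.3, -0.4, 2.9, 0.7, 3.1]
n = len(x)
xbar = sum(x) / n

lhs = sum((xi - mu) ** 2 for xi in x)
rhs = sum((xi - xbar) ** 2 for xi in x) + n * (xbar - mu) ** 2
# The two sides agree up to floating-point rounding.
print(abs(lhs - rhs) < 1e-9)
```

A numeric check is no proof, of course, but it is a quick way to catch a sign error before starting the algebra.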
\item Prove that $Y_1 = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$.
\end{enumerate}
\pagebreak

\item Again, let $X_1, \ldots, X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution.
\begin{enumerate}
\item The $t$ distribution is defined as follows. Let $Z\sim N(0,1)$ and $Y \sim \chi^2(\nu)$, with $Z$ and $Y$ independent. Then $T = \frac{Z}{\sqrt{Y/\nu}}$ is said to have a $t$ distribution with $\nu$ degrees of freedom, and we write $T \sim t(\nu)$. Using results from earlier questions, prove $T = \frac{\sqrt{n}(\overline{X}-\mu)}{S} \sim t(n-1)$. Be sure to indicate why your $Z$ and $Y$ are independent.
\item You can see that $T$ is a ``pivotal quantity." It's a random variable that is a function of the parameter, but whose distribution does not depend on the parameter value. Starting with a probability statement about the pivotal quantity, derive an exact $(1-\alpha)100\%$ confidence interval for $\mu$. ``Derive" means show all the high school algebra. Your answer is a pair of formulas, one for the lower confidence limit and one for the upper confidence limit.
\item \label{sleep} The $t$ distribution was introduced by William Gosset, writing under the name Student. The reference is Student (1908). ``The probable error of a mean," \emph{Biometrika} 6, 1--25. Gosset illustrated the method using two measurements on ten patients suffering from insomnia (trouble sleeping). Each number is a difference, representing how much \emph{extra} sleep the patient got when taking a sleeping pill, compared to a baseline measurement. Drug 1 is Dextro-hyoscyamine hydrobromide, while Drug 2 is Laevo-hyoscyamine hydrobromide. Each patient tried both drugs, with a recovery period of several days between trials. Here are the data:
\begin{verbatim}
Patient  Drug 1  Drug 2
      1     0.7     1.9
      2    -1.6     0.8
      3    -0.2     1.1
      4    -1.2     0.1
      5    -0.1    -0.1
      6     3.4     4.4
      7     3.7     5.5
      8     0.8     1.6
      9     0.0     4.6
     10     2.0     3.4
\end{verbatim}
The strategy here is to compute a difference for each patient, Drug~1 minus Drug~2.
The difference represents how much more sleep the patient got when using Drug~1. The differences are $X_1, \ldots, X_{10}$, and $\mu$ is the expected advantage of Drug~1 over Drug~2. Give a point estimate and a 95\% confidence interval for $\mu$. The point estimate is a single number. The confidence interval is a pair of numbers, the lower confidence limit and the upper confidence limit.
% xbar = -1.58, ci = (-2.46 -0.70)
\item Does the confidence interval allow you to decide which drug worked better?
\item Derive an exact $(1-\alpha)100\%$ confidence interval for $\sigma^2$. ``Derive" means show all the high school algebra. Your answer is a pair of formulas, one for the lower confidence limit and one for the upper confidence limit. You can locate the pivotal quantity in your answers to earlier questions.
\item Using the data from Question~(\ref{sleep}), give a 95\% confidence interval for $\sigma^2$. The answer is a pair of numbers, the lower confidence limit and the upper confidence limit.
\end{enumerate} % End of one-sample t problem
\pagebreak

\item It is natural to be interested in the difference between two expected values. For example, volunteer patients in a clinical trial might be randomly assigned to receive either blood pressure medicine A or blood pressure medicine B. One way to ask which medicine is more effective is to compare the two expected blood pressures. Or, we might wonder whether there is a difference between men and women in their expected score on a racism questionnaire. Accordingly, let $X_1, \ldots, X_{n_1} \stackrel{i.i.d.}{\sim} N(\mu_1,\sigma^2)$, and $Y_1, \ldots, Y_{n_2} \stackrel{i.i.d.}{\sim} N(\mu_2,\sigma^2)$. In addition, the $X_i$ are independent of the $Y_j$. This is a model for random sampling from two separate populations. Notice that while the two expected values might be different, the two normal distributions in this model have the same variance. This is for technical convenience, as you will see.
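As an optional check of Question~(\ref{sleep}) before going on: the point estimate and 95\% interval can be reproduced with a short Python sketch. The critical value $t_{0.025}(9) = 2.262$ is hard-coded from a $t$ table, since the Python standard library has no $t$ quantile function.

```python
# Point estimate and 95% confidence interval for mu in the sleep example.
# Differences are Drug 1 minus Drug 2, computed from the data table.
import math

d = [-1.2, -2.4, -1.3, -1.3, 0.0, -1.0, -1.8, -0.8, -4.6, -1.4]
n = len(d)
dbar = sum(d) / n                                  # point estimate of mu
s2 = sum((di - dbar) ** 2 for di in d) / (n - 1)   # sample variance
t_crit = 2.262        # t_{0.025}(9), hard-coded from a t table
margin = t_crit * math.sqrt(s2 / n)
# xbar = -1.58, interval roughly (-2.46, -0.70)
print(round(dbar, 2), round(dbar - margin, 2), round(dbar + margin, 2))
```

The same numbers (to more decimal places) come from \texttt{t.test} in R; see the session at the end of this file.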
\begin{enumerate}
\item The quantity of primary interest here is $\mu_1-\mu_2$. Give a natural point estimator of $\mu_1-\mu_2$.
\item What is the distribution of $\overline{X}-\overline{Y}$? You should be able to just write it down.
\item Standardize the difference to obtain $Z$. This will be the numerator of $T$.
\item Using the fact that the sum of two independent chi-squares is chi-squared, propose a nice $Y$ variable to go in the denominator of $T$. How do you know that $Z$ and $Y$ are independent?
\item Write a formula for your $T$ random variable, and simplify. The cancellation of $\sigma^2$ in numerator and denominator is wonderful. This is why the variances of the two populations are assumed equal; it's not to make the model more realistic.
\item The formula for $T$ is usually written
\begin{displaymath}
T = \frac{\overline{X}-\overline{Y} - (\mu_1-\mu_2)}
         {S_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}},
\mbox{ where } S_p = \sqrt{\frac{(n_1-1)S^2_1 + (n_2-1)S^2_2 }{n_1+n_2-2}}
\end{displaymath}
The statistic $S_p$ is called the ``pooled" estimated standard deviation. Simplify your $T$ formula a bit more until it has this form.
\item What are the degrees of freedom of $T$? That is, what is the parameter $\nu$?
\item Derive an exact $(1-\alpha)100\%$ confidence interval for $\mu_1-\mu_2$. Your answer is a pair of formulas, one for the lower confidence limit and one for the upper confidence limit.
\item \label{twoT} Two surgeons in a cosmetic surgery practice decide to have a friendly competition. The wait list has 20 patients who want surgery to make their noses smaller. Ten patients are randomly assigned to Surgeon A, and the other ten are assigned to Surgeon B. A panel of medical students rate the facial appearance of the patients on a 100 point scale before surgery and again six weeks after. The number for each patient is improvement (according to the medical students): After minus before.
Of course, the medical students are not told which doctor did the surgery. Because of scheduling problems and drop-out (people change their minds), Surgeon A only did nine surgeries, and Surgeon B did seven. So with $n_1=9$ and $n_2=7$, we have $\overline{x}=14.1$, $s^2_1=48.2$, $\overline{y}=13.3$, $s^2_2=32.7$. Give a point estimate and a 95\% confidence interval for $\mu_1-\mu_2$. The point estimate is a single number, and the confidence interval is a pair of numbers, a lower confidence limit and an upper confidence limit.
% 0.8 pm 6.97 = (-6.17, 7.77)
\item Who wins the contest?
\end{enumerate} % End of 2-sample t question
\pagebreak

\item The $F$ distribution is defined as follows. Let $Y_1 \sim \chi^2(\nu_1)$ and $Y_2 \sim \chi^2(\nu_2)$ be independent. Then $F = \frac{Y_1/\nu_1}{Y_2/\nu_2}$ is said to have an $F$ distribution with $\nu_1$ and $\nu_2$ degrees of freedom, and we write $F \sim F(\nu_1,\nu_2)$. For obvious reasons, $\nu_1$ is sometimes called the ``numerator degrees of freedom," and $\nu_2$ is called the ``denominator degrees of freedom."
\begin{enumerate}
\item What is the support of the $F$ distribution?
\item Show that if $F_1 \sim F(\nu_1,\nu_2)$, then
\begin{enumerate}
\item $F_2 = 1/F_1 \sim F(\nu_2,\nu_1)$.
\item $P(F_1 \leq x) = P(F_2 \geq 1/x)$.
\end{enumerate}
\item For a single random sample from a normal distribution, you have shown that $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$. So for two independent random samples, it's easy to locate a pivotal quantity for the ratio $\frac{\sigma^2_1}{\sigma^2_2}$ based on the $F$ distribution. Write the pivotal quantity. What are the degrees of freedom?
\item Derive a 95\% confidence interval for $\frac{\sigma^2_1}{\sigma^2_2}$.
\item Using the statistics from Question~(\ref{twoT}), give a 95\% confidence interval for $\frac{\sigma^2_1}{\sigma^2_2}$. The tough part is the critical values. The $F$ tables in the back of the text (not part of the formula sheet) have some explanation up front.
Your answer is a pair of numbers, a lower confidence limit and an upper confidence limit.
\item The confidence interval for the difference between means depends on the assumption that $\sigma^2_1 = \sigma^2_2$. Does your confidence interval for $\frac{\sigma^2_1}{\sigma^2_2}$ suggest that this assumption is incorrect?
\end{enumerate} % End of F distribution question.
% $n_1=9$ and $n_2=7$ df = 6,8 = 7-1,9-1
% Table sez F_0.975 = 4.65
% Instructions say that F_0.025 is 1/F_0.975 for df = 8,6: 1/5.60 = 0.1786
% > qf(0.025,6,8)
% [1] 0.1785835
% > qf(0.975,6,8)
% [1] 4.651696
\end{enumerate} % End of all the questions

\vspace{90mm}
\vspace{3mm}
\hrule
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vspace{3mm}
\noindent This assignment was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Mathematical and Computational Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely.
The \LaTeX~source code is available from the course website:
\begin{center}
\href{http://www.utstat.toronto.edu/~brunner/oldclass/260s20}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/260s20}}
\end{center}

\end{document}

> sleep = read.table("http://www.utstat.toronto.edu/~brunner/data/legal/studentsleep.data.txt", header=T)
> sleep
   Patient Drug1 Drug2
1        1   0.7   1.9
2        2  -1.6   0.8
3        3  -0.2   1.1
4        4  -1.2   0.1
5        5  -0.1  -0.1
6        6   3.4   4.4
7        7   3.7   5.5
8        8   0.8   1.6
9        9   0.0   4.6
10      10   2.0   3.4
> attach(sleep)
> diff = Drug1-Drug2
> c(mean(diff),var(diff))
[1] -1.580000  1.512889
> t.test(diff)

	One Sample t-test

data:  diff
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -2.4598858 -0.7001142
sample estimates:
mean of x
    -1.58