Sections 4.2 and 4.4
STA 256: Fall 2019

Overview

Law of Large Numbers

Infinite Sequence of random variables

$T_1, T_2, \ldots$

We are interested in what happens to $T_n$ as $n \rightarrow \infty$.
Why even think about this?
For fun.
And because $T_n$ could be a sequence of statistics, numbers computed from sample data. For example, $T_n = \overline{X}_n = \frac{1}{n}\sum_{i=1}^nX_i$.
$n$ is the sample size.
$n \rightarrow \infty$ is an approximation of what happens for large samples.
Good things should happen when estimates are based on more information.

Convergence

Convergence of $T_n$ as $n \rightarrow \infty$ is not an ordinary limit, because probability is involved.
There are several different types of convergence.
We will work with convergence in probability and convergence in distribution.

Convergence in Probability to a random variable

Definition: The sequence of random variables $X_1, X_2, \ldots$ is said to converge in probability to the random variable $Y$ if for all $\epsilon > 0$, $\displaystyle \lim_{n \rightarrow \infty}P\{|X_n-Y|\geq\epsilon\} = 0$, and we write $X_n \stackrel{p}{\rightarrow} Y$.

$|X_n-Y| < \epsilon \Leftrightarrow -\epsilon < X_n-Y < \epsilon \Leftrightarrow Y-\epsilon < X_n < Y+\epsilon$

Convergence in Probability to a constant
More immediate applications in statistics: We will focus on this.

Definition: The sequence of random variables $T_1, T_2, \ldots$ is said to converge in probability to the constant $c$ if for all $\epsilon > 0$,
$\lim_{n \rightarrow \infty}P\{|T_n-c|\geq\epsilon\} = 0$
and we write $T_n \stackrel{p}{\rightarrow} c$.
$|T_n-c| < \epsilon \Leftrightarrow -\epsilon < T_n-c < \epsilon \Leftrightarrow c-\epsilon < T_n < c+\epsilon$

Example: $T_n \sim U(-\frac{1}{n}, \frac{1}{n})$
Convergence in probability means $\lim_{n \rightarrow \infty}P\{|T_n-c|\geq\epsilon\} = 0$

$T_1$ is uniform on $(-1,1)$. Height of the density is $\frac{1}{2}$.
$T_2$ is uniform on $(-\frac{1}{2},\frac{1}{2})$. Height of the density is 1.
$T_3$ is uniform on $(-\frac{1}{3},\frac{1}{3})$. Height of the density is $\frac{3}{2}$.
Eventually, $\frac{1}{n} < \epsilon$ and $P\{|T_n-0|\geq\epsilon\} = 0$, forever.
Eventually means for all $n>\frac{1}{\epsilon}$.

Example: $X_1, \ldots, X_n$ are independent $U(0,\theta)$
Convergence in probability means $\lim_{n \rightarrow \infty}P\{|T_n-c|\geq\epsilon\} = 0$

For $0 < x < \theta$,
$F_{_{X_i}}(x) = \int_0^x \frac{1}{\theta} \, dt = \frac{x}{\theta}$.
$Y_n = \max_i (X_i)$.
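By independence, $F_{_{Y_n}}(y) = P\{X_1 \leq y, \ldots, X_n \leq y\} = \left(\frac{y}{\theta}\right)^n$ for $0 < y < \theta$.
Since $Y_n \leq \theta$, for $0 < \epsilon < \theta$,
$P\{|Y_n-\theta|\geq\epsilon\} = P\{Y_n \leq \theta-\epsilon\} = F_{_{Y_n}}(\theta-\epsilon) = \left(\frac{\theta-\epsilon}{\theta}\right)^n \rightarrow 0$ because $\frac{\theta-\epsilon}{\theta}<1$.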
So the observed maximum data value goes in probability to $\theta$, the theoretical maximum data value.
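
The formula in the example above can be checked by simulation. The following is a minimal sketch, not part of the original slides: the values of $\theta$, $\epsilon$, the number of replications and the seed are arbitrary illustrative choices, and the function names are hypothetical. It compares $\left(\frac{\theta-\epsilon}{\theta}\right)^n$ with the observed proportion of simulated samples in which the maximum is at least $\epsilon$ away from $\theta$.

```python
import random

def tail_prob_exact(n, theta, eps):
    """P(|Y_n - theta| >= eps) = ((theta - eps) / theta) ** n, valid for 0 < eps < theta."""
    return ((theta - eps) / theta) ** n

def tail_prob_simulated(n, theta, eps, reps=5000, seed=256):
    """Monte Carlo estimate of P(|Y_n - theta| >= eps), where Y_n is the
    maximum of n independent U(0, theta) draws. reps and seed are arbitrary."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y_n = max(rng.uniform(0.0, theta) for _ in range(n))
        if abs(y_n - theta) >= eps:
            hits += 1
    return hits / reps

theta, eps = 2.0, 0.1  # illustrative values, not from the slides
for n in (5, 20, 80):
    # Both columns should shrink toward zero as n grows.
    print(n, round(tail_prob_exact(n, theta, eps), 4),
          round(tail_prob_simulated(n, theta, eps), 4))
```

The simulated proportion is itself a sample mean of indicator variables, so its agreement with the exact value is only approximate and improves with more replications.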

Markov's inequality: Theorem 3.6.1
A stepping stone

Let $Y$ be a random variable with $P(Y \geq 0)=1$. Then for any $a>0$, $E(Y) \geq a \, P(Y \geq a)$.

Proof (for continuous random variables):
$E(Y) = \int_0^\infty y f(y) \, dy = \int_0^a y f(y) \, dy + \int_a^\infty y f(y) \, dy \geq \int_a^\infty y f(y) \, dy \geq \int_a^\infty a f(y) \, dy = a \int_a^\infty f(y) \, dy = a \, P(Y \geq a)$

The Variance Rule
Not in the text, I believe

Let $T_1, T_2, \ldots$ be a sequence of random variables, and let $c$ be a constant. If
$\displaystyle \lim_{n \rightarrow \infty}E(T_n) = c$ and
$\displaystyle \lim_{n \rightarrow \infty}Var(T_n) = 0$,
then $T_n \stackrel{p}{\rightarrow} c$.

Proof of the Variance Rule
Using Markov's inequality: $E(Y) \geq a \, P(Y \geq a)$

Seek to show $\forall \epsilon > 0$, $\displaystyle \lim_{n \rightarrow \infty}P\{|T_n-c|\geq\epsilon\} = 0$.
Denote $E(T_n)$ by $\mu_n$.
In Markov's inequality, let $Y=(T_n-c)^2$, and $a = \epsilon^2$.
$E[(T_n-c)^2] \geq \epsilon^2 P\{ (T_n-c)^2 \geq \epsilon^2 \} = \epsilon^2 P\{ |T_n-c| \geq \epsilon \}$, so
$0 \leq P\{ |T_n-c| \geq \epsilon \} \leq \frac{1}{\epsilon^2} E[(T_n-c)^2] = \frac{1}{\epsilon^2} E[(T_n-\mu_n + \mu_n - c)^2] = \frac{1}{\epsilon^2} \left( Var(T_n) + (\mu_n - c)^2 \right) \rightarrow 0$.
The cross term drops out because $E(T_n-\mu_n)=0$, and both remaining terms go to zero by assumption. Squeeze.

More comments

Law of Large Numbers is the basis of using simulation to estimate probabilities.
Have things like $\frac{1}{n}\sum_{i=1}^nX_i^2 \stackrel{p}{\rightarrow} E(X^2)$
In fact, $\frac{1}{n}\sum_{i=1}^ng(X_i) \stackrel{p}{\rightarrow} E[g(X)]$
Convergence in probability also applies to vectors of random variables, like $(X_n,Y_n) \stackrel{p}{\rightarrow} (c_1,c_2)$.
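
The first three comments can be illustrated with a short simulation. This is a minimal sketch under assumptions that are not in the slides: the $X_i$ are taken to be independent $U(0,1)$, the function name `sample_mean_of_g` and the seed are hypothetical, and the two choices of $g$ are only examples. With $g(x)=x^2$ the target is $E(X^2)=\frac{1}{3}$, and with $g$ an indicator function the sample mean estimates a probability, here $P(X \leq 0.3) = 0.3$.

```python
import random

def sample_mean_of_g(g, n, seed=2019):
    """Return (1/n) * sum of g(X_i) for n independent U(0, 1) draws.
    The U(0, 1) model and the seed are illustrative assumptions."""
    rng = random.Random(seed)
    return sum(g(rng.random()) for _ in range(n)) / n

def square(x):
    return x * x

def indicator_le_03(x):
    # Indicator of the event {X <= 0.3}; its expected value is P(X <= 0.3).
    return 1.0 if x <= 0.3 else 0.0

for n in (100, 10000, 100000):
    # The first value should settle near 1/3 and the second near 0.3 as n grows.
    print(n, round(sample_mean_of_g(square, n), 4),
          round(sample_mean_of_g(indicator_le_03, n), 4))
```

Estimating a probability this way is just the Law of Large Numbers applied to indicator variables.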

Theorem
Continuous Mapping Theorem for convergence in probability

Let $g(x)$ be a function that is continuous at $x=c$. If $T_n \stackrel{p}{\rightarrow} c$, then $g(T_n) \stackrel{p}{\rightarrow} g(c)$.

Examples:
A Geometric distribution has expected value $\frac{1-\theta}{\theta}$. $g(\overline{X}_n) = 1/(1+\overline{X}_n)$ converges in probability to
$\frac{1}{1+E(X_i)} = \frac{1}{1+\frac{1-\theta}{\theta}} = \theta$
A Uniform($0,\theta$) distribution has expected value $\theta/2$. So $2\overline{X}_n \stackrel{p}{\rightarrow} 2E(X_i) = 2\frac{\theta}{2}=\theta$
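
Both examples can be tried numerically. The sketch below is an illustration under stated assumptions rather than anything from the slides: the geometric variable counts failures before the first success (the version with expected value $\frac{1-\theta}{\theta}$), and the function names, sample sizes, parameter values and seed are all hypothetical choices.

```python
import random

def geometric_failures(rng, theta):
    """Number of failures before the first success, with success probability theta.
    This version of the geometric has expected value (1 - theta) / theta."""
    failures = 0
    while rng.random() >= theta:  # a failure occurs with probability 1 - theta
        failures += 1
    return failures

def estimate_theta_geometric(n, theta, seed=4):
    """Apply g(x) = 1 / (1 + x) to the sample mean of n geometric draws."""
    rng = random.Random(seed)
    xbar = sum(geometric_failures(rng, theta) for _ in range(n)) / n
    return 1.0 / (1.0 + xbar)

def estimate_theta_uniform(n, theta, seed=4):
    """Apply g(x) = 2x to the sample mean of n U(0, theta) draws."""
    rng = random.Random(seed)
    xbar = sum(rng.uniform(0.0, theta) for _ in range(n)) / n
    return 2.0 * xbar

for n in (100, 10000, 100000):
    # With theta = 0.25 and theta = 3.0 (illustrative values), the two
    # estimates should approach 0.25 and 3.0 respectively as n grows.
    print(n, round(estimate_theta_geometric(n, 0.25), 4),
          round(estimate_theta_uniform(n, 3.0), 4))
```

In both cases $g$ is continuous at the limit of $\overline{X}_n$, which is what the theorem requires.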

Background
For the proof of the continuous mapping theorem

$T_n \stackrel{p}{\rightarrow} c$ means that for all $\epsilon > 0$,
$\lim_{n \rightarrow \infty}P\{|T_n-c|\geq\epsilon\} = 0 \Leftrightarrow \lim_{n \rightarrow \infty}P\{|T_n-c|< \epsilon\} = 1$

$g(x)$ continuous at $c$ means that for all $\epsilon > 0$, there exists $\delta>0$ such that if $|x-c|<\delta$, then $|g(x)-g(c)| < \epsilon$.

Proof of the Continuous Mapping Theorem
For convergence in probability

Have $T_n \stackrel{p}{\rightarrow} c$ and $g(x)$ continuous at $c$. Seek to show that for all $\epsilon > 0$, $\lim_{n \rightarrow \infty}P\{|g(T_n)-g(c)|< \epsilon\} = 1$.
Let $\epsilon > 0$ be given.
$g(x)$ continuous at $c$ means there exists $\delta>0$ such that for $s\in S$, if $|T_n(s)-c|<\delta$, then $|g(T_n(s))-g(c)| < \epsilon$.
That is, if $s_0 \in \{s: |T_n(s)-c|<\delta\}$, then $s_0 \in \{s: |g(T_n(s))-g(c)| < \epsilon\}$.
This is the definition of containment:
$\{s: |T_n(s)-c|<\delta\} \subseteq \{s: |g(T_n(s))-g(c)| < \epsilon\}$
$\Rightarrow P(|T_n-c|<\delta) \leq P(|g(T_n)-g(c)| < \epsilon) \leq 1$
$\Rightarrow \lim_{n \rightarrow \infty} P(|T_n-c|<\delta) \leq \lim_{n \rightarrow \infty}P(|g(T_n)-g(c)| < \epsilon) \leq 1$
The limit on the left equals 1 because $T_n \stackrel{p}{\rightarrow} c$. Squeeze $\blacksquare$

Central Limit Theorem

Convergence in distribution
Another mode of convergence

Definition: Let the random variables $X_1, X_2, \ldots$ have cumulative distribution functions $F_{_{X_1}}(x), F_{_{X_2}}(x), \ldots$, and let the random variable $X$ have cumulative distribution function $F_{_X}(x)$. The sequence of random variables $X_1, X_2, \ldots$ is said to \emph{converge in distribution} to $X$ if
$\displaystyle \lim_{n \rightarrow \infty}F_{_{X_n}}(x) = F_{_X}(x)$
at every point where $F_{_X}(x)$ is continuous, and we write $X_n \stackrel{d}{\rightarrow} X$.

Example: Convergence to a Bernoulli with $p=\frac{1}{2}$
$\lim_{n \rightarrow \infty}F_{_{X_n}}(x) = F_{_X}(x)$ at all continuity points of $F_{_X}(x)$

$p_{_{X_n}}(x) = \left\{ \begin{array}{cl} 1/2 & \mbox{for } x=\frac{1}{n} \\ 1/2 & \mbox{for } x=1+\frac{1}{n} \\ 0 & \mbox{otherwise} \end{array} \right.$

[Figure: number lines for $n=1$, $n=2$ and $n=3$, each marked at 0, 1 and 2, showing the two support points $\frac{1}{n}$ and $1+\frac{1}{n}$ moving toward 0 and 1.]

For $x<0$, $\lim_{n \rightarrow \infty}F_{_{X_n}}(x)=0$
For $0<x<1$, $\lim_{n \rightarrow \infty}F_{_{X_n}}(x)=\frac{1}{2}$
For $x>1$, $\lim_{n \rightarrow \infty}F_{_{X_n}}(x)=1$
What happens at $x=0$ and $x=1$ does not matter.

Convergence to a constant

Consider a ``degenerate'' random variable $X$ with $P(X=c)=1$.

[Figure: number line with $c$ at the centre of the interval $(c-\epsilon, c+\epsilon)$.]

Suppose $X_n$ converges in probability to $c$.
Then for any $x>c$, $F_{_{X_n}}(x) \rightarrow 1$ for $\epsilon$ small enough (take $c+\epsilon \leq x$).
And for any $x<c$, $F_{_{X_n}}(x) \rightarrow 0$ for $\epsilon$ small enough (take $c-\epsilon \geq x$).
So $X_n$ converges in distribution to $X$; the point $x=c$ is not a continuity point of $F_{_X}(x)$.

Now suppose $X_n$ converges in distribution to $c$, so that $F_{_{X_n}}(x) \rightarrow 1$ for all $x>c$ and $F_{_{X_n}}(x) \rightarrow 0$ for all $x<c$. Let $\epsilon>0$ be given.
$P\{|X_n-c|<\epsilon\} = P\{ c-\epsilon < X_n < c+\epsilon \} = F_{_{X_n}}(c+\epsilon)-F_{_{X_n}}(c-\epsilon)$, so
$\lim_{n \rightarrow \infty}P\{|X_n-c|<\epsilon\} = \lim_{n \rightarrow \infty}F_{_{X_n}}(c+\epsilon) - \lim_{n \rightarrow \infty}F_{_{X_n}}(c-\epsilon) = 1-0 = 1$
And $X_n$ converges in probability to $c$.

Comment

Convergence in probability might seem redundant, because it's just convergence in distribution to a constant.
But that's only true when the convergence is to a constant.
Convergence in probability to a non-degenerate random variable implies convergence in distribution.
But convergence in distribution does not imply convergence in probability when the convergence is to a non-degenerate variable.

Big Theorem about convergence in distribution
Theorem 4.4.2 in the text

Let the random variables $X_1, X_2, \ldots$ have cumulative distribution functions $F_{_{X_1}}(x), F_{_{X_2}}(x), \ldots$ and moment-generating functions $M_{_{X_1}}(t), M_{_{X_2}}(t), \ldots$. Let the random variable $X$ have cumulative distribution function $F_{_X}(x)$ and moment-generating function $M_{_X}(t)$. If
$\displaystyle \lim_{n \rightarrow \infty} M_{_{X_n}}(t) = M_{_X}(t)$
for all $t$ in an open interval containing $t=0$, then $X_n$ converges in distribution to $X$.

The idea is that convergence of moment-generating functions implies convergence of distribution functions. This makes sense because moment-generating functions and distribution functions are in one-to-one correspondence.

Example: Poisson approximation to the binomial
We did this before with probability mass functions and it was a challenge.

Let $X_n$ be a binomial ($n,p_n$) random variable with $p_n=\frac{\lambda}{n}$, so that $n \rightarrow \infty$ and $p_n \rightarrow 0$ in such a way that the value of $n \, p_n=\lambda$ remains fixed. Find the limiting distribution of $X_n$.

Recalling that the MGF of a Poisson is $e^{\lambda(e^t-1)}$ and $\left(1 + \frac{x}{n}\right)^n \rightarrow e^x$,
$M_{_{X_n}}(t) = (p_n e^t+1-p_n )^n = \left(\frac{\lambda}{n}e^t+1-\frac{\lambda}{n} \right)^n = \left(1+\frac{\lambda(e^t-1)}{n} \right)^n \rightarrow e^{\lambda(e^t-1)}$
MGF of Poisson($\lambda$).

The Central Limit Theorem
Proved using limiting moment-generating functions

Let $X_1, \ldots, X_n$ be independent random variables from a distribution with expected value $\mu$ and variance $\sigma^2$. Then
$\displaystyle Z_n = \frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma} \stackrel{d}{\rightarrow} Z \sim N(0,1)$
In practice, $Z_n$ is often treated as standard normal for $n>25$, although the $n$ required for an accurate approximation really depends on the distribution.

Sometimes we say the distribution of the sample mean is approximately normal, or ``asymptotically'' normal.

This is justified by the Central Limit Theorem.
But it does \emph{not} mean that $\overline{X}_n$ converges in distribution to a normal random variable.
The Law of Large Numbers says that $\overline{X}_n$ converges in probability to a constant, $\mu$.
So $\overline{X}_n$ converges to $\mu$ in distribution as well.
That is, $\overline{X}_n$ converges in distribution to a degenerate random variable with all its probability at $\mu$.

Why would we say that for large $n$, the sample mean is approximately $N(\mu,\frac{\sigma^2}{n})$?

Have $Z_n = \frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma}$ converging in distribution to $Z \sim N(0,1)$.
$Pr\{\overline{X}_n \leq x\} = Pr\left\{ \frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma} \leq \frac{\sqrt{n}(x-\mu)}{\sigma}\right\} = Pr\left\{ Z_n \leq \frac{\sqrt{n}(x-\mu)}{\sigma}\right\} \approx \Phi\left( \frac{\sqrt{n}(x-\mu)}{\sigma} \right)$
Suppose $Y$ is \emph{exactly} $N(\mu,\frac{\sigma^2}{n})$:
$Pr\{Y \leq x\} = Pr\left\{ \frac{\sqrt{n}(Y-\mu)}{\sigma} \leq \frac{\sqrt{n}(x-\mu)}{\sigma}\right\} = Pr\left\{ Z \leq \frac{\sqrt{n}(x-\mu)}{\sigma}\right\} = \Phi\left( \frac{\sqrt{n}(x-\mu)}{\sigma} \right)$

Copyright Information

This slide show was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. It is licensed under a \href{http://creativecommons.org/licenses/by-sa/3.0/deed.en_US}{Creative Commons Attribution - ShareAlike 3.0 Unported License}. Use any part of it as you like and share the result freely.