% Machine Learning Basics for Applied Stat I
% \documentclass[serif]{beamer} % Serif for Computer Modern math font.
\documentclass[serif, handout]{beamer} % Handout mode to ignore pause statements
\hypersetup{colorlinks,linkcolor=,urlcolor=red}
\usefonttheme{serif} % Looks like Computer Modern for non-math text -- nice!
\setbeamertemplate{navigation symbols}{} % Suppress navigation symbols
% \usetheme{Berlin} % Displays sections on top
\usetheme{Frankfurt} % Displays section titles on top: Fairly thin but still swallows some material at bottom of crowded slides
%\usetheme{Berkeley}
\usepackage[english]{babel}
\usepackage{amsmath} % for binom
\usepackage{euscript} % for \EuScript
% \usepackage{graphicx} % To include pdf files!
% \definecolor{links}{HTML}{2A1B81}
% \definecolor{links}{red}
\setbeamertemplate{footline}[frame number]
% \mode<presentation>{\setbeamercolor{background canvas}{bg=black!5}} % Comment this out for handout

\title{Machine Learning Basics\footnote{See last slide for copyright information.}}
\subtitle{STA442/2101 Fall 2018}
\date{} % To suppress date

\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\titlepage
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Source}
%\framesubtitle{Lots of copy-paste quotes}
\begin{center}
Chapter 5 in \emph{Deep Learning} by Goodfellow, Bengio and Courville
\end{center}
\pause
\vspace{10mm}
I have copy-pasted so many quotes from this text that this slide show fits the definition of plagiarism.
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Machine learning is a form of applied statistics} \pause
%\framesubtitle{}
\begin{quote}
Machine learning is essentially a form of applied statistics with increased emphasis on the use of computers to statistically estimate complicated functions and a decreased emphasis on proving confidence intervals around these functions.
\end{quote} % p. 98
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Learning} \pause
%\framesubtitle{}
\begin{quote}
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
\end{quote}
\pause
The term learning can also mean model fitting or parameter estimation.
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Tasks} \pause
%\framesubtitle{}
\begin{quote}
Machine learning tasks are usually described in terms of how the machine learning system should process an example. An \textbf{example} is a collection of \textbf{features} that have been quantitatively measured from some object or event that we want the machine learning system to process.
\end{quote}
\pause
\vspace{10mm}
\begin{itemize}
\item Example = data\pause, usually what we would call a case \pause
\item Feature = variable
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Examples of common tasks T}
%\framesubtitle{}
\begin{itemize}
\item Classification: In this type of task, the computer program is asked to specify which of $k$ categories some input belongs to.
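\pause \emph{That is, the program is asked to produce a function $f: \mathbb{R}^n \rightarrow \{1, \ldots, k\}$.}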
\pause Can be with or without missing inputs (medical diagnosis).
\item Regression: In this type of task, the computer program is asked to predict a numerical value given some input. \pause
\item Transcription: In this type of task, the machine learning system is asked to observe a relatively unstructured representation of some kind of data and transcribe it into discrete, textual form. For example, in optical character recognition, \ldots \pause
\item Machine translation: In a machine translation task, the input already consists of a sequence of symbols in some language, and the computer program must convert this into a sequence of symbols in another language.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{More Examples of tasks}
%\framesubtitle{}
\begin{itemize}
\item Anomaly detection: In this type of task, the computer program sifts through a set of events or objects, and flags some of them as being unusual or atypical. An example of an anomaly detection task is credit card fraud detection. \pause \emph{To me, detection of credit card fraud is a classification problem. \pause Another example of anomaly detection is outlier detection.} \pause
\item Imputation of missing values. \pause
\item Denoising: The machine learning algorithm is given as input a corrupted example obtained by an unknown corruption process from a clean example. The learner must predict the clean example from its corrupted version. \pause \emph{Sounds like measurement error modeling.} \pause
\item Density estimation or probability mass function estimation.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Performance measure P}
%\framesubtitle{}
\begin{itemize}
\item Performance measures are usually specific to the task. \pause Accuracy is an example. \pause
\item ``We often refer to the error rate as the expected 0-1 loss. The 0-1 loss on a particular example is 0 if it is correctly classified and 1 if it is not." \pause
\end{itemize}
\begin{quote}
Usually we are interested in how well the machine learning algorithm performs on data that it has not seen before, since this determines how well it will work when deployed in the real world. We therefore evaluate these performance measures using a \textbf{test} set of data that is separate from the data used for training the machine learning system.
\end{quote}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Experience E} \pause
\framesubtitle{}
\begin{quote}
Machine learning algorithms can be broadly categorized as unsupervised or supervised by what kind of experience they are allowed to have during the learning process. \pause Most of the learning algorithms in this book can be understood as being allowed to experience an entire dataset.
\end{quote}
\pause
Experience = to process data?
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Supervised versus unsupervised learning} \pause
%\framesubtitle{}
\begin{quote}
Roughly speaking, unsupervised learning involves observing several examples of a random vector $x$, and attempting to implicitly or explicitly learn the probability distribution $p(x)$, \pause or some interesting properties of that distribution, \pause while supervised learning involves observing several examples of a random vector $x$ and an associated value or vector $y$, \pause and learning to predict $y$ from $x$, \pause usually by estimating $p(y|x)$.
\end{quote}
\pause
\begin{itemize}
\item Supervised learning: There is a response variable. \pause (Called a ``label" or ``target.") \pause
\item Unsupervised learning: No response variable. \pause
\begin{itemize}
\item Cluster analysis
\item Principal components
\item Density estimation
\end{itemize}
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Design matrix}
%\framesubtitle{}
\begin{quote}
A design matrix is a matrix containing a different example in each row. \pause Each column of the matrix corresponds to a different feature. \pause For instance, the Iris dataset contains 150 examples with four features for each example.
\end{quote}
\begin{itemize}
\item Example = case (there are $n$ cases). \pause
\item Feature = variable.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Regression example}
\framesubtitle{``A simple machine learning algorithm: linear regression."} \pause
{\small
\begin{itemize}
\item ``The goal is to build a system that can take a vector $\mathbf{x} \in \mathbb{R}^n$ as input and predict the value of a scalar $y \in \mathbb{R}$ as its output." \pause
\item ``We define the output to be $\widehat{y} = \mathbf{w}^\top\mathbf{x}$, where $\mathbf{w} \in \mathbb{R}^n$ is a vector of \textbf{parameters}." \pause \emph{ouch.} \pause
\item Test set will be used only for evaluation. \pause \emph{Good.} \pause
\item ``One way of measuring the performance of the model is to compute the mean squared error of the model on the test set." $MSE_{\mbox{test}}$ \pause
\item ``Minimize the mean squared error on the training set, $MSE_{\mbox{train}}$ ." \pause
\item Then they present the normal equations \pause and say that evaluating
$\mathbf{w} = \left(\mathbf{X}^{\mbox{(train)}\top} \mathbf{X}^{\mbox{(train)}}\right)^{-1} \mathbf{X}^{\mbox{(train)}\top} \mathbf{y}^{\mbox{(train)}}$
\pause ``constitutes a simple learning algorithm." \pause So ``learning" is definitely estimation, or at least curve fitting.
\end{itemize}
} % End size
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{What is a ``model?"} \pause
%\framesubtitle{}
\begin{itemize}
\item I know what a model is in statistics. \pause It's a set of assertions that implies a probability distribution for the observable data. \pause
\item In machine learning, the meaning is slippery -- close but not quite the same. \pause
\item The ``system that can take a vector $\mathbf{x} \in \mathbb{R}^n$ as input and predict the value of a scalar $y \in \mathbb{R}$ as its output" would probably be called a model. \pause
\item In statistics, this would be a combination of a model and an estimator.
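\pause
\item \emph{For example, the model might be $y = \mathbf{w}^\top\mathbf{x} + \epsilon$, and the estimator would be $\widehat{\mathbf{w}} = \left(\mathbf{X}^\top \mathbf{X}\right)^{-1} \mathbf{X}^\top \mathbf{y}$. Machine learning bundles the two together.}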
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Generalization} \pause
%\framesubtitle{}
\begin{itemize}
\item Generalization: The ability to perform well on previously unobserved inputs. \pause
\item With access to a training set, we can compute some error measure on the training set called the \textbf{training error}\pause, and we reduce this training error. \pause \emph{This is just optimization.} \pause
\item The generalization error is defined as the expected value of the error on a new input. \pause \emph{Good, that's clear.} \pause
\item We want the generalization error, also called the \textbf{test error}, to be low as well. \pause \emph{That is, we want generalization error as well as training error to be low}. \pause
\item There's a theorem that generalization error is greater than or equal to (expected) training error.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Objectives}
%\framesubtitle{To achieve good performance}
\begin{itemize}
\item Make the training error small.
\item Make the gap between training and test error small.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Data generating distribution}
\framesubtitle{The i.i.d.~assumptions}
To get anywhere, even the machine learning people have to make some assumptions.\pause
\begin{itemize}
\item Assume training set and test set are independent.\pause
\item Examples are independent within sets.\pause
\item Both training set and test set come from a common \textbf{data generating distribution} denoted $p_{\mbox{data}}$.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Over and underfitting}\pause
%\framesubtitle{}
\begin{itemize}
\item Underfitting occurs when the model is not able to obtain a sufficiently low error value on the training set.\pause
\item Overfitting occurs when the gap between the training error and test error is too large.\pause
\item \emph{Notice how empirical this is compared to the statistical formulation.}
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Model capacity}
%\framesubtitle{}
\begin{itemize}
\item A model's \textbf{capacity} is its ability to fit a wide variety of functions.\pause
\item ``Models with low capacity may struggle to fit the training set."\pause
\item ``Models with high capacity can overfit by memorizing properties of the training set that do not serve them well on the test set."\pause
\item \textbf{Hypothesis space}: The set of functions that the learning algorithm is allowed to select as being the solution.
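\pause \emph{E.g., for linear regression the hypothesis space would be the set of all linear functions $\{f: f(\mathbf{x}) = \mathbf{w}^\top\mathbf{x}, \, \mathbf{w} \in \mathbb{R}^n\}$.}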
\pause \emph{Polynomial regression example}.\pause
\item ``The model specifies which family of functions the learning algorithm can choose from when varying the parameters in order to reduce a training objective.\pause This is called the \textbf{representational capacity} of the model."\pause
\item \emph{So the representational capacity appears to be determined by the hypothesis space.}
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{There are theoretical results} \pause
%\framesubtitle{}
\begin{quote}
The most important results in statistical learning theory show that the discrepancy between training error and generalization error is bounded from above by a quantity that grows as the model capacity grows but shrinks as the number of training examples increases.
\end{quote}\pause
The authors note that these bounds are very loose and seldom used in practice.
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{$k$ nearest neighbor regression}\pause
%\framesubtitle{}
\begin{itemize}
\item Capacity of the model grows with the size of the data set.\pause
\item ``This algorithm is able to achieve the minimum possible training error on any regression dataset."
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Bayes error}\pause
%\framesubtitle{}
\begin{itemize}
\item The ideal model is an oracle that simply knows the true probability distribution that generates the data.\pause
\item The error incurred by an oracle making predictions from the true distribution $p(\mathbf{x}, y)$ is called the Bayes error.\pause
\item \emph{I have no idea why.}
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{The No Free Lunch Theorem}\pause
%\framesubtitle{}
\begin{quote}
The no free lunch theorem for machine learning (Wolpert, 1996) states that, averaged over all possible data generating distributions, every classification algorithm has the same error rate when classifying previously unobserved points.
\end{quote}\pause
So you need to make some assumptions about the probability distributions that might reasonably be encountered in practice.
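\pause \emph{Roughly: for any two algorithms $A$ and $B$, $E_p[\mbox{error}(A)] = E_p[\mbox{error}(B)]$ when the expected value averages over all data generating distributions $p$.}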
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Regularization}\pause
%\framesubtitle{}
\begin{itemize}
\item The no free lunch theorem implies that we must design our machine learning algorithms to perform well on a specific task.\pause
\item Choose functions in the hypothesis space well.\pause
\item We can also give a learning algorithm a preference for one solution in its hypothesis space over another.\pause
\item This is called \textbf{regularization}.\pause
\item Weight decay example in regression: Minimize
\begin{displaymath}
J(\mathbf{w}) = MSE_{\mbox{train}} + \lambda \mathbf{w}^\top \mathbf{w}
\end{displaymath}\pause
\item \emph{I don't know about the crazy vocabulary, but there is some freedom here that is harder to find in standard statistical practice.}
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Hyperparameters}\pause
%\framesubtitle{}
\begin{itemize}
\item Most machine learning algorithms have several settings that we can use to control the behavior of the learning algorithm.\pause
\item These settings are called hyperparameters.\pause
\item The values of hyperparameters are not adapted by the learning algorithm itself.\pause
\item \emph{This is very different from the meaning of hyperparameter as I understand it.}
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Validation set}
\framesubtitle{A useful concept} \pause
\begin{itemize}
\item \emph{Want to search around in the hypothesis space to locate the best model, but that could result in over-fitting.}\pause
\item \emph{Can't peek at the test data.}\pause
\item Split the training data, \emph{yes the training data}, into two disjoint subsets.\pause
\item One of these subsets is used to learn the parameters.\pause
\item The other subset is our validation set,\pause ~used to estimate the generalization error during or after training,\pause ~allowing for the hyperparameters to be updated accordingly.\pause
\item The subset of data used to learn the parameters is still typically called the training set,\pause ~even though this may be confused with the larger pool of data used for the entire training process.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{$k$-fold cross validation}\pause
%\framesubtitle{}
\begin{itemize}
\item You don't have enough data to split into a training set and a test set.\pause
\item Split into $k$ disjoint subsets. Try to predict each one in turn using the rest of the data as a training sample.\pause
\item Average the results.\pause
\item The variance of such an average is hard to estimate \pause well.
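\pause
\item \emph{In symbols, the cross-validation estimate of generalization error would be $\frac{1}{k} \sum_{j=1}^{k} e_j$, where $e_j$ is the error on subset $j$ after training on the other $k-1$ subsets.}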
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Standard statistical ideas}
\framesubtitle{Sometimes with a strange twist} \pause
\begin{itemize}
\item Sometimes there really are unknown parameters and they do maximum likelihood.\pause
\item They still refer to estimation as prediction, most of the time.\pause
\item ``In machine learning experiments, it is common to say that algorithm A is better than algorithm B if the upper bound of the 95\% confidence interval for the error of algorithm A is less than the lower bound of the 95\% confidence interval for the error of algorithm B."\pause
\item \emph{Why not just test?}
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Bayesian Statistics}
%\framesubtitle{}
\begin{itemize}
\item Pretty standard treatment.\pause
\item Predictive density: Integrate out the parameter using the posterior distribution.\pause
\item Regression example with a conjugate prior: ``The Bayesian estimate provides a covariance matrix, showing how likely all the different values of $\mathbf{w}$ are, rather than providing only the estimate \ldots"\pause
\item ``Maximum A Posteriori (MAP) Estimation" means estimate using the posterior mode.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Support vector machines}\pause
%\framesubtitle{}
\begin{itemize}
\item Method for binary classification.\pause
\item It looks pretty strong. This is new to me,\pause ~I think.\pause
\item Predict Yes for test data $\mathbf{x}$ when $\mathbf{w}^\top\mathbf{x} + b$ is positive.\pause
\item Predict No when $\mathbf{w}^\top\mathbf{x} + b$ is negative.\pause
\item Replace $\mathbf{w}^\top\mathbf{x} + b$ with $b + \sum_{i=1}^m \alpha_i \phi(\mathbf{x}) \cdot \phi(\mathbf{x}^{(i)})$,\pause
\item where $\mathbf{x}^{(i)}$ is a vector of training data and the dot product is very general.
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Stochastic Gradient Descent}\pause
%\framesubtitle{}
\begin{itemize}
\item A recurring problem in machine learning is that large training sets are necessary for good generalization, but large training sets are also more computationally expensive.\pause
\item The cost function used by a machine learning algorithm often decomposes as a sum over training examples of some per-example loss function.\pause \emph{The minus log likelihood is a sum over observations}.\pause
\item The sample size can be huge.\pause
\item Calculating the big sum (of derivatives) can take a lot of computation. \pause
\item So just compute the gradient on a random sample. Random = stochastic.\pause
\item Go downhill, a little randomly.\pause
\item It's no big deal, but why not just do the whole task on a random sample of the data?
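\pause
\item \emph{The update is roughly $\mathbf{w} \leftarrow \mathbf{w} - \epsilon \, \widehat{\mathbf{g}}$, where $\widehat{\mathbf{g}}$ is the gradient of the cost on a small random ``minibatch" of examples and $\epsilon$ is a learning rate.}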
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Recipe for a machine learning algorithm}\pause
%\framesubtitle{}
Combine
\begin{itemize}
\item Dataset
\item Cost function
\item Optimization procedure
\item Model
\end{itemize}
\end{frame}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}
\frametitle{Copyright Information}

This slide show was prepared by \href{http://www.utstat.toronto.edu/~brunner}{Jerry Brunner}, Department of Statistical Sciences, University of Toronto. So many quotes are lifted from \emph{Deep Learning} by Goodfellow et al.~that this document fits the definition of plagiarism. \LaTeX~source code is available from the course website:
\href{http://www.utstat.toronto.edu/~brunner/oldclass/appliedf18}
{\small\texttt{http://www.utstat.toronto.edu/$^\sim$brunner/oldclass/appliedf18}}

\end{frame}

\end{document}