Study Guide Problems

These problems are pretty close to what happened in lecture. If they were to appear on the midterm exam, they would usually be more focused than what you see here. Of course, other problems might appear as well; you should go through the slides and try to come up with your own problems.

  1. Suppose we measure a quantitative variable for two populations. Outline a procedure to compare the two populations. What is the Null Hypothesis? What are the assumptions that need to be made in order to compare the two populations?

  2. In the context of the Finches data, what would be the Null Hypothesis? What would be a Type I error? What would be a Type II error?

  3. What is a p-value? What does a p-value of 0.04 mean in the context of the Finches study? Why is “The probability that the pre-drought and post-drought finches are the same is 4%” not the correct answer to the previous question?

  4. Show how to compute a two-sided and a one-sided p-value for a normal distribution.
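
    A minimal sketch for a hypothetical observed statistic z:

        z <- 1.7                                 # hypothetical observed z-statistic
        p_one_sided <- 1 - pnorm(z)              # P(Z >= z), upper-tail alternative
        p_two_sided <- 2 * (1 - pnorm(abs(z)))   # P(|Z| >= |z|)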

  5. Give an example of a study (make up a study) where the p-value is low, but that should not be reasonably interpreted as evidence against the null hypothesis.

  6. Explain what the “sampling distribution of the sample mean” means.

  7. Write R code to construct a 90% CI for the mean of a population which is assumed to be normally distributed with variance 1. Now write R code to construct a different 90% CI.
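
    A sketch, assuming the sample is in x and the known population variance is 1:

        x <- rnorm(20)                            # hypothetical sample
        m <- mean(x); se <- 1 / sqrt(length(x))
        c(m + qnorm(0.05) * se, m + qnorm(0.95) * se)   # symmetric 90% CI
        # a different 90% CI: split the remaining 10% unevenly between the tails
        c(m + qnorm(0.02) * se, m + qnorm(0.92) * se)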

  8. Why use the pooled variance rather than the variance from one of the samples?

  9. Which is larger, a one-sided p-value, or a two-sided p-value? Why?

  10. What assumption allows us to compute and use the pooled variance?

  11. Suppose x contains 10 measurements. Assume that the measurements are normally distributed. Only using the function qt, write R code to perform a t-test with the Null Hypothesis that the mean of x is equal to 15.
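
    A sketch of the comparison against the critical value, assuming a two-sided test at level 0.05:

        t_stat <- (mean(x) - 15) / (sd(x) / sqrt(length(x)))
        t_crit <- qt(0.975, df = length(x) - 1)   # 9 degrees of freedom
        abs(t_stat) > t_crit                      # TRUE means reject the Null Hypothesis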

  12. What does it mean for a statistical procedure to be robust?

  13. Suppose x contains 10 measurements. Assume that the measurements are normally distributed. Write R code to perform a test with the Null Hypothesis that the standard deviation of x is equal to 3.
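
    A sketch of one approach, assuming a two-sided test at level 0.05: under the Null Hypothesis, \((n-1)s^2/\sigma_0^2\) follows a Chi-Squared distribution with \(n-1\) degrees of freedom:

        n <- length(x); sigma0 <- 3
        stat <- (n - 1) * var(x) / sigma0^2
        stat < qchisq(0.025, df = n - 1) | stat > qchisq(0.975, df = n - 1)   # TRUE means reject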

  14. Suppose you fit the regression model \[Y\sim a_0+a_{g1}I_{g1}+a_{g2}I_{g2}+\ldots+a_{gk}I_{gk}+N(0, \sigma^2)\] Here, \(I_{gi}\) is an indicator variable for group \(i\). What is the interpretation of \(a_0\)? What is the interpretation of \(a_{g1}\)?

  15. What is the equation that \(a_0, \ldots, a_{gk}\) satisfy?

  16. Explain how to use the F-test in the context of comparing group means.

  17. In terms of the coefficients in Q14, what is the Null Hypothesis that is being tested using the F-test?

  18. Only using rnorm(), write R code to generate a sample from F(3, 5).
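
    A sketch: an F(3, 5) draw is a ratio of two independent Chi-Squared draws, each scaled by its degrees of freedom, and a Chi-Squared draw with k degrees of freedom is a sum of k squared standard normals:

        chisq3 <- sum(rnorm(3)^2)
        chisq5 <- sum(rnorm(5)^2)
        f_sample <- (chisq3 / 3) / (chisq5 / 5)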

  19. Only using runif(), write R code to generate a sample that is approximately from F(3, 5) (hint: use the CLT).
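
    A sketch: by the CLT, the sum of 12 Uniform(0, 1) draws, minus 6, is approximately N(0, 1) (mean \(12 \times 0.5 = 6\), variance \(12 \times 1/12 = 1\)); the rest of the construction is as in the previous problem:

        approx_norm <- function() sum(runif(12)) - 6
        chisq3 <- sum(replicate(3, approx_norm())^2)
        chisq5 <- sum(replicate(5, approx_norm())^2)
        f_approx <- (chisq3 / 3) / (chisq5 / 5)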

  20. Explain the F-statistic in ANOVA in terms of within-group and between-group variance. Intuitively, what does it mean if the between-group variance is the same as the within-group variance?

  21. If we are computing 20 p-values and the Null Hypothesis is true, what percent of the time will we get at least one false positive?
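
    A worked calculation, assuming independent tests each at level 0.05:

        1 - (1 - 0.05)^20   # probability of at least one false positive, about 0.64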

  22. Explain how to apply the Bonferroni correction. Explain why, when the Bonferroni correction is applied using R’s t.test, most of the values in the table are 1.
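
    A sketch with made-up raw p-values, applying the correction by hand and with p.adjust:

        p_raw <- c(0.001, 0.04, 0.30, 0.70)
        pmin(1, length(p_raw) * p_raw)          # multiply by the number of tests, cap at 1
        p.adjust(p_raw, method = "bonferroni")  # same result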

  23. What is the problem with performing multiple comparisons without adjustment? Why does the Bonferroni adjustment address the issue?

  24. Why is \(mean((X_i-\bar{X})^2)\) smaller than (or equal to) \(mean((X_i-\mu)^2)\)? Explain intuitively and also give a mathematical proof.
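
    One route to the proof: write \(X_i-\mu=(X_i-\bar{X})+(\bar{X}-\mu)\) and expand the square; the cross term vanishes because \(\sum_i (X_i-\bar{X})=0\), leaving \[\frac{1}{n}\sum_i (X_i-\mu)^2=\frac{1}{n}\sum_i (X_i-\bar{X})^2+(\bar{X}-\mu)^2 \geq \frac{1}{n}\sum_i (X_i-\bar{X})^2.\]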

  25. What is the difference between One-Way ANOVA and Two-Way ANOVA?

  26. Explain why the Partial F-test works (i.e., why the numerator in the F-statistic has as many degrees of freedom as there are extra parameters in the full model).

  27. Why do we need the equal-variance assumption when using ANOVA?

  28. Sketch the interaction plots when (a) two variables do not interact (b) an interaction is present. Construct scenarios (i.e., stories about datasets) that correspond to your plots.

  29. Suppose \(Y_i \sim Bernoulli(\pi)\) and you have N independent samples. What is the MLE? (Show how to obtain it.) Why is there a problem with that method if the samples are not independent?
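
    A sketch of the derivation under independence: the likelihood factors into a product, so \[\log L(\pi)=\sum_{i=1}^N \left[Y_i\log\pi+(1-Y_i)\log(1-\pi)\right],\] and setting the derivative to zero, \(\frac{\sum_i Y_i}{\pi}-\frac{N-\sum_i Y_i}{1-\pi}=0\), gives \(\hat{\pi}=\frac{1}{N}\sum_i Y_i\). Without independence, the likelihood no longer factors this way.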

  30. Use the CLT to obtain a normal approximation for the sample mean of independent samples from a Bernoulli distribution.
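
    For reference: since \(E[Y_i]=\pi\) and \(Var(Y_i)=\pi(1-\pi)\), the CLT gives \[\bar{Y}\approx N\left(\pi, \frac{\pi(1-\pi)}{N}\right).\]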

  31. Why can’t we apply linear regression when our measurements are distributed according to \(Y_i \sim Bernoulli(\pi_i)\)?

  32. Sketch the logistic function.

  33. Provide an example of converting odds to probabilities and vice versa.
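
    For instance, a probability of \(p=0.8\) corresponds to odds of \(\frac{p}{1-p}=4\) (i.e., 4 to 1 in favour), and odds of 3 to 1 against correspond to a probability of \(\frac{1}{1+3}=0.25\).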

  34. As of right now, the odds for the Presidential election in the USA, according to http://www.paddypower.com/bet/politics/other-politics/us-politics?ev_oc_grp_ids=791149, are:

    | Candidate | Odds  |
    |-----------|-------|
    | Clinton   | 1/3   |
    | Trump     | 11/4  |
    | Biden     | 25/1  |
    | Sanders   | 25/1  |
    | Ryan      | 100/1 |
    | Johnson   | 100/1 |

    Assuming these are fair odds, what is the estimate of the probability that Biden will win the election?

    If the true probability that Biden will win the election is 10%, what is the expected loss for Paddy Power per bet placed on Biden?

    Estimate how much money, on average, Paddy Power is making from bets.
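
    A sketch of the first computation, assuming fractional odds \(a/b\) mean “win \(a\) for every \(b\) staked,” so the implied fair probability is \(b/(a+b)\):

        # implied probability for Biden at 25/1, assuming the odds are fair
        a <- 25; b <- 1
        b / (a + b)   # about 0.038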

  35. For the model \[\log(\frac{\pi}{1-\pi})=\beta_0+\beta_1 x_1 + \ldots\]

    What is the interpretation of \(\beta_0\)? What is the interpretation of \(\beta_1\) (both when \(x_1\) is an indicator, and when it is quantitative)?

  36. Explain the difference between the Coffee example from lecture and logistic regression. Suppose we have data about blend preferences, and also data about what colour of cup was used for each blend. Explain how it would be (perhaps) possible to obtain a higher likelihood when using logistic regression (as opposed to fitting the \(Bernoulli(\pi)\) model to the data, as in the Coffee example from lecture).

  37. Work out an example in R where you’re performing a Wald test.
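
    A sketch with made-up data; for a glm fit, the “z value” column of summary() is the Wald statistic (estimate divided by standard error):

        set.seed(1)
        x <- rnorm(100)
        y <- rbinom(100, 1, plogis(0.5 + 1 * x))
        fit <- glm(y ~ x, family = binomial)
        summary(fit)$coefficients   # "z value" column holds the Wald statistics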

  38. What are the model assumptions in logistic regression?

  39. Explain how the adjusted p-values are computed when using the Bonferroni adjustment for multiple comparisons.

  40. What can you say about the frequency of Type I errors when using the Bonferroni correction?

  41. When looking at the p-values obtained using the Bonferroni correction, you see a lot of 1.0’s. Why are so many p-values equal to each other?

  42. In what situations is it appropriate to use linear regression with a fixed intercept? Is it ever appropriate when the predictor variables are categorical?

  43. What is the relationship between Deviance and likelihood?

  44. What is the relationship between Type I/II errors and True Positives/False Negatives?

  45. Give an example of how cross-validation works. Explain the two cost functions that we used in R:

        cost_classification <- function(r, pi) mean(abs(r - pi) > 0.5)
        cost_negloglikelihood <- function(r, pi) -sum(r * log(pi) + (1 - r) * log(1 - pi))

    Explain how those are used in the context of cross-validation.
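
    A sketch of how they plug in, assuming the boot package and a hypothetical data frame df with response y; cv.glm passes the observed responses as r and the model’s predicted probabilities as pi:

        library(boot)
        fit <- glm(y ~ x, data = df, family = binomial)     # df is hypothetical
        cv.glm(df, fit, cost = cost_classification, K = 10)$delta
        cv.glm(df, fit, cost = cost_negloglikelihood, K = 10)$delta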

  46. Explain perfect separation. Give examples of small datasets (you can do that graphically) where there is and isn’t perfect separation.

  47. Show that the sum of the squared deviance residuals is the deviance.

  48. What causes extra-binomial variation? Generate a dataset that exhibits extra-binomial variation using R.
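
    A sketch of one way to generate it: let the success probability itself vary across observations, which inflates the variance beyond what a single Binomial allows:

        set.seed(1)
        pis <- rbeta(200, 2, 2)                  # batch-specific success probabilities
        counts <- rbinom(200, size = 20, prob = pis)
        var(counts)                              # observed variance (inflated)
        20 * mean(pis) * (1 - mean(pis))         # variance a single Binomial would give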

  49. If the overdispersion parameter is greater than 1, does that increase or decrease the probability of rejecting hypotheses about the model parameters?

  50. What is the difference between Poisson and Binomial distributions? Give an example of Poisson-distributed data, and explain why it shouldn’t be modelled using the Binomial distribution, and vice versa.

  51. Write code to generate a Binomial random variable with N trials and probability of success \(\theta\), using rbinom only with the parameters rbinom(1, 1, theta).
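
    A sketch of one way to do it: a Binomial(N, \(\theta\)) draw is a sum of N independent Bernoulli(\(\theta\)) draws:

        N <- 50; theta <- 0.3    # hypothetical values
        x <- sum(replicate(N, rbinom(1, 1, theta)))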

  52. The CLT implies that for large \(\lambda\) and large \(n\) respectively, the Poisson and Binomial distributions look like a normal distribution. Give an example of \(n\), \(\lambda\), and \(\theta\) such that \(Poisson(\lambda)\) and \(Binomial(n, \theta)\) are very similar.
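
    One hypothetical choice: match \(\lambda = n\theta\) with \(n\) large and \(\theta\) small, so the Binomial variance \(n\theta(1-\theta)\) is close to the Poisson variance \(\lambda\):

        n <- 10000; theta <- 0.01; lambda <- n * theta   # lambda = 100
        max(abs(dpois(0:200, lambda) - dbinom(0:200, n, theta)))   # tiny difference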

  53. Why do we plot the logits of the response proportions vs. the predictor variable?

  54. Give three examples of GLMs, with different combinations of link functions and distributions. Write down the likelihood functions for the data each time. Specify the link function and probability distribution each time.

  55. For the GLM with the Gaussian distribution and the identity link function, show that the betas obtained are the same as would be obtained when running linear regression.
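
    A quick check with made-up data:

        set.seed(1)
        x <- rnorm(50); y <- 2 + 3 * x + rnorm(50)
        coef(lm(y ~ x))                                         # least-squares fit
        coef(glm(y ~ x, family = gaussian(link = "identity")))  # identical estimates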

  56. Explain how overdispersion can occur in logistic regression. Generate data where overdispersion would be observed.

  57. How is the deviance distributed if the model is correct? Why is the Chi-Squared one-sided when performing a goodness-of-fit test in GLMs?

  58. If the Poisson Regression model is appropriate, describe how the residuals are expected to be distributed. (Note: the answer is not that the residuals are Poisson distributed.)

  59. Give a scenario where the log link function would be appropriate.

  60. When would you use complete pooling, partial pooling, and no pooling? In other words, what assumptions about the data would justify each?

  61. Give an example of a dataset where it would be appropriate to model slopes as a random effect.

  62. What’s the difference between logistic regression and ridge logistic regression?
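
    A sketch using the glmnet package (one common implementation, not necessarily the one from lecture); alpha = 0 selects the ridge (L2) penalty added to the logistic regression negative log-likelihood:

        library(glmnet)
        X <- matrix(rnorm(100 * 5), 100, 5)                   # hypothetical predictors
        y <- rbinom(100, 1, plogis(X %*% c(1, -1, 0, 0, 0)))  # hypothetical responses
        fit <- glmnet(X, y, family = "binomial", alpha = 0)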

  63. Why does the visualization of the coefficients look less like a face when we do not use ridge logistic regression?

  64. Explain how no-pooling estimates can “overfit”.

  65. Explain why partially-pooled estimates are useful when using poststratification.