An experiment was conducted to determine whether interviewers display bias against persons with disabilities. An actor recorded 5 different videos of a job interview, with identical scripts, except that in the first video, he appeared in a wheelchair; in the second video, he appeared on crutches; in the third video, he appeared hearing-impaired; in the fourth video, he appeared to have one leg amputated; and in the fifth video, he appeared to have no disabilities.
For each video, 14 different undergraduate students watched it, and rated the job applicant on his qualifications (70 students in total participated in the experiment, each providing one score/rating.) The scores are summarized in a boxplot below.
library(Sleuth3)
library(lattice)
disc <- case0601
disc$Handicap <- relevel(disc$Handicap, ref="None")
bwplot(Handicap ~ Score, data = disc)
What are the conditions under which we can perform a One-Way ANOVA F-test? List all the conditions, state whether they seem to be satisfied for the discrimination experiment data based on the boxplot, and briefly state why. If you cannot determine whether a condition was satisfied, say so and briefly explain why.
Condition | Is it satisfied? | Comment |
---|---|---|
The responses within each group are normally distributed | Yes, probably (arguable) | The boxes look roughly symmetric, with most of the observations concentrated near the centre |
The variances are the same in all groups | Yes | The boxes all have approximately the same length, so the spreads are similar |
The measurements are independent | Can’t tell | The only way to really make sure the measurements are independent is to properly design the experiment |
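The checks in the table can also be done numerically. The following is a minimal sketch, assuming the `Sleuth3` package is installed as in the code above; the "ratio of largest to smallest standard deviation" comparison is a common rule of thumb, not part of the original solution.

```r
library(Sleuth3)  # provides case0601, as in the code above

disc <- case0601
disc$Handicap <- relevel(disc$Handicap, ref = "None")

# Equal variances: compare the sample standard deviation of each group.
group_sd <- tapply(disc$Score, disc$Handicap, sd)
print(group_sd)
# Rule of thumb (an assumption here): a ratio well below 2 is usually
# taken as consistent with equal variances.
sd_ratio <- max(group_sd) / min(group_sd)
print(sd_ratio)

# Normality: normal Q-Q plot of the residuals from the one-way model.
fit <- lm(Score ~ Handicap, data = disc)
qqnorm(resid(fit))
qqline(resid(fit))
```

Independence cannot be checked this way; it depends on how the raters were recruited and assigned to videos.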
Here is some R output for the discrimination experiment.
lm(Score~Handicap, data=disc)
##
## Call:
## lm(formula = Score ~ Handicap, data = disc)
##
## Coefficients:
## (Intercept) HandicapAmputee HandicapCrutches
## 4.9000 -0.4714 1.0214
## HandicapHearing HandicapWheelchair
## -0.8500 0.4429
Write down the formula for predicting the rating for a new video of an interview, where it is known what disability (if any) the actor in the video appears to have. Define all variables.
\(y_{new} = 4.9-0.47I_{amp}^{new}+1.02I_{Crutch}^{new}-0.85I_{Hear}^{new}+0.44I_{Wheel}^{new}\)
Definitions:
\(y_{new}\) is the predicted rating for the new datapoint.
\(I_{amp}^{new}\) is 1 if the person appears to be an amputee and 0 otherwise.
\(I_{Crutch}^{new}\) is 1 if the person appears to have crutches and 0 otherwise.
\(I_{Hear}^{new}\) is 1 if the person appears to be hearing impaired and 0 otherwise.
\(I_{Wheel}^{new}\) is 1 if the person appears to be using a wheelchair and 0 otherwise.
(There were no marks taken off for this, but it’s good to mention that \(I_{amp}^{new}+I_{Crutch}^{new}+I_{Hear}^{new}+I_{Wheel}^{new}\) should be either 1 or 0 for the model to be valid, since we haven’t observed what happens when people have multiple disabilities.)
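The prediction formula can be written out as a small function. This is only a sketch: `predict_rating` is a hypothetical helper name, and the coefficients are the rounded values copied from the `lm()` output above.

```r
# Coefficients copied from the lm() output above (rounded to 4 decimals).
coefs <- c(Intercept = 4.9000, Amputee = -0.4714, Crutches = 1.0214,
           Hearing = -0.8500, Wheelchair = 0.4429)

# predict_rating is a hypothetical helper, not part of the original solution.
predict_rating <- function(disability = c("None", "Amputee", "Crutches",
                                          "Hearing", "Wheelchair")) {
  disability <- match.arg(disability)
  # Indicator variables: at most one of them equals 1.
  indicators <- as.numeric(names(coefs)[-1] == disability)
  unname(coefs["Intercept"] + sum(coefs[-1] * indicators))
}

predict_rating("None")      # all indicators are 0, so this is the intercept
predict_rating("Crutches")  # intercept plus the crutches coefficient
```

Because `match.arg` rejects anything outside the five listed categories, the function cannot be asked about combinations of disabilities, which matches the restriction that the indicators sum to 0 or 1.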
What was the average rating for the videos where it appeared that the actor is an amputee? Show your work.
This is the same as the prediction for an amputee: \(4.9-0.4714 = 4.4286\).
Here is some R output for the discrimination experiment, with some of the values omitted.
> anova(lm(Score~Handicap, data=disc))
Analysis of Variance Table
Response: Score
Df Sum Sq Mean Sq F value Pr(>F)
Handicap A 30.521 B C D *
Residuals E 173.321 G
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Compute all the omitted values. You may use functions such as qt, qf, pf, pnorm, etc. where necessary; if you use them, you don’t have to provide a numerical final answer. Briefly show how you obtained the answers.
A = Ngroups - 1 = 4
B = 30.521/A = 7.63025
E = Npoints-Ngroups = 65
G = 173.321/E = 2.6664769
C = B/G = 2.8615788
D = 1 - pf(C, df1=A, df2=E)
= pf(C, df1=A, df2=E, lower.tail=FALSE)
(either solution is acceptable)
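The same arithmetic can be carried out in R from the two sums of squares given in the table. This is a sketch with descriptive variable names of my own choosing; the letters in the comments map each quantity back to the table.

```r
ss_handicap <- 30.521   # Sum Sq for Handicap, from the table
ss_resid    <- 173.321  # Sum Sq for Residuals, from the table
n_groups    <- 5        # five videos
n_points    <- 70       # 14 raters per video

df_handicap <- n_groups - 1              # A
df_resid    <- n_points - n_groups       # E
ms_handicap <- ss_handicap / df_handicap # B
ms_resid    <- ss_resid / df_resid       # G
f_value     <- ms_handicap / ms_resid    # C
# D: upper-tail probability of the F distribution with (A, E) degrees of freedom
p_value <- pf(f_value, df1 = df_handicap, df2 = df_resid, lower.tail = FALSE)

print(c(A = df_handicap, B = ms_handicap, C = f_value,
        D = p_value, E = df_resid, G = ms_resid))
```

The results should agree with the full ANOVA table up to rounding, since the sums of squares used here are themselves rounded.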
The full ANOVA table:
anova(lm(Score~Handicap, data=disc))
## Analysis of Variance Table
##
## Response: Score
## Df Sum Sq Mean Sq F value Pr(>F)
## Handicap 4 30.521 7.6304 2.8616 0.03013 *
## Residuals 65 173.321 2.6665
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
What conclusion can you draw from the F-test for which the p-value is computed in the ANOVA table? (Note the asterisk in the table, which indicates that the p-value was smaller than 0.05.) Be precise, and state the conclusion without referring to the model – just use English.
Conclusion: there is evidence that the kind of disability the applicant appears to have (or not to have) makes a difference to the rating at least some of the time.
(Note: we accepted something like “The means are not all the same” as well.)
Suppose the experiment described in Question 1 is repeated. State one way in which the ANOVA assumptions may be violated for the data obtained, and provide a plausible scenario (i.e., a story about the actor, raters, etc.) which would lead to the violation of the ANOVA assumptions.
Lots of options here.
Option 1 (non-normality): One of the raters personally knew the actor and really disliked him. The rater gave the actor a rating of 0, which was highly unusual.
Option 2 (non-normality, but more interesting): Some of the raters were Arts students. Arts students tend to disagree with each other a lot about everything. They get the rating right on average, but the ratings Arts students give are distributed according to \(N(\mu, 5^2)\), where \(\mu\) is the correct rating. Other raters were Statistics students, who tend to be very precise. The ratings they give are distributed according to \(N(\mu, 0.5^2)\). The resulting distribution of ratings (simulated below as grades with \(\mu = 60\)) is heavy-tailed:
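library(ggplot2)
set.seed(1)  # arbitrary seed, only for reproducibility
stats_grades <- rnorm(1000, mean = 60, sd = 0.5)  # precise raters
arts_grades <- rnorm(1000, mean = 60, sd = 5)  # noisy raters
grades <- c(stats_grades, arts_grades)
qplot(grades)  # histogram of the mixture
qqnorm(grades)  # heavy tails bend away from a straight line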
Option 3 (different variances): the actor appeared tired in one of the videos. That made some raters sympathetic (resulting in higher scores) and some raters annoyed (resulting in lower scores).
Option 4 (non-independence): the raters were all recruited at the same location on campus, and all belonged to the same program, where the grades are always high. The raters therefore also rated the interviewee highly.
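Option 3 can be illustrated with a small simulation in the same spirit as the one for Option 2. This is only a sketch: the means and standard deviations are made-up values chosen for illustration, not estimates from the experiment.

```r
set.seed(2)  # arbitrary seed, only for reproducibility
n_per_group <- 14  # same group size as in the experiment

# Hypothetical ratings: both videos have the same mean rating, but the
# "tired" video splits the raters into sympathetic and annoyed.
usual <- rnorm(n_per_group, mean = 5, sd = 1.5)
tired <- rnorm(n_per_group, mean = 5, sd = 4)

# Sample standard deviations of the two groups.
print(c(sd_usual = sd(usual), sd_tired = sd(tired)))

# Side-by-side boxplots: the spread of each group is the box length.
boxplot(list(usual = usual, tired = tired), horizontal = TRUE)
```

With unequal spreads like these, the pooled residual variance used by the F-test no longer describes any single group well.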
The following is a description of a survey that was recently conducted by prvote.com:
We used Google Consumer Surveys to poll 1500 respondents on one of three different phrasings of a possible referendum question, resulting in 500 responses to each question.
The results are as follows: