knitr::opts_chunk$set(echo = TRUE)
require(Sleuth3)
require(ggplot2)
require(tigerstats)
require(xtable)
require(lattice)
pyg <- case1302 # Pygmalion effect data from Sleuth3
First, let’s visualize the data. We need to use aggregate to average the scores, since most companies have two control observations.
pyg_mean <- aggregate(pyg$Score, by=list(pyg$Company, pyg$Treat), mean)
#pyg_mean <- pyg_mean[order(pyg_mean$Company),]
colnames(pyg_mean) <- colnames(pyg) # aggregate renames the columns to Group.1, Group.2, x; restore the originals
barchart(Score~Company, groups=Treat, data=pyg_mean, auto.key=list(columns=2))
This presents an exaggerated picture of what’s actually going on: by default, lattice starts the bars near the smallest plotted value rather than at zero. Let’s make the origin be 0 in that barchart:
barchart(Score~Company, groups=Treat, data=pyg_mean, auto.key=list(columns=2), origin=0)
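Since ggplot2 is already loaded, here is an equivalent grouped bar chart as a sketch, using the same pyg_mean data frame; geom_col starts the bars at 0 by default.
ggplot(pyg_mean, aes(x = Company, y = Score, fill = Treat)) + geom_col(position = "dodge")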
Another way to visualize the data is using interaction plots.
with(pyg, interaction.plot(x.factor=Company, trace.factor=Treat, Score, col=c("red", "blue")))
with(pyg, interaction.plot(x.factor=Treat, trace.factor=Company, Score, col=c("red", "blue"))) # note: the two colors are recycled across the ten companies
It looks like for most companies, there is a roughly constant Pygmalion effect.
Let’s now run the regression for the model with interactions:
fit_saturated <- lm(Score~Treat+Company+Treat*Company, data=pyg) # equivalent to Score ~ Treat*Company, since * expands to main effects plus interaction
summary(fit_saturated)
##
## Call:
## lm(formula = Score ~ Treat + Company + Treat * Company, data = pyg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.2 -2.3 0.0 2.3 9.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 66.200 5.094 12.996 3.89e-07 ***
## TreatPygmalion 13.800 8.823 1.564 0.1522
## CompanyC10 4.500 7.204 0.625 0.5477
## CompanyC2 6.100 7.204 0.847 0.4191
## CompanyC3 10.000 8.823 1.133 0.2863
## CompanyC4 0.300 7.204 0.042 0.9677
## CompanyC5 10.000 7.204 1.388 0.1985
## CompanyC6 15.600 7.204 2.166 0.0585 .
## CompanyC7 -1.100 7.204 -0.153 0.8820
## CompanyC8 4.300 7.204 0.597 0.5653
## CompanyC9 6.900 7.204 0.958 0.3632
## TreatPygmalion:CompanyC10 -0.800 12.477 -0.064 0.9503
## TreatPygmalion:CompanyC2 -2.200 12.477 -0.176 0.8639
## TreatPygmalion:CompanyC3 -21.800 13.477 -1.618 0.1402
## TreatPygmalion:CompanyC4 -3.800 12.477 -0.305 0.7676
## TreatPygmalion:CompanyC5 -2.200 12.477 -0.176 0.8639
## TreatPygmalion:CompanyC6 -5.800 12.477 -0.465 0.6531
## TreatPygmalion:CompanyC7 -2.800 12.477 -0.224 0.8275
## TreatPygmalion:CompanyC8 -12.800 12.477 -1.026 0.3317
## TreatPygmalion:CompanyC9 -17.400 12.477 -1.395 0.1966
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.204 on 9 degrees of freedom
## Multiple R-squared: 0.7388, Adjusted R-squared: 0.1875
## F-statistic: 1.34 on 19 and 9 DF, p-value: 0.3358
Note that in the saturated fit no individual term is significant and the overall F-test has p = 0.34; with only 9 residual degrees of freedom, the interaction model is very imprecisely estimated. Let’s look at the residual plot:
plot(fit_saturated, 1)
It looks like the variance is not really constant, with larger residuals for smaller scores. (Why would that be?)
It’s hard to come up with a data transformation that would fix that, however. What would it imply about the nature of the effect if a log transformation worked?
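As a quick exploratory check (not part of the original analysis), we could refit the saturated model on the log scale and inspect its residual plot; if it looked better, that would suggest the effect is multiplicative rather than additive.
fit_log <- lm(log(Score) ~ Treat * Company, data = pyg) # assumes all Scores are positive
plot(fit_log, 1)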
Let’s now perform a partial F-test for the interaction terms; the Treat:Company row of the sequential ANOVA table below provides it:
anova(fit_saturated)
## Analysis of Variance Table
##
## Response: Score
## Df Sum Sq Mean Sq F value Pr(>F)
## Treat 1 327.34 327.34 6.3080 0.03323 *
## Company 9 682.52 75.84 1.4614 0.29051
## Treat:Company 9 311.46 34.61 0.6669 0.72212
## Residuals 9 467.04 51.89
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Treat:Company row has p ≈ 0.72, so there is no evidence that the Pygmalion effect differs across companies. Let’s fit the reduced (additive) model:
fit_reduced <- lm(Score~Treat+Company, data=pyg)
summary(fit_reduced)
##
## Call:
## lm(formula = Score ~ Treat + Company, data = pyg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.660 -4.147 1.853 3.853 7.740
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 68.39316 3.89308 17.568 8.92e-13 ***
## TreatPygmalion 7.22051 2.57951 2.799 0.0119 *
## CompanyC10 4.23333 5.36968 0.788 0.4407
## CompanyC2 5.36667 5.36968 0.999 0.3308
## CompanyC3 0.19658 6.01886 0.033 0.9743
## CompanyC4 -0.96667 5.36968 -0.180 0.8591
## CompanyC5 9.26667 5.36968 1.726 0.1015
## CompanyC6 13.66667 5.36968 2.545 0.0203 *
## CompanyC7 -2.03333 5.36968 -0.379 0.7094
## CompanyC8 0.03333 5.36968 0.006 0.9951
## CompanyC9 1.10000 5.36968 0.205 0.8400
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.576 on 18 degrees of freedom
## Multiple R-squared: 0.5647, Adjusted R-squared: 0.3228
## F-statistic: 2.335 on 10 and 18 DF, p-value: 0.0564
anova(fit_reduced)
## Analysis of Variance Table
##
## Response: Score
## Df Sum Sq Mean Sq F value Pr(>F)
## Treat 1 327.34 327.34 7.5685 0.01314 *
## Company 9 682.52 75.84 1.7534 0.14844
## Residuals 18 778.50 43.25
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(fit_reduced, 1)
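For completeness, the interaction test can also be run as an explicit partial F-test comparing the reduced and saturated models; this reproduces the Treat:Company row of the sequential ANOVA table above.
anova(fit_reduced, fit_saturated)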
Note that the estimate of the treatment effect (7.22) is quite different from the one in the full fit (13.80, which there is the effect for the baseline company C1 alone). Deciding which estimate to use has to do with whether we have a prior reason to believe there are interactions. In this case we probably don’t, and the data showed no evidence of interactions, so it’s reasonable to leave them out.
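Under the additive model, base R gives a confidence interval for the treatment effect directly (a quick sketch):
confint(fit_reduced, "TreatPygmalion", level = 0.95)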
To visualize the Pygmalion effect, we first regress Score on Company alone to remove company-level differences, then compare the residuals across treatment groups with boxplots:
fit1 <- lm(Score ~ Company, data = pyg)
pyg$resids <- resid(fit1) # company-adjusted scores
ggplot(pyg, aes(x = Treat, y = resids)) + geom_boxplot()
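As a numeric companion to the boxplot (a sketch, not in the original analysis), we can compare the group means of these company-adjusted residuals; their difference is close to, though not exactly equal to, the additive-model estimate of the treatment effect.
with(pyg, tapply(resids, Treat, mean)) # mean adjusted score by treatment group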