knitr::opts_chunk$set(echo = TRUE)
library(Sleuth3)
library(ggplot2)
library(tigerstats)
library(xtable)
library(lattice)

pyg <- case1302

First, let’s visualize the data. We need to use `aggregate`, since each company contributes more than one control observation.

pyg_mean <- aggregate(pyg$Score, by=list(pyg$Company, pyg$Treat), mean)
#pyg_mean <- pyg_mean[order(pyg_mean$Company),]
colnames(pyg_mean) <- colnames(pyg)
barchart(Score~Company, groups=Treat, data=pyg_mean, auto.key=list(columns=2))
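As an aside, `aggregate` with a `by` list names its output columns `Group.1`, `Group.2`, `x`, which is why we overwrite them with `colnames(pyg)` above. A minimal sketch on a made-up data frame (the name `toy` and its values are hypothetical, not from case1302):

```r
# Hypothetical toy data: 2 companies x 2 treatments, 2 rows per cell
toy <- data.frame(
  Company = rep(c("C1", "C2"), each = 4),
  Treat   = rep(c("Control", "Pygmalion"), times = 4),
  Score   = c(60, 70, 62, 72, 65, 75, 67, 77)
)
toy_mean <- aggregate(toy$Score, by = list(toy$Company, toy$Treat), mean)
names(toy_mean)  # "Group.1" "Group.2" "x" -- not the original names
colnames(toy_mean) <- c("Company", "Treat", "Score")  # restore readable names
toy_mean
```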

This presents an exaggerated picture of what’s actually going on, because the bars don’t start at zero. Let’s set the origin to 0 in the barchart:

barchart(Score~Company, groups=Treat, data=pyg_mean, auto.key=list(columns=2), origin=0)

Another way to visualize the data is using interaction plots.

with(pyg, interaction.plot(x.factor=Company, trace.factor=Treat, Score, col=c("red", "blue")))

with(pyg, interaction.plot(x.factor=Treat, trace.factor=Company, Score, col=1:10))

It looks like for most companies, there is a roughly constant Pygmalion effect.
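Rather than eyeballing the plot, the “roughly constant effect” claim can be checked numerically by tabulating each company’s Pygmalion-minus-control difference in mean score. A sketch with hypothetical per-company means (the numbers below are made up, not case1302; the same idea applies to `pyg_mean`):

```r
# Hypothetical per-company mean scores (toy numbers, not case1302)
means <- data.frame(
  Company = rep(c("C1", "C2", "C3"), times = 2),
  Treat   = rep(c("Control", "Pygmalion"), each = 3),
  Score   = c(66, 70, 64, 79, 83, 78)
)
# Pygmalion minus control, per company: if the effect is additive
# (no interaction), these differences should be roughly constant.
wide <- with(means, tapply(Score, list(Company, Treat), mean))
effect <- wide[, "Pygmalion"] - wide[, "Control"]
effect  # C1: 13, C2: 13, C3: 14 -- roughly constant
```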

Let’s now fit the regression model with interactions:

fit_saturated <- lm(Score~Treat+Company+Treat*Company, data=pyg)
summary(fit_saturated)
## 
## Call:
## lm(formula = Score ~ Treat + Company + Treat * Company, data = pyg)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -9.2   -2.3    0.0    2.3    9.2 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 66.200      5.094  12.996 3.89e-07 ***
## TreatPygmalion              13.800      8.823   1.564   0.1522    
## CompanyC10                   4.500      7.204   0.625   0.5477    
## CompanyC2                    6.100      7.204   0.847   0.4191    
## CompanyC3                   10.000      8.823   1.133   0.2863    
## CompanyC4                    0.300      7.204   0.042   0.9677    
## CompanyC5                   10.000      7.204   1.388   0.1985    
## CompanyC6                   15.600      7.204   2.166   0.0585 .  
## CompanyC7                   -1.100      7.204  -0.153   0.8820    
## CompanyC8                    4.300      7.204   0.597   0.5653    
## CompanyC9                    6.900      7.204   0.958   0.3632    
## TreatPygmalion:CompanyC10   -0.800     12.477  -0.064   0.9503    
## TreatPygmalion:CompanyC2    -2.200     12.477  -0.176   0.8639    
## TreatPygmalion:CompanyC3   -21.800     13.477  -1.618   0.1402    
## TreatPygmalion:CompanyC4    -3.800     12.477  -0.305   0.7676    
## TreatPygmalion:CompanyC5    -2.200     12.477  -0.176   0.8639    
## TreatPygmalion:CompanyC6    -5.800     12.477  -0.465   0.6531    
## TreatPygmalion:CompanyC7    -2.800     12.477  -0.224   0.8275    
## TreatPygmalion:CompanyC8   -12.800     12.477  -1.026   0.3317    
## TreatPygmalion:CompanyC9   -17.400     12.477  -1.395   0.1966    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.204 on 9 degrees of freedom
## Multiple R-squared:  0.7388, Adjusted R-squared:  0.1875 
## F-statistic:  1.34 on 19 and 9 DF,  p-value: 0.3358
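With treatment, company, and their interaction all in the model, the fit simply reproduces the cell means: the intercept is the mean of the baseline cell (C1 control here), and every other cell mean is a sum of coefficients. A sketch verifying this on hypothetical two-factor data (the data frame `d` is made up for illustration):

```r
# Hypothetical two-factor data (not case1302)
d <- data.frame(
  f = factor(rep(c("a", "b"), each = 4)),
  g = factor(rep(c("x", "y"), times = 4)),
  y = c(10, 20, 12, 22, 15, 30, 17, 32)
)
fit <- lm(y ~ f * g, data = d)
co <- coef(fit)
# Intercept = mean of the baseline cell (f = "a", g = "x")
co["(Intercept)"]  # mean(c(10, 12)) = 11
# Mean of the (b, y) cell = sum of all four coefficients
sum(co)            # mean(c(30, 32)) = 31
```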

Let’s look at the residual plot:

plot(fit_saturated,1)

It looks like the variance is not really constant, with larger residuals for smaller scores. (Why would that be?)

It’s hard to come up with a data transformation that would fix that, however. What would it imply if a log transformation worked?
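If modeling `log(Score)` removed the interaction, it would mean the treatment acts multiplicatively on the original scale: a constant percentage boost rather than a constant additive one. A sketch on hypothetical data built to be exactly multiplicative (the vectors below are invented for illustration):

```r
# Hypothetical data where treatment multiplies the response by 1.2
base  <- rep(c(50, 60, 70), each = 2)  # three "companies"
treat <- rep(c(0, 1), times = 3)
y <- base * ifelse(treat == 1, 1.2, 1.0)
# On the log scale the effect is additive and constant:
diff_log <- log(y[treat == 1]) - log(y[treat == 0])
diff_log  # every entry equals log(1.2)
# On the raw scale it is not constant:
y[treat == 1] - y[treat == 0]  # 10, 12, 14
```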

Let’s now perform a partial F-test for the interaction terms:

anova(fit_saturated)
## Analysis of Variance Table
## 
## Response: Score
##               Df Sum Sq Mean Sq F value  Pr(>F)  
## Treat          1 327.34  327.34  6.3080 0.03323 *
## Company        9 682.52   75.84  1.4614 0.29051  
## Treat:Company  9 311.46   34.61  0.6669 0.72212  
## Residuals      9 467.04   51.89                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fit_reduced <- lm(Score~Treat+Company, data=pyg)
summary(fit_reduced)
## 
## Call:
## lm(formula = Score ~ Treat + Company, data = pyg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.660  -4.147   1.853   3.853   7.740 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    68.39316    3.89308  17.568 8.92e-13 ***
## TreatPygmalion  7.22051    2.57951   2.799   0.0119 *  
## CompanyC10      4.23333    5.36968   0.788   0.4407    
## CompanyC2       5.36667    5.36968   0.999   0.3308    
## CompanyC3       0.19658    6.01886   0.033   0.9743    
## CompanyC4      -0.96667    5.36968  -0.180   0.8591    
## CompanyC5       9.26667    5.36968   1.726   0.1015    
## CompanyC6      13.66667    5.36968   2.545   0.0203 *  
## CompanyC7      -2.03333    5.36968  -0.379   0.7094    
## CompanyC8       0.03333    5.36968   0.006   0.9951    
## CompanyC9       1.10000    5.36968   0.205   0.8400    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.576 on 18 degrees of freedom
## Multiple R-squared:  0.5647, Adjusted R-squared:  0.3228 
## F-statistic: 2.335 on 10 and 18 DF,  p-value: 0.0564
anova(fit_reduced)
## Analysis of Variance Table
## 
## Response: Score
##           Df Sum Sq Mean Sq F value  Pr(>F)  
## Treat      1 327.34  327.34  7.5685 0.01314 *
## Company    9 682.52   75.84  1.7534 0.14844  
## Residuals 18 778.50   43.25                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(fit_reduced, 1)
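The interaction test can also be run as an explicit model comparison, `anova(reduced, full)`; in this document the equivalent call would be `anova(fit_reduced, fit_saturated)`, which yields the same F statistic as the `Treat:Company` row above. A self-contained sketch on hypothetical data generated with no true interaction:

```r
# Hypothetical data: additive truth, so the interaction should not help
set.seed(42)
d <- data.frame(
  f = factor(rep(c("a", "b"), each = 10)),
  g = factor(rep(c("x", "y"), times = 10))
)
d$y <- 5 + 2 * (d$f == "b") + 3 * (d$g == "y") + rnorm(20)
reduced <- lm(y ~ f + g, data = d)
full    <- lm(y ~ f * g, data = d)
# Partial F-test: does the interaction term reduce RSS significantly?
cmp <- anova(reduced, full)
cmp$F[2]          # the partial F statistic
cmp$`Pr(>F)`[2]   # its p-value (typically large here, as the truth is additive)
```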

Note that the estimate of the treatment effect is quite different from the one in the full fit (where `TreatPygmalion` measures the effect for the baseline company only). Deciding which estimate to use depends on whether we have a prior reason to believe there are interactions. Here we probably don’t, and the data showed no evidence of interactions, so it’s reasonable to leave them out.

Let’s visualize the Pygmalion effect using boxplots of the residuals after removing company effects:

fit1 <- lm(Score ~ Company, data= pyg)
pyg$resids <- resid(fit1)
ggplot(pyg, aes(x= Treat, y= resids)) + geom_boxplot()
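A numeric companion to this plot, reusing the same two-step idea: sweep out the company means, then compare the residuals across treatment groups. A sketch on hypothetical data with a built-in constant treatment bump (the data frame and its effect sizes are invented):

```r
# Hypothetical data: company effects plus a constant treatment bump of 7
set.seed(7)
d <- data.frame(
  Company = factor(rep(paste0("C", 1:5), each = 4)),
  Treat   = factor(rep(c("Control", "Pygmalion"), times = 10))
)
d$Score <- 60 + 3 * as.integer(d$Company) +
           7 * (d$Treat == "Pygmalion") + rnorm(20, sd = 2)
# Step 1: remove company effects
res <- resid(lm(Score ~ Company, data = d))
# Step 2: compare residuals between treatment groups
tapply(res, d$Treat, mean)  # Pygmalion residuals sit higher
```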