Here is a model solution to Assignment #2 (Postscript, PDF), for one particular set of data (and here is another). This solution continues with the analysis of this data set from where the solution to Assignment #1 left off. The analysis below assumes that the data has been cleaned up as in Assignment #1, and that new Minitab columns have been created as described there. Some new columns were created for this assignment. You can view the final version of the worksheet with all the new columns in Postscript or PDF.
The discussion below is interspersed with portions of Minitab output. There are also links to a few plots, in Postscript or PDF formats. The Postscript is best viewed on CQUEST by choosing "0.707" from the "1.000" menu.
    MTB > twot 'ill' 'test1'

    Two Sample T-Test and Confidence Interval

    Two sample T for ill
    test1     N      Mean     StDev   SE Mean
    0         8      4.87      3.40       1.2
    1         8      3.12      1.46      0.52

    95% CI for mu (0) - mu (1): ( -1.2,  4.71)
    T-Test mu (0) = mu (1) (vs not =): T = 1.34  P = 0.21  DF = 9

The two-sided p-value of 0.21 indicates that there is essentially no evidence that the mean days of illness differs for animals fed no grain and animals fed 0.5kg of grain. This conclusion is only valid if the assumptions behind the t-test are approximately correct, however.
One assumption is that the distribution of days of illness in each group is fairly close to normal. Here are stemplots of the data by group (ignoring the ones in neither group):
    MTB > stem 'ill';
    SUBC> by 'test1'.

    Character Stem-and-Leaf Display

    Stem-and-leaf of ill   test1 = 0   N = 8
    Leaf Unit = 1.0

        1    0 1
        4    0 333
        4    0 55
        2    0 7
        1    0
        1    1
        1    1 2

    Stem-and-leaf of ill   test1 = 1   N = 8
    Leaf Unit = 1.0

        1    0 1
       (4)   0 2233
        3    0 455

These distributions look approximately normal, except for one value of 12 days in the group fed no grain. Common sense tells us that we might expect to at least occasionally get animals that are sick for many days in any group, so we can expect that both distributions are at least somewhat skewed to the right. With only 8 cases in each group, this is cause for some concern that the p-value for the t-test might be somewhat inaccurate, but in my opinion this worry isn't serious enough to completely ignore the results, though it is a reason not to put a great deal of trust in them.
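The summary statistics, T value and degrees of freedom in the first t-test output can be reproduced from the values read off the two stemplots. Here is a minimal sketch in Python (not part of the original Minitab session). The names `no_grain`, `half_kg` and `welch` are my own, and it assumes that Minitab's unpooled test uses the usual Welch formula and truncates the degrees of freedom to an integer.

```python
from math import sqrt
from statistics import mean, variance

# Days of illness read off the stemplots above (Leaf Unit = 1.0).
no_grain = [1, 3, 3, 3, 5, 5, 7, 12]   # test1 = 0
half_kg = [1, 2, 2, 3, 3, 4, 5, 5]     # test1 = 1


def welch(a, b):
    """Unpooled two-sample t statistic and Welch degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


t_stat, df = welch(no_grain, half_kg)
print(round(t_stat, 2), int(df))   # compare with T = 1.34, DF = 9
```

The p-value itself needs the t distribution with 9 degrees of freedom, which is taken from the Minitab output rather than recomputed here.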
The other assumption behind the t-test is that the observations in each group are independent of other observations, both in the same group, and in the other group. We can't tell whether this is true from looking at the data. Common sense tells us that the animals might catch diseases from each other, and that they might all be affected in the same way by factors such as the weather, both of which could lead to a lack of independence. Perhaps the researchers conducting the experiment would be able to say how much of a problem this is likely to be in practice, given the circumstances of the experiment.
I did another t-test to address the question of whether feeding lots of grain (1.5kg or 2.0kg) causes illness, compared to feeding a smaller amount of grain (0.5kg or 1.0kg). For this test, I created a column called 'test2', set to '0' for the 16 animals fed 0.5kg or 1.0kg of grain, to '1' for the 16 animals fed 1.5kg or 2.0kg of grain, and to '*' for the animals fed no grain. I then compared the days of illness for the two groups as follows:
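The coding rule for 'test2' can be written out explicitly. This is a sketch in Python rather than the Minitab commands actually used (which are not shown here). The function name `grain_group` is hypothetical, `None` stands in for Minitab's missing-value symbol '*', and the balanced design of 8 animals at each of the five grain levels is an assumption consistent with the group sizes reported above.

```python
from typing import Optional


def grain_group(grain_kg: float) -> Optional[int]:
    """Return the 'test2' code for one animal's daily grain ration."""
    if grain_kg in (0.5, 1.0):
        return 0       # smaller amount of grain
    if grain_kg in (1.5, 2.0):
        return 1       # lots of grain
    return None        # no grain: left out of this comparison ('*' in Minitab)


# Assumed design: 8 animals at each of the five grain levels.
grain = [0.0, 0.5, 1.0, 1.5, 2.0] * 8
test2 = [grain_group(g) for g in grain]
print(test2.count(0), test2.count(1), test2.count(None))
```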
    MTB > twot 'ill' 'test2'

    Two Sample T-Test and Confidence Interval

    Two sample T for ill
    test2     N      Mean     StDev   SE Mean
    0        16      3.94      2.02      0.50
    1        16      5.12      4.66       1.2

    95% CI for mu (0) - mu (1): ( -3.84,  1.5)
    T-Test mu (0) = mu (1) (vs not =): T = -0.94  P = 0.36  DF = 20

The two-sided p-value of 0.36 provides essentially no evidence against the null hypothesis that the mean days of illness is the same for the two groups. The assumption of independence behind this test could again be doubted, as explained above. The distributions of the data in the two groups can again be viewed via stemplots:
    MTB > stem 'ill';
    SUBC> by 'test2'.

    Character Stem-and-Leaf Display

    Stem-and-leaf of ill   test2 = 0   N = 16
    Leaf Unit = 1.0

        2    0 01
        6    0 2233
       (7)   0 4445555
        3    0 677

    Stem-and-leaf of ill   test2 = 1   N = 16
    Leaf Unit = 1.0

        1    0 1
        8    0 2223333
        8    0 45555
        3    0 6
        2    0
        2    1
        2    1
        2    1
        2    1 67

As with the first t-test, there are reasons (both common sense and from looking at the plots above) to believe that the distributions are somewhat skewed. With 16 animals per group, this is less of a worry than for the first test, but is still cause for some concern.
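The 95% confidence interval in the second t-test output can be checked from the values read off these stemplots. The sketch below is in Python, not Minitab. The names `less_grain`, `more_grain` and `difference_ci` are my own, and the critical value 2.086 is the standard t-table entry for a 95% interval with 20 degrees of freedom (the DF that Minitab reports).

```python
from math import sqrt
from statistics import mean, variance

# Days of illness read off the stemplots above (Leaf Unit = 1.0).
less_grain = [0, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 7, 7]      # test2 = 0
more_grain = [1, 2, 2, 2, 3, 3, 3, 3, 4, 5, 5, 5, 5, 6, 16, 17]    # test2 = 1

T_CRIT_20_DF = 2.086   # table value for 95% confidence, 20 df


def difference_ci(a, b, t_crit):
    """Unpooled confidence interval for mean(a) - mean(b)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return diff - t_crit * se, diff + t_crit * se


ci_low, ci_high = difference_ci(less_grain, more_grain, T_CRIT_20_DF)
print(round(ci_low, 2), round(ci_high, 2))   # compare with ( -3.84, 1.5)
```

The interval comfortably includes zero, which is another way of seeing why the p-value is not small.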
One animal fed 1.0kg of grain died, perhaps quite quickly, since it is recorded as having been ill for zero days. I retained this animal for the tests above. Zero days of illness is fewer than for any other animal, but some other animals were ill for only one day, so it's not that extreme an observation.
Here is the regression of final weight ('ewt') on grain fed ('grain'):
    The regression equation is
    ewt = 243 + 36.6 grain

    38 cases used 2 cases contain missing values

    Predictor        Coef       StDev          T        P
    Constant      243.295       6.395      38.04    0.000
    grain          36.600       5.176       7.07    0.000

    S = 23.15       R-Sq = 58.1%     R-Sq(adj) = 57.0%

    Analysis of Variance

    Source            DF          SS          MS         F        P
    Regression         1       26791       26791     49.99    0.000
    Residual Error    36       19292         536
    Total             37       46084

There were 38 cases, after one animal was ignored because of a data recording error, and one was ignored because it died. The confidence interval for the slope (the regression coefficient of 'grain') is therefore found using the t-distribution with 38-2=36 degrees of freedom. From Table D in the book, the critical value for a 95% confidence interval is about 2.03, interpolating between the values given for 30 df and 40 df. The confidence interval for the slope can therefore be computed as
(36.6 - 2.03*5.176, 36.6 + 2.03*5.176) = (26.1, 47.1)
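The interpolation and the interval arithmetic can be spelled out as follows. This is a sketch in Python. The two table entries (2.042 for 30 df and 2.021 for 40 df) are the standard 95% two-sided t critical values, which I am assuming are the ones listed in Table D, and the helper names `interpolate` and `slope_ci` are my own.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def slope_ci(coef, se, t_crit):
    """Confidence interval: estimate plus or minus t_crit standard errors."""
    return coef - t_crit * se, coef + t_crit * se


# Standard 95% two-sided t critical values (assumed to match Table D).
T_30_DF, T_40_DF = 2.042, 2.021

t_crit = interpolate(36, 30, T_30_DF, 40, T_40_DF)
lo, hi = slope_ci(36.600, 5.176, t_crit)   # Coef and StDev for 'grain'
print(round(t_crit, 2), round(lo, 1), round(hi, 1))
```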
Here is the regression of change in weight (called 'cwt', computed as 'ewt'-'swt') on grain fed:
    The regression equation is
    cwt = 79.8 + 42.7 grain

    37 cases used 3 cases contain missing values

    Predictor        Coef       StDev          T        P
    Constant       79.838       4.900      16.29    0.000
    grain          42.658       3.914      10.90    0.000

    S = 17.05       R-Sq = 77.2%     R-Sq(adj) = 76.6%

    Analysis of Variance

    Source            DF          SS          MS         F        P
    Regression         1       34525       34525    118.78    0.000
    Residual Error    35       10173         291
    Total             36       44698

    Unusual Observations
    Obs     grain        cwt        Fit  StDev Fit   Residual   St Resid
     22      1.00     157.00     122.50       2.80      34.50      2.05R

    R denotes an observation with a large standardized residual

There are only 37 cases here because of a data recording error for 'swt', which makes 'cwt' unknown for that animal. The t-distribution with 37-2=35 degrees of freedom should be used for the confidence interval, but the difference in the critical value from before is negligible. The confidence interval for the slope is found to be
(42.658 - 2.03*3.914, 42.658 + 2.03*3.914) = (34.7, 50.6)
The true values for the slopes for these two regressions should be the same. The first represents the effect of grain on the final weight, the second the effect of grain on the change in weight. These effects could be different only if the starting weight were different for animals fed different amounts of grain. But since the animals were assigned to groups fed different amounts of grain randomly, there should be no systematic tendency for animals fed different amounts of grain to have different starting weights. It is therefore not surprising that the two confidence intervals computed above overlap, since we expect that usually they will both include the same true value.
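The overlap of the two intervals can be checked mechanically. This small Python sketch uses the interval endpoints and slope estimates quoted above. The helper name `overlap` is my own, and overlap is only an informal consistency check here, not a formal test (the two regressions use nearly the same animals, so the intervals are not independent).

```python
def overlap(a, b):
    """Return the common part of intervals a and b, or None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


ci_ewt = (26.1, 47.1)   # slope of 'ewt' on 'grain'
ci_cwt = (34.7, 50.6)   # slope of 'cwt' on 'grain'

common = overlap(ci_ewt, ci_cwt)
print(common)

# Each interval also contains the other regression's point estimate.
print(ci_cwt[0] <= 36.600 <= ci_cwt[1], ci_ewt[0] <= 42.658 <= ci_ewt[1])
```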
Although the estimated slopes in the two regressions are measures of the same thing (the effect of grain on weight), the second regression appears to be preferable, since it results in a smaller standard deviation for the residuals, hence a smaller standard error for the slope in the regression, and therefore a narrower confidence interval for this slope. This is because the random variation arising from different starting weights is eliminated when we look at the change in weight rather than the final weight. When looking at males and females separately, it therefore makes sense to look at the change in weight rather than the final weight. Here is the regression for males:
    The regression equation is
    cwt-m = 80.2 + 31.6 grain-m

    17 cases used 3 cases contain missing values

    Predictor        Coef       StDev          T        P
    Constant       80.200       4.707      17.04    0.000
    grain-m        31.589       3.735       8.46    0.000

    S = 10.53       R-Sq = 82.7%     R-Sq(adj) = 81.5%

    Analysis of Variance

    Source            DF          SS          MS         F        P
    Regression         1      7924.2      7924.2     71.53    0.000
    Residual Error    15      1661.7       110.8
    Total             16      9585.9

And here is the regression for females:
    The regression equation is
    cwt-f = 80.5 + 51.7 grain-f

    Predictor        Coef       StDev          T        P
    Constant       80.468       4.499      17.89    0.000
    grain-f        51.682       3.614      14.30    0.000

    S = 11.98       R-Sq = 91.9%     R-Sq(adj) = 91.5%

    Analysis of Variance

    Source            DF          SS          MS         F        P
    Regression         1       29381       29381    204.55    0.000
    Residual Error    18        2585         144
    Total             19       31967

    Unusual Observations
    Obs   grain-f      cwt-f        Fit  StDev Fit   Residual   St Resid
     12      1.00     157.00     132.15       2.68      24.85      2.13R

    R denotes an observation with a large standardized residual

Note that the standard deviation of the residuals is smaller in both of these regressions than for the regression done on all animals. The difference in slopes (31.6 versus 51.7) seems large compared to the standard errors for these slopes (3.7 and 3.6). This leads me to suspect that the effect of grain is different for males and females, but to do a formal statistical test, we need to combine these two regressions into one.
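The informal comparison of the two slopes can be made a little more concrete. The Python sketch below treats the two slope estimates as independent (reasonable, since they come from different animals) and combines their standard errors in the usual way. It is only a rough check with names of my own choosing, not the formal test, because the two regressions estimate the residual standard deviation separately.

```python
from math import sqrt

slope_m, se_m = 31.589, 3.735   # males:   Coef and StDev for 'grain-m'
slope_f, se_f = 51.682, 3.614   # females: Coef and StDev for 'grain-f'

diff = slope_f - slope_m
se_diff = sqrt(se_m ** 2 + se_f ** 2)   # SE of a difference of independent estimates
ratio = diff / se_diff

print(round(diff, 3), round(se_diff, 2), round(ratio, 2))
```

The difference is nearly four standard errors from zero, which is what motivates the formal test.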
To do this, I computed the 'fgrain' column as follows:
    MTB > let 'fgrain'='sex'*'grain'

I then did a regression of 'cwt' on 'sex', 'grain', and 'fgrain':
    The regression equation is
    cwt = 80.2 + 0.27 sex + 31.6 grain + 20.1 fgrain

    37 cases used 3 cases contain missing values

    Predictor        Coef       StDev          T        P
    Constant       80.200       5.073      15.81    0.000
    sex             0.268       6.624       0.04    0.968
    grain          31.589       4.026       7.85    0.000
    fgrain         20.093       5.283       3.80    0.001

    S = 11.34       R-Sq = 90.5%     R-Sq(adj) = 89.6%

    Analysis of Variance

    Source            DF          SS          MS         F        P
    Regression         3       40451       13484    104.77    0.000
    Residual Error    33        4247         129
    Total             36       44698

    Source       DF      Seq SS
    sex           1        3146
    grain         1       35443
    fgrain        1        1862

    Unusual Observations
    Obs       sex        cwt        Fit  StDev Fit   Residual   St Resid
     22      1.00     157.00     132.15       2.54      24.85      2.25R
     30      1.00     135.00     157.99       3.06     -22.99     -2.10R

    R denotes an observation with a large standardized residual

The constant term and the coefficient for 'grain' match the values found in the regression for males only, which makes sense since for males the other two terms are zero (since both 'sex' and 'fgrain' are zero for males). The constant plus the coefficient for 'sex' matches the constant found in the regression for females only, and the sum of the coefficients for 'grain' and 'fgrain' matches the coefficient of 'grain' in the regression for females only. This multiple regression therefore effectively combines the two separate regressions.
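The way the combined regression reproduces the two separate ones can be verified numerically. This Python sketch uses the coefficients from the output above. The dictionary `COEF` and the function `fitted_cwt` are my own names, and 'sex' is taken to be coded 0 for males and 1 for females, as the discussion above implies.

```python
COEF = {"const": 80.200, "sex": 0.268, "grain": 31.589, "fgrain": 20.093}


def fitted_cwt(sex, grain):
    """Fitted change in weight; sex is 0 for males, 1 for females."""
    fgrain = sex * grain   # same definition as the Minitab 'let' command
    return (COEF["const"] + COEF["sex"] * sex
            + COEF["grain"] * grain + COEF["fgrain"] * fgrain)


# Intercept and slope implied for each sex.
for sex, label in ((0, "males"), (1, "females")):
    intercept = fitted_cwt(sex, 0.0)
    slope = fitted_cwt(sex, 1.0) - intercept
    print(label, round(intercept, 3), round(slope, 3))

# Observation 22 (sex = 1, grain = 1.00) has Fit = 132.15 in the output.
print(round(fitted_cwt(1, 1.0), 2))
```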
We can now test the null hypothesis that the coefficient of 'fgrain' is zero (i.e., that the effect of grain is the same for males and females). The (two-sided) p-value is 0.001, which constitutes quite strong evidence that this null hypothesis is false (i.e., that the effect of grain is different for males and females).
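The T column that this test is based on is simply each coefficient divided by its standard error. A short Python sketch recomputes it from the Coef and StDev columns of the combined regression (the name `table` is my own). The p-values come from the t distribution with 33 degrees of freedom and are taken from the Minitab output rather than recomputed.

```python
# (Coef, StDev) pairs copied from the combined regression output.
table = {
    "Constant": (80.200, 5.073),
    "sex": (0.268, 6.624),
    "grain": (31.589, 4.026),
    "fgrain": (20.093, 5.283),
}

# T statistic for the null hypothesis that a coefficient is zero.
t_values = {name: coef / se for name, (coef, se) in table.items()}

for name, t in t_values.items():
    print(f"{name:10s} T = {t:6.2f}")
```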
This conclusion is valid only if the assumptions behind the computation of this p-value are correct. One assumption is that the residuals are independent. Since the animals were assigned to groups randomly, no dependencies were introduced from this assignment. It is possible that there are dependencies arising from common factors such as the weather. For example, it could be that grain causes a bigger increase in weight for males than for females in hot weather, but it causes a bigger increase in weight for females than for males in cold weather. If it happened to have been cold all summer, the larger effect of grain on females that we see in the data could appear to be statistically significant, even if the average effect of grain on weight - over many summers, some hot, some cold - is the same for males and females. The possibility of such dependencies is reason for caution, but we shouldn't let our ability to imagine such scenarios prevent us from ever coming to any conclusions.
The other assumption behind the computation of the p-value is that the residuals are approximately normally distributed. We can check this using a normal quantile plot. I asked Minitab to store the residuals from the above regression in a new column ('RESI1'), and then produced a normal quantile plot as follows:
    MTB > nscores 'RESI1' c18
    MTB > plot 'RESI1'*c18

The resulting plot (Postscript, PDF) shows a good match to a normal distribution.
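For readers without Minitab, here is a sketch of what a normal scores computation does, written in Python. It assumes the common plotting positions (i - 3/8)/(n + 1/4), which I believe is what Minitab's 'nscores' uses, but treat that as an assumption. The function name `normal_scores` and the example residuals are made up for illustration, and tied values are not averaged in this simple version.

```python
from statistics import NormalDist


def normal_scores(values):
    """Normal score for each value, returned in the original order."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])   # indices, smallest value first
    std_normal = NormalDist()                            # mean 0, standard deviation 1
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        p = (rank - 0.375) / (n + 0.25)                  # assumed plotting position
        scores[idx] = std_normal.inv_cdf(p)
    return scores


# Hypothetical residuals, purely for illustration (not the values in 'RESI1').
example = [-2.0, 0.5, 3.1, -0.7, 1.2]
for resid, score in zip(example, normal_scores(example)):
    print(f"{resid:5.1f}  {score:6.3f}")
```

Plotting the residuals against these scores gives a roughly straight line when the residuals are close to normally distributed.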
We can therefore be reasonably sure that there is a difference in the effect of grain on males and females - feeding an extra kilogram of grain per day increases weight gain over the summer in females by about 52 kilograms, but it increases weight gain in males by only about 32 kilograms. In contrast, the p-value for the test of the hypothesis that the coefficient of 'sex' is zero is 0.968. Hence we have no reason to believe that males and females gain different amounts of weight (on average) when they are fed no grain.
Finally, I did a regression of the final weight ('ewt') on 'sex', 'grain', 'fgrain', 'swt', and 'age'. Since the starting weight ('swt') is included as an explanatory variable, it is no longer necessary to look at the change in weight rather than the final weight - the effect of random variation in 'swt' will not show up in the residual when 'swt' is an explanatory variable. (The previous technique of looking at 'ewt'-'swt' rather than 'ewt' is in fact equivalent to doing a regression for 'ewt' with 'swt' included as an explanatory variable, but with its coefficient fixed at one.) By including 'age' as an explanatory variable, variation in this variable resulting from random assignment to groups will also be removed from the residual.
Here is the result:
    The regression equation is
    ewt = 215 - 5.34 sex + 28.1 grain + 24.5 fgrain + 1.12 swt - 0.411 age

    37 cases used 3 cases contain missing values

    Predictor        Coef       StDev          T        P
    Constant       215.10       65.85       3.27    0.003
    sex            -5.344       7.160      -0.75    0.461
    grain          28.052       4.223       6.64    0.000
    fgrain         24.543       5.489       4.47    0.000
    swt            1.1249      0.1687       6.67    0.000
    age           -0.4110      0.2283      -1.80    0.082

    S = 10.92       R-Sq = 92.0%     R-Sq(adj) = 90.7%

    Analysis of Variance

    Source            DF          SS          MS         F        P
    Regression         5     42373.4      8474.7     71.11    0.000
    Residual Error    31      3694.6       119.2
    Total             36     46068.0

    Source       DF      Seq SS
    sex           1         7.0
    grain         1     28016.7
    fgrain        1      4810.3
    swt           1      9153.3
    age           1       386.1

    Unusual Observations
    Obs       sex        ewt        Fit  StDev Fit   Residual   St Resid
      8      0.00     203.00     207.34       7.63      -4.34     -0.56 X
     22      1.00     310.00     286.91       2.80      23.09      2.19R
     30      1.00     280.00     301.75       3.10     -21.75     -2.08R

    R denotes an observation with a large standardized residual
    X denotes an observation whose X value gives it large influence.

Not much has changed from the previous regression. The standard deviation of the residuals is only slightly smaller, indicating that including 'age' and allowing 'swt' to have a coefficient other than one has not made the predictions much better. The (two-sided) p-value for testing the null hypothesis that the coefficient of 'age' is zero is 0.082, which is only very mild evidence against the null hypothesis. The animals were all about the same age, however, which makes it difficult to get a good estimate of the effect of age, so this failure to clearly reject the null hypothesis would not be surprising even if age does have some effect.
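The comparison of residual standard deviations can be read directly off the two analysis of variance tables, since S is the square root of the residual sum of squares divided by its degrees of freedom. A short Python sketch, with variable names of my own choosing:

```python
from math import sqrt


def residual_sd(ss_resid, df_resid):
    """Residual standard deviation S from an analysis of variance table."""
    return sqrt(ss_resid / df_resid)


s_three = residual_sd(4247, 33)     # cwt on sex, grain, fgrain
s_five = residual_sd(3694.6, 31)    # ewt on sex, grain, fgrain, swt, age

# R-Sq for the five-variable regression: regression SS over total SS.
r_sq = 42373.4 / 46068.0

print(round(s_three, 2), round(s_five, 2), round(100 * r_sq, 1))
print(f"relative reduction in S: {100 * (1 - s_five / s_three):.1f}%")
```

The reduction in S is only a few percent, which is the sense in which the extra explanatory variables have not improved the predictions much.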