Solution to STA 250 Assignment #1 (Fall 1999), First Data Set

Here is a model solution to Assignment #1 (Postscript, PDF), for one particular set of data (and here is another). I have written the report to follow, more or less, the steps I went through in analysing the data, so that you can see what I did, though this wouldn't always be the best way to write a report like this. The links below lead to the data, Minitab worksheets, and Minitab plots. The worksheets and plots are available in either Postscript or PDF format. The Postscript is best viewed on CQUEST by choosing "0.707" from the "1.000" menu.

Initial examination of the data

I read the original data into Minitab, and then gave the columns appropriate names, producing a Minitab worksheet with the original data (Postscript, PDF).

I then looked at stemplots for the 'age', 'swt', 'ill', and 'ewt' variables. I found that case 7 has suspicious values for both 'age' (38 days) and 'swt' (19kg). These values are not possible given the way the experiment was intended to be carried out, since the animals were all supposed to be approximately one year old. They must be data recording errors. I therefore deleted these values, replacing them by "*", so that Minitab will ignore this case whenever the values of these variables would be needed.

One animal died during the experiment, and its final weight ('ewt') was therefore recorded as zero. With only one death, it seems best to ignore this animal, treating its death as an exceptional occurrence, and look only at the relationships among the live animals. I therefore replaced 'ewt' with "*" for this animal as well.

I then made a scatterplot of 'ewt' vs. 'grain' (Postscript, PDF), since the relationship between these variables is what we are primarily interested in. One outlier was evident in this plot, for which 'ewt' is about 150. This point is due to case 23, for which both 'ewt' and 'swt' are exactly 160kg. It seems unlikely that this animal actually gained no weight at all (it was ill for only 4 days, so illness wouldn't explain this). Perhaps 'ewt' was accidentally set to be the same as 'swt'. I replaced 'ewt' for this animal by "*" so that it wouldn't be included in later analyses involving 'ewt'.
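The cleaning steps described above were done by hand in Minitab. As a rough illustration only, the same idea can be sketched in Python, using None where Minitab uses "*". The rows below are invented for the example and are not the assignment data, and the helper names 'mark_missing' and 'mean_ignoring_missing' are hypothetical.

```python
# Invented rows for illustration; these are NOT the assignment data.
rows = [
    {"case": 1, "age": 370, "swt": 155, "ewt": 250},
    {"case": 2, "age": 38, "swt": 19, "ewt": 240},    # impossible age and starting weight
    {"case": 3, "age": 362, "swt": 150, "ewt": 0},    # animal died, final weight recorded as 0
    {"case": 4, "age": 358, "swt": 160, "ewt": 160},  # final weight identical to starting weight
]


def mark_missing(rows, case, fields):
    """Set the named fields of one case to None (the analogue of Minitab's "*")."""
    for row in rows:
        if row["case"] == case:
            for field in fields:
                row[field] = None


def mean_ignoring_missing(rows, field):
    """Average a field over the rows where it is not missing."""
    values = [row[field] for row in rows if row[field] is not None]
    return sum(values) / len(values)


# Each suspicious value is removed individually, after inspecting the case.
mark_missing(rows, 2, ["age", "swt"])
mark_missing(rows, 3, ["ewt"])
mark_missing(rows, 4, ["ewt"])

print(mean_ignoring_missing(rows, "ewt"))
```

Only the suspicious values are blanked, not the whole row, so a case with a bad 'age' can still contribute to analyses that do not use 'age'.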

Regression of final weight on amount of grain fed

After deleting outliers as described above, I did a regression of 'ewt' on 'grain', and looked at a scatterplot of these variables with the regression line shown (Postscript, PDF). A clear positive relationship was seen. Some apparent deviations from a perfect linear relationship can be seen in the scatterplot, but these may be just due to chance. The regression equation found was

ewt = 243 + 36.6 grain
The standard deviation of the residuals was 23.2.
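For readers who want to see what lies behind a regression equation and its residual standard deviation, here is a minimal Python sketch of a least-squares line fit. It assumes the residual standard deviation is computed as the square root of the residual sum of squares divided by n-2, which is the usual definition for simple regression. The function name 'fit_line' and the data are invented for illustration, so the numbers will not match those above.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = a + b*x. Returns (a, b, s), where s is the
    residual standard deviation using n - 2 degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                # slope
    a = y_bar - b * x_bar        # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return a, b, s


# Invented data for illustration; NOT the assignment data.
grain = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
ewt = [240.0, 250.0, 270.0, 290.0, 310.0, 320.0]

a, b, s = fit_line(grain, ewt)
print(f"ewt = {a:.1f} + {b:.1f} grain, s = {s:.2f}")
```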

Regression of change in weight on amount of grain fed

Some of the residual variation in the regression of 'ewt' on 'grain' may be due to variation in starting weight. In order to eliminate this, I computed a variable 'cwt' as the difference of 'ewt' and 'swt', so that 'cwt' is the change in the animal's weight over the course of the experiment. Looking at the effect of 'grain' on 'cwt' should have the same significance to the farmer as looking at 'ewt'. The farmer is interested in increasing the final weight of the animals. Since the starting weight is beyond the farmer's control (or at least it isn't affected by how much grain they are fed later), increasing the final weight is the same as increasing the change in weight.
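The computation of 'cwt' can be sketched as follows, assuming missing values are represented by None and that a missing 'ewt' or 'swt' makes the difference missing too (the analogue of a "*" propagating through the calculation). The function name 'difference' and the values are invented for illustration.

```python
def difference(ewt, swt):
    """Elementwise ewt - swt; the result is missing wherever either input is missing."""
    return [
        None if e is None or s is None else e - s
        for e, s in zip(ewt, swt)
    ]


# Invented values for illustration; NOT the assignment data.
swt = [150, None, 160, 155]
ewt = [250, 240, None, 262]

cwt = difference(ewt, swt)
print(cwt)  # [100, None, None, 107]
```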

The regression of 'cwt' on 'grain' gave the regression equation

cwt = 80 + 42.7 grain
The standard deviation of the residuals for this regression was 17.1. Since this is less than the standard deviation of the residuals for 'ewt', looking at 'cwt' seems to be a good idea. The lower amount of random variation should make the results more reliable.

I looked at a scatterplot of 'cwt' vs. 'grain' with the regression line plotted (Postscript, PDF). I also looked at a plot of the residuals vs. 'grain' (Postscript, PDF). From these plots, the standard deviation of the residuals seems to be greater for large values of 'grain' than for small values, though it's hard to be sure that this isn't just chance.

I also looked at a plot of the residuals of the regression for 'cwt' vs. 'sex' (Postscript, PDF). The mean of the residuals seems to be greater for females (sex=1) than for males (sex=0). This suggests that sex is a lurking variable whose effect should be examined.
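A numerical counterpart to a plot of residuals against a grouping variable is the mean residual within each group. The sketch below assumes a pooled line has already been fitted. The intercept and slope passed in, the function name 'mean_residual_by_group', and the data are all invented for illustration.

```python
def mean_residual_by_group(x, y, group, a, b):
    """Mean of the residuals y - (a + b*x) within each value of group."""
    totals = {}
    for xi, yi, gi in zip(x, y, group):
        resid = yi - (a + b * xi)
        total, count = totals.get(gi, (0.0, 0))
        totals[gi] = (total + resid, count + 1)
    return {g: total / count for g, (total, count) in totals.items()}


# Invented data for illustration; NOT the assignment data.
grain = [0.0, 0.0, 2.0, 2.0]
cwt = [80.0, 80.0, 140.0, 180.0]
sex = [0, 1, 0, 1]  # 0 = male, 1 = female, as in the report

# 80 and 40 are the least-squares intercept and slope for these invented points.
means = mean_residual_by_group(grain, cwt, sex, a=80.0, b=40.0)
print(means)  # {0: -10.0, 1: 10.0}
```

Group means that differ clearly from zero, in opposite directions, are the numerical sign of the pattern seen in the plot.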

The effect of sex on how weight is influenced by grain

To see how the sex of the animals affects the relationship between the amount of grain fed and the change in weight, I made a scatterplot of 'cwt' vs. 'grain' in which different symbols were used for the two possible values of 'sex' (Postscript, PDF). This plot shows that when no grain is fed, the change in weight is about the same for males and females. Feeding grain seems to increase the weight gain for both males and females, but the effect seems to be greater for females than for males. This explains why the standard deviation of the residuals in the regression of 'cwt' on 'grain' is bigger for large amounts of grain, as previously noticed, since it is for large amounts of grain that the differences between males and females will be greatest.

Because the relationship seems to be different for males and females, I produced new columns of data that separated the males from the females (using the unstack command). The 'grain-f' and 'cwt-f' columns contain the amounts of grain fed to females and the corresponding changes in weight. The 'grain-m' and 'cwt-m' columns contain the data for males.

I then did separate regressions for the males and the females, and produced scatterplots with the regression line shown for each group (Males: Postscript, PDF, Females: Postscript, PDF). Here are the regression equations and the residual standard deviations for the two groups:

Males:   cwt = 80 + 31.6 grain, s = 10.5
Females: cwt = 80 + 51.7 grain, s = 12.0
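The split-then-fit procedure can be sketched in Python as below. The 'unstack' function here is a hypothetical helper that only imitates the idea of Minitab's unstack command, and the data are invented, so the fitted values are illustrative only.

```python
def unstack(values, group):
    """Split values into separate lists, one per distinct value of group."""
    out = {}
    for v, g in zip(values, group):
        out.setdefault(g, []).append(v)
    return out


def fit_line(x, y):
    """Least-squares fit of y = a + b*x. Returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Invented data for illustration; NOT the assignment data.
grain = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
cwt = [82.0, 108.0, 142.0, 78.0, 134.0, 178.0]
sex = [0, 0, 0, 1, 1, 1]  # 0 = male, 1 = female

grain_by_sex = unstack(grain, sex)
cwt_by_sex = unstack(cwt, sex)

fits = {g: fit_line(grain_by_sex[g], cwt_by_sex[g]) for g in grain_by_sex}
for g, (a, b) in fits.items():
    print(f"sex={g}: cwt = {a:.1f} + {b:.1f} grain")
```

Fitting the groups separately lets both the intercept and the slope differ between males and females, which is what the scatterplot with separate symbols suggested was needed.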

Other possible relationships

The effect of feeding grain on the health of the animals is also of interest. Since there was only one death (in the group fed 1.0kg of grain per day), this will have to be investigated by looking at how many days animals in each group were ill. From looking at side-by-side boxplots of 'ill' vs. 'group' (Postscript, PDF), I concluded that there was very little relationship between the two.

I also looked at a scatterplot of 'cwt' vs. 'ill', which shows a slight negative relationship, as one might expect. No animals were ill for more than 17 days (out of 100 days for the experiment), so the lack of a strong relationship may be due to the range of days of illness being small.

I looked at the relationship between 'age' and 'swt', and found a slight positive correlation, as would be expected, since older animals ought to be bigger. I found a slight negative correlation between 'age' and 'cwt'. This also makes sense, since older animals are closer to being full-grown, and hence will not gain as much weight. Both of these relationships were quite weak, however, and might have been due to chance. The range of 'age' is not very great, since all animals were approximately one year old, so strong correlations would not be expected.
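For reference, the correlation used in comparisons like these can be computed as in the following sketch, which assumes the ordinary Pearson correlation and drops any pair in which either value is missing (None). The function name 'correlation' and the values are invented for illustration.

```python
import math


def correlation(x, y):
    """Pearson correlation of x and y, skipping pairs with a missing value."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if xi is not None and yi is not None]
    n = len(pairs)
    x_bar = sum(xi for xi, _ in pairs) / n
    y_bar = sum(yi for _, yi in pairs) / n
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    syy = sum((yi - y_bar) ** 2 for _, yi in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    return sxy / math.sqrt(sxx * syy)


# Invented values for illustration; NOT the assignment data.
age = [350, 360, None, 370, 380]  # days; one missing value
swt = [150, 156, 158, 155, 163]   # kg

r = correlation(age, swt)
print(round(r, 3))
```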

Modified worksheet

For reference, here is the worksheet with outliers deleted, with the 'cwt' column set to 'ewt'-'swt', and with the separate columns for males and females: Postscript, PDF.

Conclusions

The results of the experiment indicate that feeding grain to the animals does increase their final weight. The magnitude of this effect is different for males and for females. For the male animals, every extra kilogram of grain per day increases the final weight (at the end of the 100-day experiment) by about 30 kilograms. For the female animals, every extra kilogram of grain per day increases the final weight by about 50 kilograms. Depending on the cost of grain, it might therefore make sense to feed grain to the females but not to the males.

These conclusions apply only to animals of the sort used in this experiment. In particular, they apply only to animals that are approximately one year old at the start. Also, the results may not hold when the amount of grain fed to the animals is more than the maximum in this experiment (2 kilograms per day).

It appears that feeding the animals grain does not have any substantial effect on their health, either way.