Homework Four

  1. Consider the data set davis.data, which, among other things, gives the reported weight (repwt) and actual weight (weight) of some experimental subjects (reported weight is what they said they weighed). The idea here is that we would like to just ask people "How much do you weigh?" instead of actually having to weigh them. Thus in the model weight_i = β0 + β1 repwt_i + ε_i we would hope that the slope β1 is equal to 1.

    Of course, we don't expect the data to give a slope exactly equal to one, but we've mentioned the idea of an "eyeball confidence interval": take the slope and add and subtract twice its standard error. The 2 is a stand-in for 1.96, and the right multiplier really comes from a t distribution rather than the normal anyway, but the idea is that this should be "close enough for government work". You are free to use that kind of confidence interval in this problem. We'd like 1 to be in this confidence interval. If you want to be more pedantically correct, you can use the command confint: assuming you did something like my.mod = lm(...), you then enter confint(my.mod).

    1. Fit the model with weight as the y-variable and repwt as the x-variable.
    2. Compute a 95% confidence interval for the repwt slope. Is 1 in the confidence interval?
    3. Now plot the data. Notice anything odd? Which observation number is that? Look through the data by hand (with your eyeballs?) if you like. Assuming you typed plot(repwt,weight), you can type identify(repwt,weight) and click the point; R will label it with its observation number. Click some more points if you like. You'll have to right-click the plot and choose Quit to get the R prompt back. Turn in your plot and tell me what looks funny.
    4. Re-compute the slope and intercept without the offending point (lm(...,subset=-5) will fit the model omitting observation 5; you can probably adapt this to your situation). Then re-compute the confidence interval. What happens?
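If the mechanics of Problem 1 are fuzzy, here is a minimal sketch on simulated data (not davis.data). Everything in it is made up for illustration: the sample size, the numbers, and the wild value planted at row 7. It compares the "eyeball" interval with confint, then refits with a negative subset index.

```r
# Simulated stand-in for the real data: all values below are made up.
set.seed(1)
n <- 50
repwt  <- round(rnorm(n, mean = 65, sd = 12))  # hypothetical reported weights
weight <- repwt + rnorm(n, sd = 2)             # true slope is 1 by construction
weight[7] <- 160                               # plant one wild value at row 7

my.mod <- lm(weight ~ repwt)

# "Eyeball" interval: estimate plus or minus two standard errors.
est <- coef(summary(my.mod))["repwt", "Estimate"]
se  <- coef(summary(my.mod))["repwt", "Std. Error"]
eyeball <- c(est - 2 * se, est + 2 * se)

# Exact t-based interval from confint().
exact <- confint(my.mod, "repwt", level = 0.95)

print(eyeball)
print(exact)

# Refit without row 7; a negative index in subset drops that observation.
my.mod2 <- lm(weight ~ repwt, subset = -7)
print(confint(my.mod2, "repwt"))
```

The two intervals from the first fit should differ only slightly, since the t multiplier is close to 2 at this sample size. For the homework you would swap in the real data and your own observation number.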
  2. The dataset prostate.data comes from a study of 97 men with prostate cancer who were due to receive a radical prostatectomy. Fit a model with lpsa as the response and lcavol as the predictor. Record the residual standard error and R^2. Now add lweight, svi, lbph, age, lcp, pgg45 and gleason to the model one at a time. For each model, record the residual standard error and the R^2.
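One way to organize the "add one predictor at a time" bookkeeping is sketched below, again on simulated data. The names y, x1, x2, x3 and the data frame dat are hypothetical stand-ins, not columns of prostate.data. The sketch assumes you want a small table with one row per model.

```r
# Simulated stand-in data; y, x1, x2, x3 are hypothetical names.
set.seed(2)
n <- 97
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)

predictors <- c("x1", "x2", "x3")  # in the order they are to be added

fits <- lapply(seq_along(predictors), function(k) {
  # Builds y ~ x1, then y ~ x1 + x2, and so on.
  f <- reformulate(predictors[1:k], response = "y")
  s <- summary(lm(f, data = dat))
  data.frame(model     = paste(predictors[1:k], collapse = " + "),
             sigma     = s$sigma,       # residual standard error
             r.squared = s$r.squared)
})
results <- do.call(rbind, fits)
print(results)
```

Typing the models out by hand, or growing one with update(my.mod, . ~ . + x2), works just as well; the loop only saves copying numbers from eight summaries.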
  3. Using the model with lpsa as the response (i.e., y) and all the other variables as predictors, compute 90 and 95 percent confidence intervals for the slope corresponding to age. Using just these intervals, what can be said about the p-value for age in the regression summary? (You'll need to use the confint command to do these confidence intervals.)
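For the confint mechanics in Problem 3, here is a short sketch on simulated data. The data frame dat and the names y, x1, x2 are hypothetical. It shows the level argument (the default is 0.95) and how to pick out a single coefficient by name; the interpretation is left to you.

```r
# Simulated stand-in data; y, x1, x2 are hypothetical names.
set.seed(3)
n <- 97
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + 0.1 * dat$x2 + rnorm(n)

# y ~ . uses every other column of dat as a predictor.
fit <- lm(y ~ ., data = dat)

# Intervals for one slope at two confidence levels.
ci90 <- confint(fit, "x2", level = 0.90)
ci95 <- confint(fit, "x2", level = 0.95)
print(ci90)
print(ci95)

# The matching row of the regression summary, for comparison.
print(coef(summary(fit))["x2", ])
```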