Assignment Three
Due at the beginning of class on Wednesday, February 17.
Problems one and two can be written and turned in on paper.
Problems three through five must be turned in in two forms. First, I
want a PDF document (no Word .doc files!) with discussion, conclusions,
and any graphs you are told to (or want to) include. It is okay to
include snippets of R output, but you must interpret the output
and explicitly state any conclusions. (I know both how to produce the
correct R output and how to interpret it. Convincing me
that you've learned to produce the correct output is part of what you
should be doing, but you can't leave the interpretation up to me!)
E-mailing me the .pdf document is fine. You should also send me the
R code that you used to produce your output. Be sure to clearly
indicate (with comments, etc.) which input produces which of the outputs
used and discussed in your .pdf write-up (marking problem numbers 3, 4,
and 5 is necessary but not sufficient). Also, any analysis is sure to
include several false starts or mistakes that you later fixed. Be sure to
clean those out of the file you turn in to me.
Here is an example solution to the first part of problem 5, so you can
see what you should be turning in.
- Exercise 5.1 from the book.
- Use the above to prove that
$\sum_{i=0}^{n} (y_i - \bar{y})^2 = \sum_{i=0}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=0}^{n} (\hat{y}_i - \bar{y})^2$,
that is, the total sum of squares is the sum of the
residual sum of squares and the regression sum of squares.
- Use the dataset prestige.data. Consider the
women variable, which measures the percentage of women in each
occupation. Use a hypothesis test and confidence interval to
determine if a lower percentage of women are in the professional
category (type = "prof") than in the white collar
category (type = "wc").
- Analyze the data in sahlins.data
by regressing Acres/Gardener on Consumers/Gardener. In a society
characterized by "primitive communism," the social product
of the village would be redistributed according to need, while each
household would work in proportion to its capacity, implying a
regression slope of 0. In contrast, in a society in which
redistribution is purely through the market, each household should
have to work in proportion to its own consumption needs, suggesting a
positive regression slope and an intercept of 0. Interpret the
results of the regression in light of these observations. Examine
and interpret the slope, intercept, standard error of the regression,
and $r^2$. Do the results change if the fourth
household is deleted (using data.frame[-4,] is probably the
most straightforward way to omit the fourth observation for now --
we'll see a better way of doing this later)? Plot the regression
lines calculated with and without the fourth household on a
scatterplot of the data. Does either regression do a good job of
summarizing the relationship between Acres/Gardener and
Consumers/Gardener?
- The file anscombe.data
presents data on the U.S. states and Washington D.C. for state
per-capita public school expenditures (in dollars), per-capita annual
income (dollars), the proportion of residents under the age of 18
(per 1000), and the proportion of the population residing in urban
areas (per 1000). Regress school expenditures (y) on each of
income, proportion under 18, and proportion urban. Plot the
least-squares line for each regression on a scatterplot of the data.
Does the line adequately capture the relationship between the
variables shown in the plot? In each case, examine and interpret the
slope, intercept, regression standard error, and
$r^2$.
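
For the sum-of-squares identity in the second problem, a numerical check can be a useful sanity test before writing the proof. The sketch below uses made-up data (the seed, sample size, and coefficients are arbitrary assumptions, not values from the book) and checks that the three sums of squares from a least-squares fit with an intercept add up as claimed. It does not replace the proof.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
x <- runif(20)                     # made-up predictor
y <- 2 + 3 * x + rnorm(20)         # made-up response

fit <- lm(y ~ x)                   # least-squares fit with an intercept

tss   <- sum((y - mean(y))^2)            # total sum of squares
rss   <- sum(residuals(fit)^2)           # residual sum of squares
regss <- sum((fitted(fit) - mean(y))^2)  # regression sum of squares

# TRUE when the two sides agree up to floating-point tolerance
all.equal(tss, rss + regss)
```

Refitting without the intercept (`lm(y ~ x - 1)`) and rerunning the comparison is one way to see which properties of the fit the proof has to rely on.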
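
For the comparison of the "prof" and "wc" categories in the third problem, one possible approach (an assumption, not necessarily the intended one) is a one-sided Welch two-sample t-test together with a confidence interval for the difference in means. The sketch below runs on a made-up data frame whose column names `women` and `type` follow the problem statement. How prestige.data is actually read in, and whether a t-test suits these percentages, is left to you.

```r
set.seed(2)                        # arbitrary seed
# Made-up stand-in for the real data; the group sizes are arbitrary.
d <- data.frame(
  type  = rep(c("prof", "wc"), times = c(30, 25)),
  women = round(runif(55, 0, 100), 1)
)

prof <- d$women[d$type == "prof"]
wc   <- d$women[d$type == "wc"]

# One-sided Welch test: H0 mean(prof) >= mean(wc), Ha mean(prof) < mean(wc)
test <- t.test(prof, wc, alternative = "less")

# Two-sided 95% confidence interval for mean(prof) - mean(wc)
ci <- t.test(prof, wc)$conf.int

test$p.value
ci
```

Passing the two groups as separate vectors makes the direction of `alternative = "less"` explicit: it refers to the first argument minus the second.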
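
The regression problems share a common pattern: fit a line, pull out the slope, intercept, regression standard error, and $r^2$, then draw the line on a scatterplot. The sketch below shows that pattern, including a refit with the fourth row dropped via `[-4, ]`. The data are made up, and the column names `consumers` and `acres` are assumptions that may not match the names in sahlins.data.

```r
set.seed(3)                        # arbitrary seed
# Made-up stand-in for the real data; the coefficients are arbitrary.
d <- data.frame(consumers = runif(20, 1, 2.5))
d$acres <- 1.4 + 0.5 * d$consumers + rnorm(20, sd = 0.4)

fit_all <- lm(acres ~ consumers, data = d)
fit_no4 <- lm(acres ~ consumers, data = d[-4, ])   # omit the fourth row

s <- summary(fit_all)
s$coefficients    # intercept and slope with their standard errors
s$sigma           # standard error of the regression
s$r.squared       # r^2

# Scatterplot with both fitted lines
plot(acres ~ consumers, data = d)
abline(fit_all, lty = 1)
abline(fit_no4, lty = 2)
legend("topleft", lty = 1:2,
       legend = c("all households", "without household 4"))
```

The same calls apply to each of the three regressions in the last problem, with the formula changed to the relevant pair of variables.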