ECON 3720: Introduction to Econometrics
Problem Set 04
Fall Semester 2024
Due: October 25th 2024
Submit the problem set to your TA’s mailbox in the Monroe Hall basement if you are writing your solutions on paper. If you want to use electronic ink (via a tablet) to write your solutions, upload a PDF of your solutions to Canvas. Please submit the problem set no later than October 25th 2024 at 5 PM for in-person submission and 11:59 PM for online submission. Failure to do so will result in a grade of 0. Remember to show all of your work. Good luck!
(1) In our discussion of multivariate regression with two regressors (k = 2), we saw that β̂1, the estimated slope coefficient on the first regressor, Xi,1, can be recovered by a two-step procedure:
A. First calculate the residuals from a regression of Xi,1 on Xi,2.
B. Regress Yi on the residuals from the previous step.
That is, by partialling out the variation in Xi,1 explained by Xi,2, the remaining variation in Xi,1 can be used to derive a closed-form expression for β̂1. Here, we will see that there is also a population version of the notion of “partialling out”. Suppose we have a linear model:
Y = β0 + β1X1 + β2X2 + ε. (1)
Suppose that the following hold:
(I) E[ε] = 0;
(II) Cov (X1, ε) = 0;
(III) Cov (X2, ε) = 0.
Consider a linear model for X1 on X2 where we define
α2 = Cov (X2, X1)/Var [X2], (2)
α0 = E [X1] − α2E [X2] , (3)
U = X1 − α0 − α2X2. (4)
The corresponding linear model equation for X1 explained in terms of X2 in its usual form is
X1 = α0 + α2X2 + U. (5)
(a) What are E[U] and Cov (X2, U)?
[Hint: You may find equations (3) and (4) useful for E[U] and equations (2) and (4) useful for Cov (X2, U).]
(b) In light of your answer to part (a), what is Cov (X1, U) in terms of α0, α2, Cov (X2, U) and Var[U]? Do not include any quantities that should equal zero in your answer.
[Hint: You may find equation (5) useful for this.]
(c) Given equation (4), is it true that Cov(U, ε) = 0? Show your work.
[Hint: You may find properties (II) and (III) above useful in your derivations.]
(d) Consider the quantity
γ1 = Cov(U, Y)/Var[U].
Notice that γ1 is the slope parameter in the population line of best fit between our original outcome Y and the error U from a (population) regression of X1 on X2. Is it true that γ1 = β1 where β1 is the slope on X1 in the original model (1)?
[Hint: all of your answers to parts (a), (b) and (c) may be useful in this derivation.]
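The sample two-step procedure from the start of this problem (steps A and B) can be checked numerically. The sketch below is illustrative only and is not part of the assignment. It uses Python's standard library on simulated data; the helper names (mean, cov), the simulated coefficients, and the sample size are all assumptions introduced here. It compares the slope on the first regressor from the closed-form two-regressor OLS formula with the slope obtained by partialling out.

```python
import random


def mean(v):
    """Sample mean of a list of numbers."""
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance (divides by n - 1); cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


# Simulated data (all numbers below are made up for illustration).
random.seed(0)
n = 500
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.6 * v + random.gauss(0, 1) for v in x2]
y = [1.0 + 2.0 * a - 1.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Direct OLS slope on x1 in the regression of y on (1, x1, x2),
# using the closed-form expression for two regressors.
den = cov(x1, x1) * cov(x2, x2) - cov(x1, x2) ** 2
beta1_direct = (cov(x2, x2) * cov(x1, y) - cov(x1, x2) * cov(x2, y)) / den

# Step A: regress x1 on x2 (with an intercept) and keep the residuals.
a2 = cov(x2, x1) / cov(x2, x2)
a0 = mean(x1) - a2 * mean(x2)
resid = [a - a0 - a2 * b for a, b in zip(x1, x2)]

# Step B: regress y on the residuals from step A.
beta1_two_step = cov(resid, y) / cov(resid, resid)

print(beta1_direct, beta1_two_step)
```

The two printed slopes should agree up to floating-point rounding.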
(2) The following model is a simplified version of the multiple regression model used by Biddle and Hamermesh (1990) to study the tradeoff between time spent sleeping and working and to look at other factors affecting sleep:
sleep = β0 + β1totwrk + β2educ + β3age + u,
where sleep and totwrk (total work) are measured in minutes per week and educ and age are measured in years.
(a) If adults trade off sleep for work, what is the sign of β1?
(b) What signs do you think β2 and β3 will have?
(c) Using the SLEEP75 data set, the estimated equation is
\widehat{sleep} = 3638.25 − 0.148 × totwrk + 11.13 × educ + 2.20 × age,
n = 706, R² = 0.113.
If someone works five more hours per week, by how many minutes is sleep predicted to fall? Is this a large tradeoff?
(d) Discuss the sign and magnitude of the estimated coefficient on educ.
(e) Would you say totwrk, educ and age explain much of the variation in sleep? What other factors might affect the time spent sleeping? Are these likely to be correlated with totwrk?
[Note: This is Wooldridge Chapter 3 Problem 3.]
(3) Use the data in hprice1.dta to estimate the model
price = β0 + β1sqrft + β2bdrms + u,
where price is the house price measured in thousands of dollars.
(a) Write out the results in equation form.
(b) What is the estimated increase in price for a house with one more bedroom, holding square footage constant?
(c) What is the estimated increase in price for a house with an additional bedroom that is 140 square feet in size? Compare this to your answer in part (b).
(d) What percentage of the variation in price is explained by square footage and number of bedrooms?
(e) The first house in the sample has sqrft = 2,438 and bdrms = 4. Find the predicted selling price for this house from the OLS regression line.
(f) The actual selling price of the first house in the sample was $300,000 (so price = 300). Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for the house?
[Note: This is Wooldridge Chapter 3 Problem C3.]
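As a reminder of the mechanics behind a fitted value and a residual, here is a minimal Python sketch using only the standard library. It does not use hprice1.dta: the six observations, the variable names (size, rooms, value), and the helper ols_two_regressors are all made up for illustration. The residual follows the usual convention of actual value minus fitted value.

```python
def mean(v):
    """Sample mean of a list of numbers."""
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance (divides by n - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_two_regressors(y, x1, x2):
    """OLS intercept and slopes for y on (1, x1, x2), via closed-form formulas."""
    den = cov(x1, x1) * cov(x2, x2) - cov(x1, x2) ** 2
    b1 = (cov(x2, x2) * cov(x1, y) - cov(x1, x2) * cov(x2, y)) / den
    b2 = (cov(x1, x1) * cov(x2, y) - cov(x1, x2) * cov(x1, y)) / den
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


# Made-up data, NOT taken from hprice1.dta.
size = [1500, 2100, 1800, 2600, 1200, 3000]   # hypothetical square footage
rooms = [3, 4, 3, 4, 2, 5]                    # hypothetical bedroom counts
value = [210, 290, 250, 340, 170, 410]        # hypothetical prices, in thousands

b0, b1, b2 = ols_two_regressors(value, size, rooms)

# Fitted value for each observation, and residual = actual - fitted.
fitted = [b0 + b1 * s + b2 * r for s, r in zip(size, rooms)]
residuals = [v - f for v, f in zip(value, fitted)]

print("fitted value, first observation:", fitted[0])
print("residual, first observation:", residuals[0])
```

Because the regression includes an intercept, the residuals in this sketch sum to zero up to rounding.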
(4) The Stata dataset vote1.dta contains information on election outcomes and campaign expenditures for 173 races for the House of Representatives in 1988. For this question make sure to attach all Stata graphs with your problem set.
(a) Report the sample correlations between voteA, expendA, expendB, and partystrA.
(b) Create a chart that displays three scatter plots, each with voteA on the vertical axis, and expendA, expendB, and partystrA on the horizontal axis.
(c) Compute the simple regression of voteA on expendA. Report the intercept, slope, and R². Create a scatter plot showing the raw data and the regression line.
(d) Compute the multivariate regression of voteA on expendA, expendB, and partystrA.
(e) Explain intuitively why your estimate of the effect of expendA on voteA differs between parts (c) and (d). Could you have predicted this difference based on the correlations you computed in part (a)? Explain.
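The formulas behind a sample correlation and a simple regression (intercept, slope, and R²) can be sketched in Python with the standard library. This is only an illustration of the arithmetic on simulated data; the assignment itself asks for Stata output from vote1.dta. The variable names (vote, spend_own, spend_rival), the simulated coefficients, and the sample size are assumptions made here.

```python
import random


def mean(v):
    """Sample mean of a list of numbers."""
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance (divides by n - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def corr(a, b):
    """Sample (Pearson) correlation."""
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5


# Simulated data, NOT taken from vote1.dta.
random.seed(1)
n = 200
spend_own = [random.uniform(0, 1000) for _ in range(n)]
spend_rival = [random.uniform(0, 1000) for _ in range(n)]
vote = [50 + 0.02 * a - 0.02 * b + random.gauss(0, 5)
        for a, b in zip(spend_own, spend_rival)]

# Pairwise sample correlations.
series = {"vote": vote, "spend_own": spend_own, "spend_rival": spend_rival}
names = list(series)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second, round(corr(series[first], series[second]), 3))

# Simple regression of vote on spend_own.
slope = cov(spend_own, vote) / cov(spend_own, spend_own)
intercept = mean(vote) - slope * mean(spend_own)

# R-squared = 1 - (residual sum of squares) / (total sum of squares).
vote_bar = mean(vote)
fitted = [intercept + slope * a for a in spend_own]
ss_res = sum((v - f) ** 2 for v, f in zip(vote, fitted))
ss_tot = sum((v - vote_bar) ** 2 for v in vote)
r2 = 1 - ss_res / ss_tot

print("intercept:", intercept, "slope:", slope, "R2:", r2)
```

In a simple regression with an intercept, R² equals the squared sample correlation between the outcome and the regressor, which gives a quick consistency check on the two calculations.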