ECON 513
Spring 2025
Problem Set 3
Due 3/6/2025 Thu 11:59pm via Brightspace
1. Consider a linear model:
$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \dots, n,$$
where $x_i \in \mathbb{R}^k$ and $E(x_i \varepsilon_i) = 0$. Suppose we have a valid instrumental variable $z_i \in \mathbb{R}^l$. Consider the 2SLS estimator
$$\hat\beta_{2SLS} = \left(X'Z(Z'Z)^{-1}Z'X\right)^{-1} X'Z(Z'Z)^{-1}Z'y.$$
Assume all regularity conditions hold.
(a) Prove that if $l = k$, i.e., the model is just-identified, the 2SLS estimator simplifies to the IV estimator $\hat\beta_{IV} = (Z'X)^{-1}Z'y$.
(b) Prove the consistency of 2SLS.
(c) Prove the asymptotic normality of 2SLS.
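As a numerical sanity check (not a proof), the 2SLS formula in this problem can be evaluated on simulated data. The sketch below is a minimal pure-Python illustration for a single regressor (k = 1) with either one or two instruments. The data-generating process (including a regressor that is correlated with the error), the coefficient values, the sample size, and all function names are assumptions made for this illustration and are not part of the problem set.

```python
import random

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def tsls_two_instruments(x, y, z1, z2):
    """2SLS formula for k = 1 regressor and l = 2 instruments (no intercept)."""
    a = (dot(x, z1), dot(x, z2))                 # X'Z, a 1 x 2 row vector
    b = (dot(z1, y), dot(z2, y))                 # Z'y, a 2 x 1 column vector
    s11, s12, s22 = dot(z1, z1), dot(z1, z2), dot(z2, z2)
    det = s11 * s22 - s12 * s12
    inv = ((s22 / det, -s12 / det),              # (Z'Z)^{-1} for the 2 x 2 case
           (-s12 / det, s11 / det))
    # w = X'Z (Z'Z)^{-1}, a 1 x 2 row vector
    w = (a[0] * inv[0][0] + a[1] * inv[1][0],
         a[0] * inv[0][1] + a[1] * inv[1][1])
    # (w Z'X)^{-1} (w Z'y); Z'X is the transpose of X'Z
    return (w[0] * b[0] + w[1] * b[1]) / (w[0] * a[0] + w[1] * a[1])

def tsls_one_instrument(x, y, z):
    """The same 2SLS formula written out literally for k = l = 1."""
    xz, zz, zy = dot(x, z), dot(z, z), dot(z, y)
    return (1.0 / (xz * (1.0 / zz) * xz)) * (xz * (1.0 / zz) * zy)

def iv_one_instrument(x, y, z):
    """Simple IV estimator (Z'X)^{-1} Z'y for k = l = 1."""
    return dot(z, y) / dot(z, x)

# Hypothetical data-generating process (an assumption of this sketch).
rng = random.Random(0)
n = 20000
beta = 2.0
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
eps = [rng.gauss(0, 1) for _ in range(n)]
# x depends on both instruments and on the error term eps
x = [a + 0.5 * b + 0.8 * e + rng.gauss(0, 1) for a, b, e in zip(z1, z2, eps)]
y = [beta * xi + e for xi, e in zip(x, eps)]

b_over = tsls_two_instruments(x, y, z1, z2)   # over-identified: l = 2 > k = 1
b_just = tsls_one_instrument(x, y, z1)        # just-identified: l = k = 1
b_iv = iv_one_instrument(x, y, z1)

print(f"2SLS (l = 2): {b_over:.4f}")
print(f"2SLS (l = 1): {b_just:.4f}")
print(f"IV   (l = 1): {b_iv:.4f}")
```

Comparing b_just with b_iv gives a quick numerical check on part (a) in the scalar case. The arguments in (a) through (c) still need to be made analytically.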
2. Consider a linear regression model with classical measurement error. We want to estimate the linear model
$$y_i = x_i^{*\prime}\beta + \varepsilon_i \quad \text{with } E(\varepsilon_i x_i^*) = 0,$$
but we do not observe $x_i^*$; we observe only a noisy but unbiased measure of it, $x_i = x_i^* + v_i$, where $E(v_i) = 0$, $v_i \perp x_i^*$, $v_i \perp \varepsilon_i$, and $E(x_i^* x_i^{*\prime})$ is of full rank. Hence, we can at best estimate the linear model
$$y_i = x_i'\beta + u_i. \tag{1}$$
(a) Find the probability limit of the least squares estimator for $\beta$ in (1) as a function of $E(x_i^* x_i^{*\prime})$, $E(v_i v_i')$, and $\beta$.
(b) Suppose that $x_i$ and $x_i^*$ are both scalar random variables. How does the probability limit in (a) compare to $\beta$?
(c) Suppose we additionally observe another measure of $x_i^*$, $w_i = x_i^* + \eta_i$, where $E(\eta_i) = 0$, $\eta_i \perp x_i^*$, $\eta_i \perp \varepsilon_i$, and $\eta_i \perp v_i$. Can we use $w_i$ as a valid instrumental variable for $x_i$ in (1)? Explain.
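One way to build intuition before deriving the answers is to simulate the scalar measurement-error setup and compare estimators numerically. The sketch below is a minimal pure-Python illustration, not a derivation. The normal distributions, unit variances, the value of beta, the sample size, and the function names are all assumptions made for this illustration.

```python
import random

def slope_ols(x, y):
    """Least squares slope without an intercept (all variables are mean zero here)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def slope_iv(x, y, w):
    """IV slope using w as the instrument for x (scalar case, no intercept)."""
    return sum(a * b for a, b in zip(w, y)) / sum(a * b for a, b in zip(w, x))

# Hypothetical data-generating process (an assumption of this sketch).
rng = random.Random(1)
n = 50000
beta = 2.0
x_star = [rng.gauss(0, 1) for _ in range(n)]   # true regressor, unobserved
eps = [rng.gauss(0, 1) for _ in range(n)]      # structural error
v = [rng.gauss(0, 1) for _ in range(n)]        # measurement error in x
eta = [rng.gauss(0, 1) for _ in range(n)]      # measurement error in w

y = [beta * xs + e for xs, e in zip(x_star, eps)]
x = [xs + vi for xs, vi in zip(x_star, v)]     # first noisy measure
w = [xs + et for xs, et in zip(x_star, eta)]   # second noisy measure

b_ols = slope_ols(x, y)
b_w = slope_iv(x, y, w)

print(f"true beta:                {beta:.3f}")
print(f"least squares of y on x:  {b_ols:.3f}")
print(f"IV using w for x:         {b_w:.3f}")
```

Changing the variance of v (the second argument of rng.gauss) and rerunning shows how the least squares slope responds to the amount of measurement noise, which can be compared against the probability limit derived in (a).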
3. Consider the dataset Card1995. We will follow Card (1995) to study the relationship between educational attainment and wages. Some of the variables of interest:
Figure 1: Variable Description
(a) We will first clean the data set. Generate the following variables
• exp = age76 - ed76 - 6
• exp2 = exp^2 / 100
and drop observations with missing wage (lwage76).
(b) Regress lwage76 on ed76, exp, exp2, black, reg76r. What’s the interpretation of the coefficient on ed76? What is the potential endogeneity concern here?
(c) Consider the same wage equation as in (b). Now run 2SLS using nearc4a and nearc4b as instrumental variables for ed76. (Hint: help ivregress) What story makes college proximity a valid IV for education?
(d) Let’s see if nearc4a and nearc4b satisfy the conditions for a valid IV. Run the first-stage regression (i.e., regress ed76 on nearc4a and nearc4b along with the other variables). What is the F-statistic for testing whether nearc4a and nearc4b are jointly statistically significant in this first-stage regression? Do you think these IVs are strong?
(Hint: help test)
(e) Now let’s see if the IVs are exogenous. Do an overidentification test. Do you think the IVs are exogenous?
(Hint: Run estat overid right after your 2SLS regression.)
(f) Let’s do the Durbin-Wu-Hausman test to detect endogeneity. We’ll do this step-by-step.
i. Rerun the regression in (b) and store the estimates. For example, you can use the command estimates store ls to store the estimation results under the name ls.
ii. Rerun the 2SLS in (c) and store the estimates under the name tsls.
iii. Conduct the Hausman test using the command hausman tsls ls. What do you find? Discuss.