UN3412 Introduction to Econometrics, Problem Set 3, Spring 2019


Department of Economics

UN3412

Spring 2019

Problem Set 3

Introduction to Econometrics

This problem set is worth 60 points in total.

1.   [20 points] Use the data in hprice1.dta to estimate the following model (the variables in the data set are described in Table 1 below):

price = β0  + β1sqrft + β2bdrms + u

where price = the (selling) price of the house (in 1000 dollars), sqrft = size of house (square feet) and bdrms = number of bedrooms in the house.

(a) Write out the estimation result in equation form. [2 points]

(b) What is the estimated increase in price for a house with one more bedroom, keeping square footage constant? [2 points]

(c) What is the estimated increase in price for a house when an additional 1400-square-foot bedroom is added? Compare this to your answer in (b). [4 points]

(d) What percentage of the variation in price is explained by square footage and number of bedrooms? Compare your answer to the adjusted R2. Explain the difference. [4 points]

(e) Consider the first house in the sample. Report the square footage and number of bedrooms for this house. Find the predicted selling price for this house from the OLS regression line. [4 points]

(f) What is the actual selling price of the first house in the sample? Find the residual of this house. Does it suggest that the buyer underpaid or overpaid for the house? Explain. [4 points]

Table 1. Data description, file: hprice1.dta

Variable     Definition
price        House price, in $1000.
assess       Assessed value, in $1000.
bdrms        Number of bedrooms.
lotsize      Size of lot in square feet.
sqft         Size of house in square feet.
colonial     = 1 if house is in Colonial style, = 0 otherwise.
lprice       log(price)
lassess      log(assess)
llotsize     log(lotsize)
lsqft        log(sqft)
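The workflow in Question 1 (fit the model by OLS, predict the price of one house, then form its residual) can be sketched as follows. This is only a minimal illustration in pure Python: the six observations are made up and are not taken from hprice1.dta, and the helper names `solve` and `ols` are hypothetical, not part of any course software.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, *regressors):
    """Return OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    rows = [[1.0] + [float(x[i]) for x in regressors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)


# Made-up illustration data (NOT taken from hprice1.dta).
price = [250.0, 310.0, 180.0, 420.0, 275.0, 365.0]  # in $1000
sqrft = [1800, 2200, 1300, 3000, 1900, 2500]        # square feet
bdrms = [3, 4, 2, 5, 3, 4]                          # number of bedrooms

b0, b1, b2 = ols(price, sqrft, bdrms)

# Fitted values and residuals (actual minus predicted) for every house.
fitted = [b0 + b1 * s + b2 * b for s, b in zip(sqrft, bdrms)]
residuals = [p - f for p, f in zip(price, fitted)]

print(f"price_hat = {b0:.3f} + {b1:.4f} sqrft + {b2:.3f} bdrms")
print(f"first house: predicted = {fitted[0]:.2f}, residual = {residuals[0]:.2f}")
```

A positive residual means the actual price lies above the price predicted by the regression line, and a negative residual means it lies below.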

2.   [20 points] Allcott and Gentzkow (2017) conducted an online survey of US adults regarding fake news after the 2016 presidential election. In their survey, they showed respondents news headlines about the 2016 election and asked whether the headlines were true or false. Some of the headlines were fake and others were true.

Their dependent variable Yi takes the value 1 if survey respondent i correctly identifies whether the headline is true or false, 0.5 if the respondent is “not sure”, and 0 otherwise.

Suppose that one conducts a similar survey and obtains the following regression result:

Ŷi = 0.65 + 0.012 college + 0.015 ln(Daily media time) + 0.003 Age,   R2 = 0.14,   n = 828,
    (0.02) (0.004)         (0.003)                      (0.001)

where college is a binary indicator that equals 1 if a survey respondent is a college graduate and 0 otherwise, ln(Daily media time) is the logarithm of daily time spent consuming media, and Age is age in years.

(a) Suppose that you would like to test, at the 1% level, whether people with higher education have more accurate beliefs about news. State your null hypothesis precisely and report your test result. [4 points]

(b)   The estimated coefficient for ln(Daily media time) is significantly positive. Interpret this result. Explain why this is plausible. [4 points]

(c)   Even if Age is omitted, there will be little concern about the omitted variable bias problem. Do you agree? Explain briefly. [6 points]

(d) Suppose that you now conjecture that Republicans may have different beliefs about news than Democrats. Assume that there are three groups in the data: Democrats, Republicans and Independents. How would you change the specification of the linear regression model by adding or subtracting regressors? Explain briefly. [6 points]
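The mechanics of a one-sided test on a single coefficient, as in part (a), can be sketched as below. The sketch assumes one reading of the question: H0: β_college = 0 against H1: β_college > 0, with the large-sample normal approximation for the t-statistic. The variable names are illustrative only.

```python
from statistics import NormalDist

coef_college = 0.012  # estimated coefficient on college, from the regression above
se_college = 0.004    # its standard error

# t-statistic for H0: beta_college = 0.
t_stat = coef_college / se_college

# One-sided 1% critical value from the standard normal distribution (about 2.33).
critical_value = NormalDist().inv_cdf(0.99)

# Reject H0 in favor of H1: beta_college > 0 if t exceeds the critical value.
reject_null = t_stat > critical_value

# One-sided p-value under the normal approximation.
p_value = 1.0 - NormalDist().cdf(t_stat)

print(f"t = {t_stat:.2f}, critical value = {critical_value:.2f}, "
      f"reject H0: {reject_null}, p-value = {p_value:.4f}")
```

The same steps apply to any single coefficient: only the estimate, the standard error, and the direction of the alternative change.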

3.   [20 points] Consider the following Population Linear Regression Function (PLRF):

yi = β0 + β1x1i + β2x2i + β3x3i + β4x4i + β5x5i + ui        (1)

where yi = average hourly earnings (wage) in dollars, x1 = years of education, x2 = years of potential experience, x3 = years with current employer (tenure), x4 = 1 if female, x5 = 1 if nonwhite, and ui = the usual error term of the model.

For this question, use the WAGE data set that you used in PS#2. The variables in the data set are described below for your reference. We may use this data set in upcoming problem sets as well.

Obs:   526

1. wage                     average hourly earnings

2. educ                     years of education

3. exper                    years potential experience

4. tenure                   years with current employer

5. nonwhite                 =1 if nonwhite

6. female                   =1 if female

7. married                  =1 if married

8. numdep                   number of dependents

9. smsa                     =1 if live in SMSA

10. northcen                 =1 if live in north central U.S

11. south                    =1 if live in southern region

12. west                     =1 if live in western region

13. construc                 =1 if work in construc. Indus.

14. ndurman                  =1 if in nondur. Manuf. Indus.

15. trcommpu                 =1 if in trans, commun, pub ut

16. trade                    =1 if in wholesale or retail

17. services                 =1 if in services indus.

18. profserv                 =1 if in prof. serv. Indus.

19. profocc                  =1 if in profess. Occupation

20. clerocc                  =1 if in clerical occupation

21. servocc                  =1 if in service occupation

22. lwage                    log(wage)

23. expersq                  exper^2

24. tenursq                  tenure^2

(a) Consider the following restricted version of equation (1): yi = β0 + β1x1i + β2x2i + ui. Suppose that x2 is omitted from the model by the researcher. What conditions must x2 satisfy to cause omitted variable bias (OVB)? Show mathematically that the OLS estimator of β1 is biased if x2 is omitted from the model. [4 points]

(b) Run the regression yi = β0 + β4x4i + ui and interpret the slope coefficient β4. (Hint: x4 is a binary explanatory variable.) [2 points]

(c) First generate a dummy variable Di such that Di = 1 if male and Di = 0 if female. Then run the regression yi = β0 + β1x1i + β4x4i + β6Di + ui. What do you notice in the result? Explain why. Show mathematically that if x4 and Di are related, this result is inevitable. [6 points]

(d) First run the simple regression yi = β0 + β1x1i + ui, then run yi = β0 + β1x1i + β2x2i + ui. Explain what happened to the estimate of β1 (before and after) and why it happened. [2 points]

(e) Now run the full model (1) using both homoskedasticity-only and heteroskedasticity-robust standard errors, and interpret and compare the results of both regressions. Why do we care about heteroskedasticity that might exist in the data? [4 points]

(f) Based on the latter regression result (i.e., the one with heteroskedasticity-robust standard errors), conduct the following hypothesis tests:

i.    H0: βj = 0 vs. H1: βj ≠ 0, for each j = 1, 2, …, 5

ii.   H0: β1 = β2 = β3 = β4 = β5 = 0 vs. H1: at least one βj ≠ 0    [2 points]
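The comparison between a short and a long regression in Question 3 rests on an algebraic identity: the slope on x1 from the short regression (x2 omitted) equals the long-regression slope on x1 plus the long-regression slope on x2 times the slope from regressing x2 on x1. The sketch below checks that identity numerically. The eight observations are made up for illustration and are not taken from the WAGE data set; the helper names are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance (divisor n; the divisor cancels in every slope below)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def simple_slope(y, x):
    """Slope from regressing y on a constant and x."""
    return cov(x, y) / cov(x, x)


def two_regressor_slopes(y, x1, x2):
    """Slopes from regressing y on a constant, x1 and x2 (normal equations in deviations)."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Made-up illustration data (NOT taken from the WAGE data set).
educ = [12, 16, 10, 14, 18, 12, 11, 16]
exper = [10, 4, 20, 8, 2, 15, 25, 6]
wage = [9.0, 15.5, 8.0, 12.0, 19.0, 10.5, 9.5, 14.0]

short_b1 = simple_slope(wage, educ)                         # exper omitted
long_b1, long_b2 = two_regressor_slopes(wage, educ, exper)  # exper included
delta = simple_slope(exper, educ)                           # exper regressed on educ

print(f"short slope on educ: {short_b1:.4f}")
print(f"long slope + long_b2 * delta: {long_b1 + long_b2 * delta:.4f}")
```

The two printed numbers agree up to rounding error. The gap between the short and long slopes, long_b2 * delta, is zero only when the omitted regressor has a zero coefficient or is uncorrelated with the included one, which mirrors the two conditions for omitted variable bias.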

The following questions will not be graded; they are for practice and will be discussed at the recitation:

1.   SW Exercise 7.1

 

2.   SW Exercise 7.4

(a) The F-statistic testing that the coefficients on the regional regressors are zero is 6.10. The 1% critical value (from the F(3, ∞) distribution) is 3.78. Because 6.10 > 3.78, the regional effects are significant at the 1% level.

(bi) The expected difference between Juanita and Molly is (X6,Juanita − X6,Molly) × β6 = β6. Thus a 95% confidence interval is −0.27 ± 1.96 × 0.26.

(bii) The expected difference between Juanita and Jennifer is (X5,Juanita − X5,Jennifer) × β5 + (X6,Juanita − X6,Jennifer) × β6 = −β5 + β6. A 95% confidence interval could be constructed using the general methods discussed in Section 7.3. In this case, an easy way to do this is to omit Midwest from the regression and replace it with X5 = West. In this new regression the coefficient on South measures the difference in wages between the South and the Midwest, and a 95% confidence interval can be computed directly.

3.   SW Empirical Exercises 7.1

 

Regressor      Model (a)       Model (b)
Age            0.60 (0.04)     0.59 (0.04)
Female                         −3.66 (0.21)
Bachelor                       8.08 (0.21)
Intercept      1.08 (1.17)     −0.63 (1.08)

SER            9.99            9.07
R2             0.029           0.200
Adjusted R2    0.029           0.199

(a) The estimated slope is 0.60. The estimated intercept is 1.08.

(b) The estimated marginal effect of Age on AHE is 0.59 dollars per year. The 95% confidence interval is 0.59 ± 1.96 × 0.04 or 0.51 to 0.66.

(c) The results are quite similar. Evidently the regression in (a) does not suffer from important omitted variable bias.

(d) Bob’s predicted average hourly earnings = (0.59 × 26) + (−3.66 × 0) + (8.08 × 0) − 0.63 = $14.71. Alexis’s predicted average hourly earnings = (0.59 × 30) + (−3.66 × 1) + (8.08 × 1) − 0.63 = $21.49.

(e) The regression in (b) fits the data much better. Gender and education are important predictors of earnings. The R2 and adjusted R2 are similar because the sample size is large (n = 7711).

(f) Gender and education are important. The F-statistic is 781, which is (much) larger than the 1% critical value of 4.61.

(g) The omitted variables must have non-zero coefficients and must be correlated with the included regressor. From (f), Female and Bachelor have non-zero coefficients; yet there does not seem to be important omitted variable bias, suggesting that the correlations between Age and Female and between Age and Bachelor are small. (The sample correlations are Cor(Age, Female) = −0.03 and Cor(Age, Bachelor) = 0.00.)
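The predictions in part (d) are plain evaluations of the fitted equation from model (b). A minimal sketch, using the rounded coefficients from the table above and the regressor values that appear in the expressions in (d); the function name `predict_ahe` is hypothetical:

```python
# Rounded coefficients of model (b), with the signs used in part (d).
COEFFICIENTS = {"Age": 0.59, "Female": -3.66, "Bachelor": 8.08}
INTERCEPT = -0.63


def predict_ahe(age, female, bachelor):
    """Predicted average hourly earnings (in dollars) from model (b)."""
    return (INTERCEPT
            + COEFFICIENTS["Age"] * age
            + COEFFICIENTS["Female"] * female
            + COEFFICIENTS["Bachelor"] * bachelor)


bob = predict_ahe(age=26, female=0, bachelor=0)
alexis = predict_ahe(age=30, female=1, bachelor=1)

print(f"Bob: ${bob:.2f}, Alexis: ${alexis:.2f}")
```

Because the coefficients are rounded to two decimals, predictions computed this way can differ slightly from those based on the unrounded estimates.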

4.   How would you construct a confidence interval for a single coefficient in multiple regression? 

5.   Describe how to obtain a confidence set for two parameters in the multiple regression model.

6.   What is a control variable in multiple regression? Give an example and explain why it can be useful in practice.
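For Question 4 above, one common large-sample construction is estimate ± critical value × standard error. The sketch below assumes the normal approximation and, purely as an illustration, applies it to the Age coefficient of model (a) from the table in Empirical Exercise 7.1; the function name `confidence_interval` is hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Large-sample confidence interval: estimate +/- z * std_error."""
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


# Age coefficient and standard error from model (a).
low, high = confidence_interval(0.60, 0.04)
print(f"95% CI for the Age coefficient: [{low:.3f}, {high:.3f}]")
```

A wider level, such as 0.99, raises the critical value and therefore widens the interval.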