LI Econometrics (08 34484)
Stata Assignment
In this assignment, you will explore correlates of earnings. You will use data on real individuals collected by the Office for National Statistics and published in the Quarterly Labour Force Survey.
In the course of working on this assignment, you will:
• be introduced to an important source of data for research on the UK (the UK Data Service);
• become familiar with processing and analysing data using Stata;
• interpret regression output;
• develop critical thinking about economic phenomena and econometric analysis of them.
Practical details
Word limit: no more than 1500 words. Reasonable use of tables and figures does not count toward the word limit.
Submission file: one file (in .pdf or .doc format).
Sections: Please label the sections and sub-sections clearly.
Figures and tables: should be numbered and titled appropriately. Tables should be formatted and presented as in standard economics journals (copy-pasted output from Stata is not acceptable).
Appendix: an appendix should be included at the end of the assignment, containing a copy of the Stata code used to obtain the results. This should be the exact code used to go from loading the dataset to generating the results presented (no more, no less). For convenience, you can simply copy and paste the .do file.
Preliminaries
You will have familiarised yourself with how to use the UK Data Service in the support session in week 6. Following the same steps, access and download “Quarterly Labour Force Survey, April-June, 2018 (SN 8381)”.
0. Loading data and defining the sample
a) Load the main dataset (lfsp_aj18_eul.dta) in Stata.
b) We will focus on individuals reporting positive gross weekly earnings and not currently working towards a qualification. To keep only these observations, do:
keep if GRSSWK > 0 & QULNOW == 2
c) Check: the resulting dataset should contain 9141 observations. If this is not the case, something has gone wrong somewhere.
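Steps a) to c) can be sketched as the opening lines of a .do file. This is a minimal sketch that assumes the dataset has been saved in Stata's current working directory; the missing-value check at the end is an optional addition, not part of the required steps.

```stata
* Load the main dataset (assumes the file sits in the current working directory)
use "lfsp_aj18_eul.dta", clear

* Keep respondents with positive gross weekly earnings who are not
* currently working towards a qualification
keep if GRSSWK > 0 & QULNOW == 2

* Check the sample size: the do-file stops with an error if it is not 9141
count
assert r(N) == 9141

* Optional check: Stata treats system-missing (.) as larger than any number,
* so GRSSWK > 0 would also retain such values if any existed
count if missing(GRSSWK)
```

The assert line turns the manual check in c) into an automatic one.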
Section A
1. Education and earnings
a) Plot the distribution of weekly earnings (GRSSWK) in a histogram and provide a brief comment.
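One possible way to draw the histogram is sketched below. The bin width of 50 and the axis and title text are illustrative assumptions, not requirements of the assignment; the summary statistics are an optional aid for the comment.

```stata
* Histogram of gross weekly earnings; width(50) is an illustrative bin width
histogram GRSSWK, width(50) frequency ///
    xtitle("Gross weekly earnings") ///
    title("Distribution of gross weekly earnings")

* Summary statistics (mean, median, skewness) that may help with the comment
summarize GRSSWK, detail
```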
b) Let us think about respondents' education and in particular the correlation between the attainment of different qualifications and earnings. What does economic theory say about the role of education in determining labour outcomes?
c) Using the variable that details the highest qualification that each respondent has achieved (HIQUL15D), you will create qualification dummy variables. First, tabulate HIQUL15D to see what categories exist. Using this information, remove anyone who either did not answer or did not know their highest qualification, i.e. keep only those who have a qualification of some sort or have no qualifications at all. This should leave you with 9025 observations.
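The tabulation step might look like the sketch below. The numeric codes in the commented-out drop line are placeholders (an assumption), to be replaced with the codes that the tabulation actually shows for "no answer" and "don't know".

```stata
* Categories shown with their value labels
tabulate HIQUL15D

* The same table showing the underlying numeric codes
tabulate HIQUL15D, nolabel

* Also display missing values, if there are any
tabulate HIQUL15D, missing

* PLACEHOLDER codes (assumption): replace -8 and 7 with the codes that the
* tables above show for "no answer" and "don't know", then uncomment
* drop if inlist(HIQUL15D, -8, 7)

* Should report 9025 once the correct observations have been dropped
count
```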
d) Next, construct three new variables: (i) the logarithm of gross weekly earnings, i.e. log(GRSSWK); (ii) the square of the respondent's age, using the AGE variable; and (iii) a dummy variable that takes the value 1 if the respondent has a degree or equivalent and 0 otherwise. Then estimate the regression:
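Written out, a specification consistent with the three variables constructed in this part would be the following. The coefficient labels, the error term u and the name Degree are assumptions, and the number (1) is inferred from the later reference to equation (2).

```latex
\begin{equation}
\log(\mathrm{GRSSWK}_i) = \beta_0 + \beta_1 \mathrm{Degree}_i
  + \beta_2 \mathrm{AGE}_i + \beta_3 \mathrm{AGE}_i^2 + u_i \tag{1}
\end{equation}
```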
Report your findings in a table. Discuss your findings and report on whether having a degree significantly affects respondents' earnings. Are these results what you expected?
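The variable construction and regression in d) could be sketched as follows. The new variable names (ln_grsswk, age_sq, degree) are hypothetical, and the assumption that code 1 of HIQUL15D denotes "degree or equivalent" should be confirmed against the tabulation from part c).

```stata
* Assumes the sample restrictions from parts 0 and 1c) are already applied
generate ln_grsswk = ln(GRSSWK)      // log of gross weekly earnings
generate age_sq    = AGE^2           // age squared

* ASSUMPTION: code 1 of HIQUL15D is "degree or equivalent";
* confirm with: tabulate HIQUL15D, nolabel
generate degree = (HIQUL15D == 1)

* Regress log earnings on the degree dummy, age and age squared
regress ln_grsswk degree AGE age_sq
estimates store m1                   // keep the results for the table
```

Storing the estimates makes it easier to assemble several columns into one formatted table later.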
e) Now construct dummy variables for the remaining HIQUL15D categories (a dummy variable for higher education, a dummy variable for A-levels, and so on), giving the variables appropriate names. Then estimate the regression:
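A specification consistent with the four pairwise tests requested in part f) would include five qualification dummies, with "no qualifications" as the omitted category. The dummy names, their ordering and the coefficient labels on the age terms are assumptions.

```latex
\begin{equation}
\begin{split}
\log(\mathrm{GRSSWK}_i) = {} & \beta_0 + \beta_1 \mathrm{Degree}_i
  + \beta_2 \mathrm{HigherEd}_i + \beta_3 \mathrm{ALevel}_i
  + \beta_4 \mathrm{GCSE}_i + \beta_5 \mathrm{Other}_i \\
  & + \beta_6 \mathrm{AGE}_i + \beta_7 \mathrm{AGE}_i^2 + u_i
\end{split} \tag{2}
\end{equation}
```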
Report your findings in an additional column in the same table. Discuss how the different qualifications influence earnings and whether these results are what you expected.
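One way to construct a full set of qualification dummies without typing each category code is sketched below. The stub qual_ is a hypothetical name; the mapping from qual_1, qual_2, ... to categories should be read from the variable labels rather than assumed.

```stata
* Create one dummy per category of HIQUL15D: qual_1, qual_2, ...
tabulate HIQUL15D, generate(qual_)

* The variable labels show which category each dummy represents
describe qual_*

* Dummies can then be given descriptive names, e.g. (hypothetical mapping):
* rename qual_1 degree
```

One category has to be left out of the regression to serve as the reference group; otherwise the dummies are perfectly collinear with the constant.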
f) Next, use the test command in Stata to test whether each successive qualification has the same impact on earnings. This requires four separate tests: whether β1 = β2, then whether β2 = β3, and so on. Report and discuss these F-test results. For the test of β1 = β2, show how the F statistic was calculated: estimate the restricted model to retrieve the restricted residual sum of squares, then use the display command to calculate the F statistic.
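The mechanics of the manual F calculation can be sketched on Stata's built-in auto dataset, so that the example runs without the LFS file. The variables price, mpg, weight and length are stand-ins for the assignment's variables, and the scalar names are hypothetical. The restriction that two coefficients are equal is imposed by entering the sum of the two regressors as a single variable.

```stata
sysuse auto, clear

* Unrestricted model
regress price mpg weight length
scalar rss_u = e(rss)              // unrestricted residual sum of squares
scalar df_u  = e(df_r)             // residual degrees of freedom, n - k - 1

* F test of equal coefficients using the test command
test mpg = weight
scalar F_test = r(F)

* Restricted model: equal coefficients on mpg and weight means the two
* variables enter only through their sum
generate mpg_plus_weight = mpg + weight
regress price mpg_plus_weight length
scalar rss_r = e(rss)              // restricted residual sum of squares

* F = [(RSS_r - RSS_u) / q] / [RSS_u / (n - k - 1)], with q = 1 restriction
scalar F_manual = ((rss_r - rss_u) / 1) / (rss_u / df_u)
display "F from test: " F_test "    F by hand: " F_manual
```

With the default (non-robust) standard errors used here, the two numbers should agree up to rounding.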
2. Men and women
a) Estimate equation (2) separately for men and women and report your findings in additional columns of the table you created above. To do this you will need to make use of the variable called SEX.
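Separate estimation by sex could be sketched as below. All variable names other than SEX and AGE are hypothetical names for variables built in earlier parts, and the coding SEX == 1 for men and SEX == 2 for women is an assumption to be checked against the tabulation. The lines need to be run together in one .do file because of the local macro.

```stata
* ASSUMPTION: hypothetical names for the variables constructed in part 1
local regressors "degree higher_ed a_level gcse other_qual AGE age_sq"

* Check how SEX is coded before conditioning on it
tabulate SEX
tabulate SEX, nolabel

* ASSUMPTION: 1 = male, 2 = female; confirm with the tables above
regress ln_grsswk `regressors' if SEX == 1
estimates store men

regress ln_grsswk `regressors' if SEX == 2
estimates store women

* Side-by-side comparison on screen (not a substitute for a formatted table)
estimates table men women, b(%9.3f) se stats(N r2)
```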
b) What do you conclude about the impact of qualifications on the earnings of men and women? Interpret your findings using economic theory and intuition.
Section B
3. Other factors
a) Pick an additional dimension on which information is available in the dataset and construct variable(s) that allow you to explore the relationship between this dimension and earnings; consult the user manual and codebook (lfs_user_guide_vol3_variabledetails2018.pdf) for details. Without running additional regressions (yet), explain why economic theory leads you to expect this dimension to matter for earnings. Can we expect this dimension to interact with the “qualification effect” or “gender effect” we explored already?
b) Estimate additional regressions to test whether your theoretical predictions hold in the data. Present these in a second table and discuss.
4. Taking a step back
a) Suppose that individuals do not report their weekly earnings perfectly, such that the variable log(GRSSWK) suffers from classical measurement error. Describe what is meant by classical measurement error in this context, and discuss the implications, if any, for the various results you have presented.
b) In this assignment, you have used labour force data from 2018. What differences, if any, would you expect to see across your various findings if you were to re-do the assignment using data from 1958? (Note: you are asked only to speculate on any differences; you are not expected to use any additional data.)