Problem Set #2
Causal Inference (PSC 8121), Spring 2025
Due: 3/20
We're going to analyze the replication data for Miguel et al.'s article, "Economic Shocks and Civil Conflict: An Instrumental Variables Approach." The data file is called mss_repdata.dta. It contains a variety of variables, including rainfall, growth, conflict, and democracy data, for 41 African countries from 1981 to 1999. Please answer the following questions. Include a copy of any commands you use. Feel free to describe your results in text form.
1. Replicate Model 5 from Table 4, just to make sure everything is set up right.
2. Repeat this model type, but replace the rainfall growth instruments (GPCP_g and GPCP_g_l) with each country's rainfall difference from its mean (GPCP_df_mean and GPCP_df_mean_l). Report and interpret the diagnostic tests that you regard as important for analyzing the strength of the design. What are the strengths and weaknesses of this specification relative to Model 5?
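One way to set up Question 2 is sketched below. It assumes mss_repdata.dta is in the working directory and that the user-written ivreg2 package is installed. Relative to the Model 5 command in the notes at the end of this problem set, only the excluded instruments change. The first option prints the first-stage regressions. ivreg2's main output also reports its underidentification and weak-identification statistics. Deciding which of these to emphasize is part of the question.

```stata
* Sketch: Model 5 with rainfall differences from the country mean as instruments.
* Assumes mss_repdata.dta is in the working directory and ivreg2 is installed
* (ssc install ranktest; ssc install ivreg2).
use mss_repdata.dta, clear
global x_fl = "y_0 polity2l ethfrac relfrac Oil lpopl1 lmtnest"
global x_year = "Iccyear*"

* Same structure as Model 5; only the excluded instruments differ
ivreg2 any_prio $x_fl $x_year (gdp_g gdp_g_l = GPCP_df_mean GPCP_df_mean_l), robust cluster(ccode) first
```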
3. Choose another alteration to the IV approach in Model 5 that you think is equally sensible ex ante. What are the results from the second stage and the diagnostic tests? Should this affect how we evaluate the paper?
4. Now adjust Model 5 to include only current economic growth (gdp_g) as an endogenous regressor (i.e., leave out lagged growth). Use the same two instruments. Also, remove the clustered standard errors (this is needed to get the overidentification tests working). What are the second-stage results, and what do they tell you? Report the results of the overidentification test and interpret them substantively.
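A minimal sketch of the Question 4 specification follows, under the same assumptions as the other snippets (data in the working directory, ivreg2 installed). It uses one endogenous regressor and two excluded instruments, so the model is overidentified by one. Because the robust and cluster options are omitted, ivreg2 assumes homoskedastic errors and reports Sargan's overidentification statistic in its output.

```stata
* Sketch: current growth only, two rainfall instruments, no clustering.
* Assumes mss_repdata.dta is in the working directory and ivreg2 is installed.
use mss_repdata.dta, clear
global x_fl = "y_0 polity2l ethfrac relfrac Oil lpopl1 lmtnest"
global x_year = "Iccyear*"

* One endogenous regressor, two excluded instruments: one overidentifying restriction
ivreg2 any_prio $x_fl $x_year (gdp_g = GPCP_g GPCP_g_l), first
```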
5. Scholars have suggested that democratization is also driven by economic crises and thus will be more likely in periods of rainfall-driven contraction. Test this by predicting democratization (in a sample of autocracies) using appropriate variations of Models 5 and 6 from Table 4. Should you use ivreg2 or ivprobit? What do the results tell you? Is this as strong a design as that predicting civil conflict? Why or why not?
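A possible starting point for Question 5 is sketched below. It assumes the panel is indexed by ccode and a time variable named year (the name year is an assumption; check it with describe). The outcome dem_onset is a hypothetical variable name for a transition into democracy. It is defined only for country-years that were autocracies (dem == 0) in the previous year, so estimation is automatically restricted to that sample. The sketch uses the linear ivreg2 version of Model 5. ivprobit accepts the same (endogenous = instruments) syntax, and choosing between the two is part of the question.

```stata
* Sketch: democratization among last year's autocracies, Model 5 structure.
* Assumes mss_repdata.dta is in the working directory, ivreg2 is installed,
* and the time variable is named year (an assumption).
use mss_repdata.dta, clear
global x_fl = "y_0 polity2l ethfrac relfrac Oil lpopl1 lmtnest"
global x_year = "Iccyear*"
xtset ccode year

* Hypothetical outcome: equals dem when the country was autocratic last year;
* missing when it was a democracy last year or the lag is unavailable
generate dem_onset = dem if L.dem == 0

* Observations with missing dem_onset drop out, leaving the autocracy sample
ivreg2 dem_onset $x_fl $x_year (gdp_g gdp_g_l = GPCP_g GPCP_g_l), robust cluster(ccode) first
```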
Data Notes:
• any_prio is a dummy for civil conflict.
• gdp_g and gdp_g_l are economic growth and lagged growth, respectively.
• GPCP_g and GPCP_g_l are rainfall growth and lagged rainfall growth, respectively.
• GPCP_df_mean and GPCP_df_mean_l are the current and lagged rainfall differences from the country's mean.
• dem is a dummy equal to 1 if the Polity score (-10 to 10 scale) is greater than or equal to 6.
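Before estimating anything, it is worth confirming that the variables named in these notes are present and coded as described. Here is a minimal sketch, assuming mss_repdata.dta is in the working directory.

```stata
* Sketch: confirm the variables described in the data notes.
use mss_repdata.dta, clear
describe any_prio gdp_g gdp_g_l GPCP_g GPCP_g_l GPCP_df_mean GPCP_df_mean_l dem
summarize any_prio gdp_g gdp_g_l GPCP_g GPCP_g_l GPCP_df_mean GPCP_df_mean_l dem

* The two dummies should take only the values 0 and 1 (plus any missing values)
tabulate any_prio, missing
tabulate dem, missing
```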
The following Stata commands will be helpful:
global x_fl = "y_0 polity2l ethfrac relfrac Oil lpopl1 lmtnest"
• This is the global for the controls
global x_year = "Iccyear*"
• This is the global for the country-specific time trends (the Iccode* dummies, used in Model 6, add country fixed effects)
Models 5 and 6 of Table 4 are the following:
• ivreg2 any_prio $x_fl $x_year (gdp_g gdp_g_l = GPCP_g GPCP_g_l), robust cluster(ccode) first
• ivreg2 any_prio Iccode* $x_year (gdp_g gdp_g_l = GPCP_g GPCP_g_l), robust cluster(ccode) first
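The commands above can be collected into a single do-file, which is one way to handle Question 1. The sketch below assumes mss_repdata.dta is in the current working directory and that the user-written ivreg2 package, which depends on ranktest, is installed from SSC. The stored-estimate names m5 and m6 are illustrative and are not part of the assignment.

```stata
* Sketch: set up the data and estimate Models 5 and 6 from Table 4.
* Assumes mss_repdata.dta is in the working directory.
* ivreg2 is user-written; if needed, install it first:
*   ssc install ranktest
*   ssc install ivreg2
use mss_repdata.dta, clear

* Controls and country-specific time trends
global x_fl = "y_0 polity2l ethfrac relfrac Oil lpopl1 lmtnest"
global x_year = "Iccyear*"

* Model 5: controls plus country time trends
ivreg2 any_prio $x_fl $x_year (gdp_g gdp_g_l = GPCP_g GPCP_g_l), robust cluster(ccode) first
estimates store m5

* Model 6: country fixed effects plus country time trends
ivreg2 any_prio Iccode* $x_year (gdp_g gdp_g_l = GPCP_g GPCP_g_l), robust cluster(ccode) first
estimates store m6

* Second-stage growth coefficients and standard errors side by side
estimates table m5 m6, keep(gdp_g gdp_g_l) b se
```

Comparing the stored coefficients against the published table is a quick check that the setup matches the original before moving on to the later questions.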