Mathematics for Social Science 2024-25
Instructions
· You must include a cover sheet at the start of the assessment. Do not write your name or student number anywhere on the assessment.
· All questions are compulsory; a skipped question will receive a mark of zero.
· The paper is marked out of a total of 100 percentage points. This exam is worth 85% of your final grade.
· Submit your exam on LEARN by noon (12:00) on December 4th 2024. Exams submitted from 12:00:01 onwards will have points deducted for late submission. The regular assessment penalties apply: 5 percentage points per calendar day, up to a maximum of 35 percentage points for 7 calendar days, and zero points for an assessment handed in more than 7 days late.
· When in doubt, check your lecture slides and notes. The assessment requires only syntax and skills covered in class and in your class-related material (e.g. statistics exercises, syntax files, lecture slides).
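The late-submission rule above can be sketched as a small calculation. This is an illustrative reading only, not the official marking procedure: the function name `apply_late_penalty` is hypothetical, and it assumes that every started 24-hour period after the deadline counts as one calendar day.

```python
import math
from datetime import datetime

# Deadline stated in the instructions: noon on December 4th 2024.
DEADLINE = datetime(2024, 12, 4, 12, 0, 0)

def apply_late_penalty(mark: float, submitted: datetime) -> float:
    """Return the mark after the late penalty (hypothetical helper).

    Assumption: each started 24-hour period after the deadline counts
    as one calendar day late.
    """
    seconds_late = (submitted - DEADLINE).total_seconds()
    if seconds_late <= 0:
        return mark                    # on time: no deduction
    days_late = math.ceil(seconds_late / 86400)
    if days_late > 7:
        return 0.0                     # more than 7 days late: zero points
    return max(0.0, mark - 5 * days_late)  # 5 points per day, at most 35

# One second after the deadline already counts as one day late.
print(apply_late_penalty(70, datetime(2024, 12, 4, 12, 0, 1)))
```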
1. Principal Component Analysis
[30%]
Please run a Principal Component Analysis with the following variables from the 2015 British Election Survey (SPSS dataset “BES_2015_spssdata.sav”). You need to download this specific dataset from LEARN, and not use any other BES dataset you may have used in the course.
The questions in this module all ask respondents whether they think more or less public money should be spent on different things. Answers are on a 5-point Likert scale and range from much more than now (coded as 1) to much less than now (coded as 5). For more information see the data codebook uploaded on LEARN. Responses coded as “Don’t know” and “Not stated” have been set as missing data, and are not taken into account in any PCA.
Use SPSS for this exercise. Carry out a Principal Component Analysis (PCA) of these variables using Varimax rotation. Please explain the PCA results using up to 3 of the output tables produced by SPSS. Explore whether these survey questions seem suitable for analysis using PCA, and explain which diagnostic statistics you use to determine this, showing that you understand how these diagnostic tests work. Please provide an interpretation of the rotated results: what concepts the components are capturing, and how the components should be interpreted. Your total answer for the above should be up to 600 words long.
Variable | Question wording
pubex_a | Thinking about public expenditure on HEALTH, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_b | Thinking about public expenditure on EDUCATION, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_c | Thinking about public expenditure on UNEMPLOYMENT BENEFITS, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_d | Thinking about public expenditure on DEFENCE, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_e | Thinking about public expenditure on OLD-AGE PENSIONS, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_f | Thinking about public expenditure on BUSINESS AND INDUSTRY, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_g | Thinking about public expenditure on POLICE AND LAW ENFORCEMENT, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
pubex_h | Thinking about public expenditure on WELFARE BENEFITS, should there be much more than now, somewhat more than now, the same as now, somewhat less than now, or much less than now?
2. Binary Logistic Regression
[40%]
2a. Please observe the output below, from a logit model which has been run on WDI data, and fill in the missing components (denoted by a space like this: “_____”) of the interpretations in the sentences below.
For every 1 percentage point increase in adult literacy within the population, the odds of a country having a functioning democracy (with voting and party alternation) rise by ____ to 1.
For every 10 percentage point increase in adult literacy within the population, the odds of a country having a functioning democracy (with voting and party alternation) rise by ______% (fractional odds).
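As a reminder of the arithmetic behind these two kinds of statement, the sketch below converts a logit coefficient into an odds ratio and into a percentage change in the odds for a change of several units. The coefficient value used here is hypothetical and is not taken from the output in this question; the function names are also illustrative.

```python
import math

def odds_ratio(logit_coef: float, units: float = 1.0) -> float:
    """Odds ratio for a change of `units` in the predictor: exp(b * units)."""
    return math.exp(logit_coef * units)

def percent_change_in_odds(logit_coef: float, units: float = 1.0) -> float:
    """Percentage change in the odds for a change of `units` in the predictor."""
    return (odds_ratio(logit_coef, units) - 1.0) * 100.0

# HYPOTHETICAL coefficient, chosen only for illustration.
b = 0.05

print(f"Odds ratio per 1-unit increase: {odds_ratio(b):.3f}")
print(f"Odds ratio per 10-unit increase: {odds_ratio(b, 10):.3f}")
print(f"Change in odds per 10-unit increase: {percent_change_in_odds(b, 10):.1f}%")
```

Note that the odds ratio for a 10-unit increase is the 1-unit odds ratio raised to the power of 10, not multiplied by 10.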
2b. For this question you will need to use Stata and the Stata version of the 2017 British Election Survey, “BES_2017_data.dta”. Please make sure you download and use the correct dataset, and not other versions of the BES survey. (Max 400 words for section 2 – be concise!).
You want to investigate what the relationship is between people’s age, the political party they voted for in the 2017 election, and their self-reported level of trust in politicians. You will need the following variables to complete this task: y10_banded; trustpol; votefor.
Firstly, run some bivariate analysis and summarise succinctly what the relationship is between age and trust in politicians.
2c. Now you will run two logit models, one after the other. In the first model predicting trust in politicians, your only independent variable will be respondents’ age. In the second model you will add a second independent variable: votefor, i.e. the party that participants voted for in the most recent election. In the second logit model, make response category 4 of the “votefor” variable the reference category. In both models, make the youngest age band the reference category for the age variable. The models should be run so that Odds Ratios (not logits) are reported. Please paste the output (and syntax) for both models below (using a screen grab/snapshot, and not copy-pasted text, so that the results can be easily read).
2d. Discuss the results of the second model overall in simple language, using odds ratios in your interpretation.
2e. Look at how the Odds Ratio for the age 85+ category changes from model 1 to model 2. Why do you think this may be happening, and what does this mean substantively?
2f. Explain, using your understanding of the Integration of the Normal Curve, what you make of the Odds Ratios for those who voted Labour, given the p-value for this specific result.
3. Interaction Effects
[20%]
This graph has been extracted from a research study using the Millennium Cohort Study, which follows a nationally representative cohort of children born at the turn of the century in the UK. The graph explores interaction effects between different characteristics of the mothers who responded to the survey, and shows the predicted probabilities of a cohort member (child) having experienced smacking at age seven for different groups of mothers. These results are from a model (not shown) with a sample size of N=7745, which controls for other socio-economic characteristics of the mother. The two variables used to test for interactions in the graph below are ethnicity and child abuse at age three.
· Ethnicity is a binary variable with either White or Other Ethnic Group.
· The child abuse variable is categorical and has three categories describing the child's experience of abuse at age three: (non) refers to no abuse, (moderate) refers to moderate abuse and (high) refers to high levels of abuse.
3a. Please interpret the results you see in simple language, using interpretive skills you learnt during the course.
3b. What do you think is a likely explanation for the fact that the confidence interval for the other ethnic group with high child abuse exposure is so large, compared to the confidence interval for the white group with no child abuse exposure?
4. Probability, Polls and Confidence Intervals
[10%]
The https://whatukthinks.org/eu/ website aggregates data from a number of polls of the UK population.
On 17-18 July 2020, YouGov ran an opinion poll asking respondents “The EU’s Erasmus programme helps Britons to get work experience and study in another European country, and Europeans to get work experience and study in Britain. Do you think the UK should stay a member or stop being a member of Erasmus at the end of 2020?”.
The characteristics of this poll were: Sample size = 1658; Percentage who answered they want to stay a member of Erasmus = 80% (this excludes those who responded “don’t know”, who made up 24% of the sample).
Using the above information, and using the appropriate formula the way it should be used (and not the way polling companies often use it), calculate and answer the following questions (showing your workings):
- Q1. Provide the 95% confidence interval for the estimate of the proportion responding that they wanted to stay a member of Erasmus.
- Q2. How many standard errors plus/minus the mean of 80% would you need to be 100% certain of a result falling within a given confidence interval?
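The general shape of the calculation in Q1 can be sketched as follows, using the standard normal-approximation (Wald) interval for a proportion. This is one possible set-up rather than a model answer: the function name `wald_ci` is illustrative, and the choice to base the standard error on the respondents remaining after excluding “don’t know” answers is an assumption about which sample size is appropriate.

```python
import math

def wald_ci(p: float, n: float, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion.

    Uses the observed proportion p in the standard error,
    sqrt(p * (1 - p) / n), rather than the worst case p = 0.5.
    """
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

n_total = 1658          # sample size given in the question
share_dont_know = 0.24  # share answering "don't know"
p_stay = 0.80           # share wanting to stay, excluding "don't know"

# ASSUMPTION: the 80% is estimated only from those who gave an answer,
# so the effective sample size excludes the "don't know" respondents.
n_effective = n_total * (1 - share_dont_know)

low, high = wald_ci(p_stay, n_effective)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

The multiplier z = 1.96 corresponds to 95% coverage under the normal curve; a different confidence level would use a different multiplier.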