STATS 513
Winter, 2024
Exam #2 (Practice Exam for W25)
In this exam, we use parts of a dataset to study the new FAIR policies/renewals per 100 housing units collected from different regions. The variables are:
• fair: new FAIR policies/renewals per 100 housing units
• race: racial composition in percentage of minority
• fire: fires per 100 housing units
• theft: thefts per 1000 population
• age: percentage of housing units with age 25 years or older
• income: median family income in thousands of dollars
• region: different regions under study
1. A Q-Q plot (normal quantile plot) of ‘fire’ is shown in Figure 1. Please describe the distribution of the variable ‘fire’ based on the plot: is it close to normal, long-tailed, short-tailed, skewed to the right, or skewed to the left?
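As a reading aid only, the following minimal sketch shows how a normal Q-Q plot can be drawn in R and how differently shaped samples depart from the reference line. The data are simulated stand-ins, not the exam's ‘fire’ variable, and the names x_normal, x_right, and x_heavy are hypothetical.

```r
# Simulated stand-in samples (assumption: NOT the exam's 'fire' variable).
set.seed(513)
x_normal <- rnorm(200)            # roughly normal
x_right  <- rexp(200, rate = 1)   # skewed to the right
x_heavy  <- rt(200, df = 2)       # long-tailed

# Null graphics device so the sketch also runs non-interactively;
# drop pdf(NULL) and dev.off() to see the plots on screen.
pdf(NULL)
op <- par(mfrow = c(1, 3))
for (nm in c("x_normal", "x_right", "x_heavy")) {
  x <- get(nm)
  qqnorm(x, main = nm)   # sample quantiles vs. theoretical normal quantiles
  qqline(x)              # reference line through the first and third quartiles
}
par(op)
dev.off()
```

Comparing the three panels side by side gives a reference for the shapes listed in the question.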
2. Suppose we would like to use the jackknife residuals to identify which data points could be outliers. The R code below can be used to find the potential outliers. What are the values of df.ti and nnM? If you do not know the sample size, you may assume that the sample size is 160.
ti=rstudent(m1)
pp = 2*(1-pt(abs(ti),df=df.ti))
alpha.v <- .05/nnM
which(pp < alpha.v)
3. We focus on the model ’m2’ with Y = √fair and with the following variables as predictors:
race+theft+age+log(income)+region*fire
Based on the diagnostic plots, which data point is removed in the model ’m2.sub’? Is the point an outlier, a high leverage point or both? Please explain.
> cook<-cooks.distance(m2)
> ido<-which.max(cook)
> m2.sub<-lm(fair**.5~race+theft+age+log(income)+region*fire,Fair,
+ subset=-ido)
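The quantities behind such diagnostic plots can be tabulated directly. The sketch below is a minimal illustration on R's built-in cars data, not the exam's Fair data. The names fit, diag_tab, and fit_sub are hypothetical, and the 2p/n leverage cutoff is a common rule of thumb assumed here, not something stated in the exam.

```r
# Illustration on the built-in 'cars' data (assumption: NOT the exam's data).
fit <- lm(dist ~ speed, data = cars)

diag_tab <- data.frame(
  leverage  = hatvalues(fit),       # how unusual the predictor values are
  jackknife = rstudent(fit),        # how unusual the response is, given the predictors
  cook      = cooks.distance(fit)   # overall influence on the fitted model
)

# Point with the largest Cook's distance, mirroring which.max(cook) above.
ido <- which.max(diag_tab$cook)
diag_tab[ido, ]

# Rule-of-thumb leverage check (assumed cutoff): leverage > 2 * p / n,
# where p is the number of estimated coefficients and n the sample size.
p <- length(coef(fit))
n <- nobs(fit)
diag_tab$leverage[ido] > 2 * p / n

# Refit without that point and compare the coefficient estimates.
fit_sub <- lm(dist ~ speed, data = cars, subset = -ido)
rbind(full = coef(fit), subset = coef(fit_sub))
```

Looking at leverage and the jackknife residual separately for the flagged point helps distinguish an outlier from a high leverage point.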
4. Describe the H0 and HA for the test performed in “anova(m2, m2.newR)”. Please clearly define the parameters used in your H0 and HA.
Feel free to use the following definitions: let β0;A, β0;B, β0;C (intercepts) and β1;A, β1;B, β1;C (slopes) denote the regression intercepts and slopes for ’fire’ in regions A, B, and C, respectively, in the presence of the other covariates in the model ’m2’.
> anova(m2, m2.newR)
Analysis of Variance Table
Model 1: fair^0.5 ~ race + theft + age + log(income) + region * fire
Model 2: fair^0.5 ~ race + theft + age + log(income) + newR * fire
  Res.Df    RSS Df  Sum of Sq      F Pr(>F)
1    140 5.4446
2    142 5.4509 -2 -0.0062269 0.0801 0.9231
5. For the ANOVA test above, one argues that we can conduct an equivalent t-test. Is this argument correct? If it is, please describe the exact t-test procedure for a 0.05 level test to get full credit. If this argument is incorrect, please explain why.
6. Suppose we need to reduce the size of model ’m2.newR’. Based on the R output, will the “testing-based” procedure and the “criteria-based” procedure, AIC, reach the same decision? Please report the decision one would make using each procedure, respectively, and explain how you reach each decision.
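For reference, one way to obtain both kinds of information in a single table is drop1() with an F-test. The sketch below uses R's built-in mtcars data and a hypothetical model, not ’m2.newR’, and the 0.05 cutoff for the testing-based step is an assumed value for illustration.

```r
# Illustration on the built-in 'mtcars' data (assumption: NOT the exam's model).
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# For each single-term deletion, drop1() reports the AIC of the reduced model
# and, with test = "F", the partial F-test p-value for that term.
tab <- drop1(fit, test = "F")
tab

# Testing-based: consider dropping the term with the largest p-value,
# if that p-value exceeds the chosen cutoff (0.05 assumed here).
# Criteria-based: consider dropping the term whose removal gives the smallest
# AIC, provided it is below the AIC of the current model (the "<none>" row).
terms_only <- tab[-1, ]
rownames(terms_only)[which.max(terms_only$`Pr(>F)`)]
rownames(terms_only)[which.min(terms_only$AIC)]
```

The two procedures use different thresholds, so they need not agree on whether a given term should be removed.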
7. Let μA, μB, μC denote the means of Y = √fair for the three regions. Please provide a 95% confidence interval for μA and for μA - μB, respectively. Note: you can use the notation t_{df,γ} without knowing its value, where P(X ≥ t_{df,γ}) = γ for X following a t distribution with df degrees of freedom, but please do report the “df” and “γ” values you would use for t_{df,γ} when constructing the confidence intervals.
8. Would we reach a different conclusion here when we conduct all pairwise comparisons among μA, μB, and μC, controlling the family-wise error rate vs. controlling the individual error rate? Please briefly justify your answer.
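The two kinds of error control can be contrasted with a minimal sketch on R's built-in PlantGrowth data, which has three groups and is not the exam's data. Tukey's HSD is used here as one family-wise procedure. That choice is an assumption, since the exam output may rely on a different adjustment, and the names p_ind, fit_aov, and tukey are hypothetical.

```r
# Illustration on the built-in 'PlantGrowth' data (assumption: NOT the exam's data).

# Individual error rate: unadjusted pairwise t-tests with a pooled SD.
p_ind <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                         p.adjust.method = "none")$p.value

# Family-wise error rate: Tukey's HSD adjusts for all three comparisons at once.
fit_aov <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit_aov, conf.level = 0.95)$group

p_ind   # lower-triangular matrix of unadjusted p-values
tukey   # columns: diff, lwr, upr, p adj
```

Comparing each unadjusted p-value with its adjusted counterpart shows whether any pairwise conclusion changes at the chosen level.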
9. Comparing m3 and m4, could we claim that m4 is a superior model to m3 by comparing their R²? Please justify your answer.
10. Based on the R output, please comment on whether there is a need to consider the subset that removes the point with the largest Cook's distance.
11. Please comment on the output for ’anova(m2.newRa, m2.newR)’ and state what conclusion you make (or would not make) based on the F-test.
> anova(m2.newRa, m2.newR)
Analysis of Variance Table
Model 1: fair^0.5 ~ theft + poly(age, 2) + newR * fire
Model 2: fair^0.5 ~ race + theft + age + log(income) + newR * fire
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    143 5.9648
2    142 5.4509  1    0.5139 13.388 0.0003562 ***
12. Considering mod.glm, please provide the model that corresponds to mod.glm.