2025-04-23

STATS 513

Winter, 2024

Exam #2 (Practice Exam for W25)

In this exam, we use parts of a dataset to study the new FAIR policies/renewals per 100 housing units collected from different regions. The variables are:

• fair: new FAIR policies/renewals per 100 housing units

• race: racial composition in percentage of minority

• fire: fires per 100 housing units

• theft: thefts per 1000 population

• age: percentage of housing units with age 25 years or older

• income: median family income in thousands of dollars

• region: different regions under study

1. A Q-Q plot (normal quantile plot) of 'fire' is shown in Figure 1. Please describe the distribution of the variable 'fire' based on the plot: is it close to normal, long-tailed, short-tailed, skewed to the right, or skewed to the left?

2. Suppose we would like to use the jackknife residuals to identify which data points could be outliers. The R code below can be used to find the potential outliers. What are the values of df.ti and nnM? If you do not know the sample size, you can pretend the sample size is 160.

ti=rstudent(m1)

pp = 2*(1-pt(ti,df=df.ti))

alpha.v <- .05/nnM

which(pp

3. We focus on the model 'm2' with Y = √fair and with the following variables as predictors:

race+theft+age+log(income)+region*fire

Based on the diagnostic plots, which data point is removed in the model 'm2.sub'? Is the point an outlier, a high-leverage point, or both? Please explain.

> cook<-cooks.distance(m2)

> ido<-which.max(cook)

> m2.sub<-lm(fair**.5~race+theft+age+log(income)+region*fire,Fair,

+ subset=-ido)
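As a rough illustration of what `cooks.distance` and `which.max` do in the code above, here is a minimal sketch in Python rather than R. It assumes a simple linear regression with one predictor instead of the full model 'm2'. The data and the helper name `cooks_distances` are hypothetical, and the last point is deliberately placed far from the others.

```python
def cooks_distances(x, y):
    """Cook's distance for each point of a simple linear regression y ~ x."""
    n = len(x)
    p = 2  # number of fitted coefficients: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Ordinary residuals and the residual variance estimate.
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in resid) / (n - p)
    # Leverages (diagonal of the hat matrix) for simple regression.
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # D_i = e_i^2 / (p * sigma^2) * h_i / (1 - h_i)^2
    return [e ** 2 / (p * sigma2) * h / (1 - h) ** 2
            for e, h in zip(resid, lev)]


# Hypothetical data: the last point has an extreme x and does not follow the trend.
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 3.0]

cook = cooks_distances(x, y)
# 0-based analogue of which.max(cook) in R.
ido = max(range(len(cook)), key=cook.__getitem__)
print(ido, [round(d, 3) for d in cook])
```

Cook's distance combines the size of the residual with the leverage, so a point can rank highest because of either one or both. That distinction is what the question asks you to make from the diagnostic plots.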

4. Describe the H0 and HA for the test performed in "anova(m2, m2.newR)". Please clearly define the parameters used in your H0 and HA.

Feel free to use the following definitions: let β0;A, β0;B, β0;C (intercepts) and β1;A, β1;B, β1;C (slopes) denote the regression intercepts and slopes for 'fire' in regions A, B, and C, respectively, given the other covariates in the model 'm2'.

> anova(m2, m2.newR)

Analysis of Variance Table

Model 1: fair^0.5 ~ race + theft + age + log(income) + region * fire
Model 2: fair^0.5 ~ race + theft + age + log(income) + newR * fire
  Res.Df    RSS Df  Sum of Sq      F Pr(>F)
1    140 5.4446
2    142 5.4509 -2 -0.0062269 0.0801 0.9231
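The F value in the table above can be reproduced from the reported sums of squares and degrees of freedom. The sketch below is in Python rather than R, and the function name `nested_f_statistic` is illustrative. It uses the standard nested-model F statistic: the change in RSS divided by the change in degrees of freedom, scaled by RSS/df of the larger model (Model 1 here).

```python
def nested_f_statistic(sum_of_sq, df_diff, rss_larger, df_larger):
    """F statistic for comparing two nested linear models.

    sum_of_sq  : increase in RSS when moving to the smaller model
    df_diff    : number of parameters dropped
    rss_larger : RSS of the larger model
    df_larger  : residual degrees of freedom of the larger model
    """
    return (sum_of_sq / df_diff) / (rss_larger / df_larger)


# Values read from the table; R prints Df and Sum of Sq with a negative
# sign because Model 2 is the smaller model, so magnitudes are used here.
f_stat = nested_f_statistic(sum_of_sq=0.0062269, df_diff=2,
                            rss_larger=5.4446, df_larger=140)
print(round(f_stat, 4))  # 0.0801, matching the F column
```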

5. For the ANOVA test above, one argues that we can conduct an equivalent t-test. Is this argument correct? If it is, please describe the exact t-test procedure for a 0.05 level test to get full credit. If this argument is incorrect, please explain why.

6. Suppose we need to reduce the size of model 'm2.newR'. Based on the R-output, will the "testing-based" procedure and the "criteria-based" procedure (AIC) reach the same decision? Please report the decision one would make using each procedure and explain how you reach each decision.

7. Let μA, μB, μC denote the means of Y = √fair for the three regions. Please provide a 95% confidence interval for μA and for μA - μB, respectively. Note: you can use the notation t_{df,γ} without knowing its value, where P(X ≥ t_{df,γ}) = γ, but please report the "df" and "γ" values you would use for t_{df,γ} when constructing the confidence intervals.

8. Would we reach a different conclusion here when we conduct all pairwise comparisons among μA, μB, and μC, controlling the family-wise error rate vs. controlling the individual error rate? Please briefly justify your answer.

9. Comparing m3 and m4, could we claim that m4 is a superior model to m3 by comparing their R²? Please justify your answer.

10. Based on the R-outputs, please comment on whether there is a need to consider the subset that removes the point with the largest Cook's distance.

11. Please comment on the output for 'anova(m2.newRa, m2.newR)' and state what conclusion you make (or would not make) based on the F-test.

> anova(m2.newRa, m2.newR)

Analysis of Variance Table

Model 1: fair^0.5 ~ theft + poly(age, 2) + newR * fire
Model 2: fair^0.5 ~ race + theft + age + log(income) + newR * fire
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    143 5.9648
2    142 5.4509  1    0.5139 13.388 0.0003562 ***

12. Considering mod.glm, please provide the model that corresponds to mod.glm.
