MAST30027: Modern Applied Statistics
2022-09-12
Assignment 2, 2022
Due: 11:59pm Sunday September 11th

• This assignment is worth 7% of your total mark.
• To get full marks, show your working, including 1) the R commands you use and their outputs, 2) mathematical derivations, and 3) a rigorous explanation of why you reach your conclusions or answers. If you provide only final answers, you will receive zero marks.
• The assignment you hand in must be typed (except for math formulas) and be submitted using the LMS as a single PDF document only (no other formats allowed). For math formulas, you can take a picture of them. Your answers must be clearly numbered and in the same order as the assignment questions.
• The LMS will not accept late submissions. It is your responsibility to ensure that your assignments are submitted correctly and on time; problems with online submission are not a valid excuse for submitting a late or incorrect version of an assignment.
• We will mark a selected set of problems. We will select problems worth ≥ 50% of the full marks listed.
• If you need an extension, please contact the tutor coordinator before the due date with appropriate justification and supporting documents. Late assignments will only be accepted if you have obtained an extension from the tutor coordinator before the due date. Under no circumstances will an assignment be marked if solutions for it have been released. Please DO NOT email the lecturer with extension requests.
• Also, please read the “Assessments” section of the “Subject Overview” page on the LMS.

Note: There is no unique answer for this problem. The report for this problem should be typed. Hand-written reports, or reports that include screen-captured R code or figures, will not be marked.
An example report written by a student in a previous year has been posted on the LMS.

Data: The dataset comes from the Fiji Fertility Survey and shows the number of children ever born to married women of the Indian race, classified by duration since their first marriage (grouped in six categories), type of place of residence (Suva, urban, and rural), and educational level (classified in four categories: none, lower primary, upper primary, and secondary or higher). The data can be found in the file assignment2_prob1.txt. The dataset has 70 rows representing 70 groups of families. Each row has entries for:

• duration: marriage duration of mothers in each group (years),
• residence: residence of families in each group (Suva, urban, rural),
• education: education of mothers in each group (none, lower primary, upper primary, secondary+),
• nChildren: number of children ever born in each group (e.g. 4), and
• nMother: number of mothers in each group (e.g. 8).

We can summarise the data as a table as follows.

    > data <- ...
    > data$duration <- factor(..., ordered=TRUE)
    > data$residence <- ...
    > data$education <- ...
    > ftable(xtabs(cbind(nChildren,nMother) ~ duration + residence + education, data))
                                 nChildren nMother
    duration residence education
    0-4      Suva      none              4       8
                       lower            24      21
                       upper            38      42
                       sec+             37      51
             urban     none             14      12
                       lower            23      27
                       upper            41      39
                       sec+             35      51
             rural     none             60      62
                       lower            98     102
                       upper           104     107
                       sec+             35      47
    5-9      Suva      none             31      10
                       lower            80      30
                       upper            49      24
                       sec+             38      22
             urban     none             59      13
                       lower            98      37
                       upper           118      44
                       sec+             48      21
             rural     none            171      70
                       lower           317     117
                       upper           200      81
                       sec+             47      21
    10-14    Suva      none             49      12
                       lower            99      27
                       upper            58      20
                       sec+             24      12
             urban     none             75      18
                       lower           143      43
                       upper           105      29
                       sec+             50      15
             rural     none            364      88
                       lower           546     132
                       upper           197      50
                       sec+             30       9
    15-19    Suva      none             59      14
                       lower           153      31
                       upper            41      13
                       sec+             11       4
             urban     none            108      23
                       lower           225      42
                       upper            92      20
                       sec+             19       5
             rural     none            577     114
                       lower           481      86
                       upper           135      30
                       sec+              2       1
    20-24    Suva      none            118      21
                       lower            91     182
                       upper            47      12
                       sec+             13       5
             urban     none            118      22
                       lower           147      25
                       upper            65      13
                       sec+             16       3
             rural     none            756     117
                       lower           431      68
                       upper           132      23
                       sec+              5       2
    25-29    Suva      none            310      47
                       lower           182      27
                       upper            43       8
                       sec+              2       1
             urban     none            300      46
                       lower           338      45
                       upper            98      13
                       sec+              0       0
             rural     none           1459     195
                       lower           461      59
                       upper            58      10
                       sec+              0       0

Problem: We want to determine which factors (duration, residence, education) and two-way interactions are related to the number of children per woman (fertility rate). The observed number of children ever born in each group (nChildren) depends on the number of mothers (nMother) in each group, so we must take account of the differences in the number of mothers between groups (hint: one of the lab problems shows how to handle this issue). Write a report on the analysis that summarises the substantive conclusions and includes the highlights of your analysis: for example, data visualisation, choice of model (e.g., Poisson, binomial, gamma, etc.), model fitting and model selection (e.g., using AIC), diagnostics, a check for overdispersion if necessary, and a summary/interpretation of your final model.

At each step of your analysis, you should write why you take that step and give your interpretation/conclusion. For example, “I make an interaction plot to see whether there are interactions between X and Y”, show a plot, and “It seems that there are some interactions between X and Y”.
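One possible starting point for the analysis, sketched in R under several assumptions: the lab hint is read as using log(nMother) as an offset in a Poisson log-linear model, model selection is taken to be backward elimination by AIC from the model with all two-way interactions, and overdispersion is judged from the Pearson statistic divided by its residual degrees of freedom. The counts are transcribed from the summary table of the data so the block runs without the data file; the level orders and the names `cells`, `full`, `sel`, `phi` and `qfit` are illustrative choices, not part of the assignment. Since there is no unique answer, treat this as a sketch rather than a model report.

```r
# Hypothetical sketch. Assumptions (not prescribed by the assignment):
# Poisson family, log(nMother) offset, backward selection by AIC.

dur_levels <- c("0-4", "5-9", "10-14", "15-19", "20-24", "25-29")
res_levels <- c("Suva", "urban", "rural")
edu_levels <- c("none", "lower", "upper", "sec+")

# education varies fastest, then residence, then duration (table order)
cells <- expand.grid(education = edu_levels, residence = res_levels,
                     duration = dur_levels, stringsAsFactors = FALSE)

# Counts transcribed from the ftable summary, one line per duration group
cells$nChildren <- c(
    4,  24,  38, 37,  14,  23,  41, 35,   60,  98, 104, 35,  # 0-4
   31,  80,  49, 38,  59,  98, 118, 48,  171, 317, 200, 47,  # 5-9
   49,  99,  58, 24,  75, 143, 105, 50,  364, 546, 197, 30,  # 10-14
   59, 153,  41, 11, 108, 225,  92, 19,  577, 481, 135,  2,  # 15-19
  118,  91,  47, 13, 118, 147,  65, 16,  756, 431, 132,  5,  # 20-24
  310, 182,  43,  2, 300, 338,  98,  0, 1459, 461,  58,  0)  # 25-29
cells$nMother <- c(
    8,  21,  42, 51,  12,  27,  39, 51,   62, 102, 107, 47,
   10,  30,  24, 22,  13,  37,  44, 21,   70, 117,  81, 21,
   12,  27,  20, 12,  18,  43,  29, 15,   88, 132,  50,  9,
   14,  31,  13,  4,  23,  42,  20,  5,  114,  86,  30,  1,
   21, 182,  12,  5,  22,  25,  13,  3,  117,  68,  23,  2,
   47,  27,   8,  1,  46,  45,  13,  0,  195,  59,  10,  0)

# duration as an ordered factor, as in the assignment's commands;
# the level orders here are taken from the table (an assumption)
cells$duration  <- factor(cells$duration, levels = dur_levels, ordered = TRUE)
cells$residence <- factor(cells$residence, levels = res_levels)
cells$education <- factor(cells$education, levels = edu_levels)

# Drop the two empty combinations, leaving the 70 groups
data <- cells[cells$nMother > 0, ]
data$rate <- data$nChildren / data$nMother  # children per mother

# Look for a duration x education interaction in the observed rates
# (fun = mean averages rates over residence groups, unweighted)
interaction.plot(data$duration, data$education, data$rate,
                 xlab = "duration", trace.label = "education",
                 ylab = "children per mother")

# Main effects plus all two-way interactions; the offset turns the
# count model into a model for the rate nChildren / nMother
full <- glm(nChildren ~ (duration + residence + education)^2 +
              offset(log(nMother)),
            family = poisson, data = data)

# Backward elimination by AIC; step() only drops terms that keep the
# model hierarchical (main effects stay while their interactions do)
sel <- step(full, trace = 0)
print(formula(sel))

# Deviance goodness-of-fit p-value and Pearson-based dispersion estimate
gof_p <- pchisq(deviance(sel), df.residual(sel), lower.tail = FALSE)
phi <- sum(residuals(sel, type = "pearson")^2) / df.residual(sel)
cat("GOF p-value:", gof_p, " dispersion estimate:", phi, "\n")

# If phi is well above 1, a quasi-Poisson refit keeps the same
# coefficients but scales the standard errors by sqrt(phi)
qfit <- glm(formula(sel), family = quasipoisson, data = data)
print(summary(qfit))
```

With the offset, each coefficient is on the log scale of children per mother, so exponentiated coefficients are rate ratios. Because `duration` is an ordered factor, R codes it with polynomial contrasts (`.L`, `.Q`, ...); refitting with an unordered factor gives treatment contrasts that may be easier to interpret, while the fitted values and AIC stay the same.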