STAT3011 Graphical Data Analysis Assignment 2

The Project:

The project is in two parts:

Part A Presentation Graphics: 20% of your total grade. No page limit, but 5 single-sided pages is suggested.

Task:

Collect five statistical graphics from published sources throughout the semester. Provide a written critique of each.

Minimum Standard: Graphics must be sourced externally from published materials. Credit will be awarded for carefully selected, insightful graphics that reflect a strong understanding of effective data communication and the principles behind conveying information meaningfully.

For each graphic, include:

- A copy of the graphic.

- Full citation details (article title, authors, source, page numbers, etc.).

- A concise discussion on the purpose, strengths, weaknesses, and potential improvements of the graphic. Redrawing improved versions is encouraged but optional.

Commentary Style: Brief, relevant, and insightful; avoid unnecessary length.

Part B Analysis Graphics & Data Analysis: 40% of your total grade. 8 single-sided page limit.

Task: Select one data set from two options that are provided. Analyse the data set, and prepare a concise, well-organised and insightful report.

Report Content:

- Begin with a clear problem statement and purpose of your analysis.

- Explain your methodology, detailing the rationale behind each approach chosen to solve the problem and achieve the project’s objectives.

- Incorporate relevant graphics that directly support and illustrate each key insight, providing clear interpretation and analysis within the report.

Focus: The analysis should primarily (though not exclusively) be graphical.

Key Points to Consider:

- Clarity and Insight: Ensure your report and commentary are clear, focused, and demonstrate critical thinking. Overly verbose or unfocused submissions may be penalised.

- Depth of Analysis: Highlight meaningful patterns, trends, and insights rather than just describing the visuals. Your work should reflect careful consideration and thoughtful interpretation.

- Length Limit: Part B must not exceed 8 single-sided pages (not including the declaration page, or R code in the appendix).

- Submissions that exceed this limit will only have the first 8 single-sided pages assessed. Attempts to bypass the page limit with small fonts or unreadable formatting will not be accepted.

- Appendices: Include the R code used to generate graphics in Part B of the project as an appendix. Ensure the code is well-organised, commented, and clearly corresponds to the visuals presented in the report.

Submission Details: Via Turnitin

Additional comments:

There is no "right answer" to any question you formulate but there are certainly clever problems to solve and clever insights to derive, and mundane ones.

I have not analysed these data sets so can only be of limited assistance to you. Of course, I will do my best to help you.

ALSO, YOU MAY NOT DISCUSS THE PROJECT WITH OTHER MEMBERS OF THE CLASS OR ANYBODY ELSE!! This requirement is serious, and evidence of plagiarism will result in you FAILING the course.

PLEASE, play by the rules.

The written part of the report should be no longer than 4-5 pages in length.

Begin with a clear statement of which data set you are analysing and what problem you are trying to solve in your analysis.

Describe the various steps in your analysis. Explain why you have done what you have done, communicate your most insightful observations at each step, and summarise your most insightful lessons.

Finally, there should be a brief conclusion in which you summarise what you have found from all the insights you have gained through your analysis.

You should restrict the number of graphics you present, though of course you may describe graphics you have constructed but not selected for presentation.

If you choose Option 2 (insurance data set) below, the 8 pages include any maps you might display, so 8 pages is a hard limit. Pages beyond the 8th will not be read.

Marks are available for flair, creativity and those difficult to define aspects of the project.

Choose ONE data set below for Part B

The project data is in the following R objects in the class data file:

Option 1: otter;

Option 2: insure.

Option 1: Social Grooming in North American River Otters

As part of a large study on the social behaviour of Lutra canadensis, data on the grooming behaviour of five groups of captive otters was obtained. It is generally believed that grooming is the social cement of animal groups and plays an important role in bonding.

The questions of interest include:

1) Do animals within a group groom equally or are some groomed more than they groom others?

2) In multi-member groups (A and H) do individuals exhibit preferences in who they groom?

3) Do females groom males more than males groom females?

4) Do grooming rates change in the breeding season?

The data provided identifies the group, whether the season is breeding (B) or not (N), the time in minutes of observation, the animals involved and the frequency of grooming. The groups are:

A: F1 (adult female), M2, M3, M4 (adult males)

B: F7 (adult female), M8 (adult male)

C: F9 (adult female), M15 (adult male)

D: F5 (adult female), M6 (adult male); siblings

H: F21 (subadult female), F22 (young adult female), M23 (subadult male), M24 (young adult male)

The data is in a list called otter. Each component is a vector of length 394. $group is the group, $season is the season, $time is the time observed in minutes (it is the length of time the groups are watched, NOT the length of time they spend grooming), $groomer is the groomer, $groomee is the groomee and $frequency is the frequency of grooming (number of grooms observed).
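As a minimal sketch of how a list with this layout might be handled in R, the code below builds a toy list with the same six component names, converts it to a data frame, and expresses grooming frequency per hour of observation. Everything except the component names is an assumption: the values are invented, there are only four rows rather than 394, and the names otter_toy, otter_df and rate_per_hour are purely illustrative. Scaling by observation time is one possible choice, not a required step.

```r
# Toy stand-in for the `otter` list: same component names as described above,
# but invented values and only 4 rows (the real list has 394).
otter_toy <- list(
  group     = c("A", "A", "B", "B"),
  season    = c("B", "N", "B", "N"),
  time      = c(60, 120, 90, 90),      # minutes the group was watched
  groomer   = c("F1", "M2", "F7", "M8"),
  groomee   = c("M2", "F1", "M8", "F7"),
  frequency = c(6, 3, 9, 0)            # number of grooms observed
)

# A list of equal-length vectors converts directly to a data frame.
otter_df <- as.data.frame(otter_toy, stringsAsFactors = FALSE)

# Hypothetical derived column: grooms per hour of observation.
# $time is observation time, so this is a rate, not a grooming duration.
otter_df$rate_per_hour <- 60 * otter_df$frequency / otter_df$time

print(otter_df)
```

With the class data file loaded, the same two steps would apply to otter itself in place of otter_toy.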

Option 2: Insurance availability in Chicago

The U.S. Commission on Civil Rights collected data in an attempt to examine charges that insurance companies were "redlining" certain neighbourhoods, i.e. cancelling and/or refusing to renew policies.

The data provided include the number of cancellations, nonrenewals, new policies and renewals of home and fire policies for each neighbourhood by zip code for the months December 1977 - February 1978. This information is combined into a single variable, voluntary market activity, which is the number of new policies and renewals minus the number of cancellations and nonrenewals, expressed per 100 housing units. In addition, information on the number of FAIR plan policies was obtained. These policies are obtained after applicants have been rejected for other policies, so this information also reflects the availability of policies. It is provided as the involuntary market activity: the number of FAIR plan policies and renewals per 100 housing units.

In addition, the Chicago Police provided theft data and the Fire Department provided fire data from 1975 for each neighbourhood. These data are the number of incidents per 1000 housing units in 1975. (The insurance companies claim to use a three year lag on crime data when they set their premiums.) Finally, the Census Bureau provided data on the racial composition (in per cent minority), income and the age of housing units. The income is the median family income, and the age is coded as the percentage of units built in or before 1939.
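The definition of voluntary market activity given above can be written out as a small R function. This is only a sketch of the stated formula: the function name, argument names and the counts in the example call are hypothetical, and the insure matrix already contains the computed variable rather than the raw counts.

```r
# Voluntary market activity as defined in the text:
# (new policies + renewals) - (cancellations + nonrenewals), per 100 housing units.
# All names and example values here are illustrative assumptions.
voluntary_activity <- function(new_policies, renewals,
                               cancellations, nonrenewals,
                               housing_units) {
  net <- (new_policies + renewals) - (cancellations + nonrenewals)
  100 * net / housing_units
}

# Invented counts for a single zip code.
example_activity <- voluntary_activity(
  new_policies  = 120,
  renewals      = 300,
  cancellations = 15,
  nonrenewals   = 25,
  housing_units = 5000
)
print(example_activity)
```

Because the measure is a net figure, a neighbourhood where cancellations and nonrenewals exceed new policies and renewals would have a negative value.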

The objectives of the study are to explore the extent to which racial composition and age of housing affect underwriting practices after controlling for factors like fire and theft.

In short, do neighbourhood attributes such as racial composition and age of housing explain variation in insurance policies? And, if so, does the data suggest that insurance companies may be engaging in redlining?

The data is provided in a 47 x 8 data matrix called insure. A map of the neighbourhoods with their zip codes is available as a pdf file.