BEES2041 Data Analysis for Life and Earth Scientists

Practical report 1

Reproducible Research

Summary

Assessment title: Practical Report 1 Reproducible Research

Weighting: 20%

Due Date: Week 5, 11:59 pm, Friday 21st March 2025

Group work: No

Length: Document up to 2000 words with up to 5 figures and tables, plus a video up to 3 minutes long.

Submission requirements: You need to submit three files, as per instructions below.

Feedback details: Written feedback will be provided on the returned work 2 weeks after submission.

Aligned CLOs: 3,4,5

Rationale

Many scientific questions can be addressed using available data. This exercise exposes students to two common open-source data resources used in the biological, earth and environmental sciences: records of species observations and records of climate.

Robust science requires the sharing of data sets and the code required to process, visualise, and analyse the data. When groups of researchers are working on the same problem, they also need to share their work prior to the study’s completion and publication. To give you experience in producing a document that would allow open sharing of all analytical methods, you will prepare notes and code that would detail each of the steps required to complete a data analysis.

Assessment details

Use of open resources to study climate preferences of plants

You are working in a team investigating the climatic preferences of plant species. By climatic preferences, we mean the climate where these species typically occur in the wild. Your team’s task is to quantify the average annual rainfall experienced by each species in a genus and then answer the question of whether species differ in the climates where they occur.

To answer this question, your colleague, Data Dan, has assembled a dataset with all known observations for plant species in Australia and the climate where each observation was recorded. To build this, Dan used two great open-source datasets:

1. Observation records: The Atlas of Living Australia is an online repository collating data about the distribution of plants, animals, and fungi in Australia. Data are collated from a wide variety of sources, including plot surveys, herbarium records, and citizen science projects. These data are then contributed to GBIF, the Global Biodiversity Information Facility. Read more at https://www.ala.org.au and https://www.gbif.org/country/AU/about

2. Worldclim: A database of high spatial resolution global weather and climate data.

https://www.worldclim.org/data/bioclim.html

The resulting dataset was very large, so Dan randomly selected up to 100 observations per species. Data Dan is now handing the dataset to you to do the analysis. The data is in the folder `data`, with variables as described in the file `*-metadata.csv`.
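A minimal sketch of how the data and its metadata might be read into R. The brief does not give the actual file names, so the sketch builds a toy `data` folder in a temporary directory; `toy.csv`, `toy-metadata.csv`, and the column names `species` and `annual_rainfall` are hypothetical stand-ins and should be replaced with whatever the real metadata file describes.

```r
# Toy stand-in for the `data` folder so the sketch runs anywhere.
# All file and column names here are hypothetical.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

write.csv(
  data.frame(variable = c("species", "annual_rainfall"),
             description = c("Species name", "Annual rainfall (mm)")),
  file.path(data_dir, "toy-metadata.csv"), row.names = FALSE
)
write.csv(
  data.frame(species = c("Acacia alata", "Acacia baileyana"),
             annual_rainfall = c(812, 640)),
  file.path(data_dir, "toy.csv"), row.names = FALSE
)

# Locate the metadata file by its suffix rather than hard-coding its name
metadata_path <- list.files(data_dir, pattern = "-metadata\\.csv$",
                            full.names = TRUE)
metadata <- read.csv(metadata_path)

# Any other CSV in the folder is assumed to hold the observations
all_csv <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
observations <- read.csv(setdiff(all_csv, metadata_path)[1])

# Check that the variables described in the metadata are present
print(metadata)
str(observations)
```

Reading the metadata first makes it easier to confirm which column holds annual rainfall and in what units before any analysis starts.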

Your task is to select a single genus of plants and then, using the dataset provided, address the following:

1. Estimate a mean and confidence interval for the average rainfall experienced by each species.

2. Test whether species in the genus differ in their climate preference, measured as annual rainfall, and, if so, by how much.
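The two tasks above could be approached along the following lines. This is only a sketch on simulated data: the species labels and the columns `species` and `annual_rainfall` are hypothetical, and a one-way linear model is just one possible choice of test. Choosing and justifying a suitable analysis, and checking its assumptions, remains part of the assessment.

```r
# Simulated data for illustration only; not taken from the real dataset
set.seed(1)
obs <- data.frame(
  species = rep(c("sp_a", "sp_b", "sp_c"), each = 30),
  annual_rainfall = c(rnorm(30, 600, 80),
                      rnorm(30, 800, 80),
                      rnorm(30, 1000, 80))
)

# Task 1: mean and 95% confidence interval per species (t distribution)
ci_by_species <- do.call(rbind, lapply(split(obs, obs$species), function(d) {
  n <- nrow(d)
  m <- mean(d$annual_rainfall)
  se <- sd(d$annual_rainfall) / sqrt(n)
  half_width <- qt(0.975, df = n - 1) * se
  data.frame(species = d$species[1], n = n, mean = m,
             lower = m - half_width, upper = m + half_width)
}))
print(ci_by_species)

# Task 2: one-way linear model testing for differences among species
fit <- lm(annual_rainfall ~ species, data = obs)
print(anova(fit))    # overall test for a species effect
print(confint(fit))  # estimated differences from the reference species
```

The `anova()` table addresses whether species differ, while the coefficient intervals from `confint()` speak to how large the differences are relative to the first species level.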

Please choose a genus in the dataset that somehow relates to your name. E.g. sounds similar or starts with the same letter.

Details: Create a new project notebook (qmd or rmd file) in RStudio that contains code to fully execute your analysis, along with text to describe your analysis. Your report should include sections that

a) Explain the motivation for the study and question being asked.

b) Explain what genus you selected and how that relates to your name.

c) Load any R packages that you require.

d) Import the data set, extract relevant parts, and describe the data you are working with.

e) Check for and remove any errors in the data.

f) Explain any statistical tests run and interpret the results.

g) Provide code to create graphs that visualise how the data address the question.

h) Describe any limitations you perceive in this analysis.
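Steps d) and e) above might look something like the sketch below. The data frame is a toy example, the column names `genus`, `species`, and `annual_rainfall` are hypothetical, and the rule used to flag errors (missing or negative rainfall) is an assumption for illustration. The real dataset may contain different kinds of errors that need their own checks.

```r
# Toy data with two deliberately bad records; names are hypothetical
obs <- data.frame(
  genus = c("Acacia", "Acacia", "Banksia", "Acacia", "Acacia"),
  species = c("Acacia alata", "Acacia alata", "Banksia serrata",
              "Acacia baileyana", "Acacia baileyana"),
  annual_rainfall = c(812, NA, 950, -5, 640)
)

# d) Extract the records for the chosen genus and describe them
my_genus <- subset(obs, genus == "Acacia")
print(table(my_genus$species))
print(summary(my_genus$annual_rainfall))

# e) Flag records that cannot be valid: missing or negative rainfall
is_bad <- is.na(my_genus$annual_rainfall) | my_genus$annual_rainfall < 0
cat("Records removed:", sum(is_bad), "\n")

clean <- my_genus[!is_bad, ]
print(clean)
```

Reporting how many records were removed, and why, keeps the cleaning step reproducible for anyone rerunning the notebook.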

Assessment

You are required to submit three files

1) RStudio notebook (.qmd or .rmd) file, containing the notes and code completing your entire analysis, from beginning to end

2) An html report generated from running your notebook file, and

3) A short video (up to 3 minutes) with a screen capture, where you explain how the code works.

Upload these files to Moodle (found in the Assessment section).

Further details will be provided on how this will be assessed.

A reminder, the work submitted must be entirely your own work.

Marking criteria

Data organisation (10)

- Successfully imports the data

- Correctly identifies relevant variables

- Identifies any errors in dataset and takes appropriate action

Suitable statistical analysis (40)

- Chooses appropriate statistical analysis to address the question and explains this choice

- Successful fitting of model using appropriate variables

- Extracts and presents relevant parameters and/or statistics of model

- Appropriate interpretation of model, clear translation from statistical methods to biological context

Effective use of figures to communicate results (20)

- Chooses suitable graph types to display methods, data and/or results

- Figures are easy to engage with: well structured, clear labels, caption; effective use of colour or symbols where appropriate

- Figures suitably integrated with text
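As an illustration of the figure criteria above, the sketch below draws a labelled boxplot with the raw observations overlaid. It uses simulated data and base R graphics to avoid extra dependencies; the species labels and column names are hypothetical, and other graph types or packages such as ggplot2 may suit a given analysis better.

```r
# Simulated data for illustration only
set.seed(42)
obs <- data.frame(
  species = rep(c("sp_a", "sp_b", "sp_c"), each = 20),
  annual_rainfall = c(rnorm(20, 600, 80),
                      rnorm(20, 800, 80),
                      rnorm(20, 1000, 80))
)

# Boxplot of rainfall by species, with axis labels that include units
bp <- boxplot(annual_rainfall ~ species, data = obs,
              xlab = "Species", ylab = "Annual rainfall (mm)",
              col = "grey90")

# Overlay the individual observations so readers can see the raw data
stripchart(annual_rainfall ~ species, data = obs,
           vertical = TRUE, method = "jitter",
           pch = 16, add = TRUE)
```

Showing the raw points alongside the summary lets readers judge sample size and spread for each species directly.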

Presentation (15)

- Informative yet succinct text presenting material

- Includes sections providing details on introduction, methods, results

- Appropriate use of subheadings

- Follows length guidelines. No more than 8 paragraphs or 2000 words.

- Note: Discussion section NOT required

Code (10)

- Suitable choice of packages, functions and methods; brief text to explain these choices

- Code is well structured, with comments explaining the purpose of specific lines where this is not immediately obvious from the code

- Code successfully renders to produce html document

- Best responses will be succinct and may take extra steps to hide unnecessary detail or outputs.
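One way to hide unnecessary detail in the rendered html is through chunk options. The sketch below shows the contents of a single code chunk using the `#|` option syntax; in a notebook the chunk would sit inside a `{r}` fence. The chunk label, data, and column names are hypothetical, and which chunks to hide is a judgment call for each report.

```r
#| label: rainfall-summary
#| echo: false
#| message: false
#| warning: false

# The `#|` lines above are chunk options: the code is not echoed in the
# report, and messages and warnings are suppressed. To R itself they are
# ordinary comments, so this chunk also runs as a plain script.

# Toy data for illustration only
obs <- data.frame(
  species = c("sp_a", "sp_a", "sp_b", "sp_b"),
  annual_rainfall = c(600, 640, 900, 860)
)

# Only this summary table would appear in the rendered report
summary_table <- aggregate(annual_rainfall ~ species, data = obs, FUN = mean)
print(summary_table)
```

Hiding the code behind routine steps keeps the report readable, while the notebook file still contains everything needed to reproduce the result.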

Video explanation of code (15)

- Briefly explain these 3 sections of your code: Data organisation, Statistical analysis, Creation of figures. For each section, choose 1-3 lines of code and explain how this works.

- Maximum 3 minutes