FIT5145: Foundations of Data Science Assignment 2: Description Semester 2, 2024

Faculty of Information Technology

Semester 2, 2024

FIT5145: Foundations of Data Science

Assignment 2: Description

Due Date: Friday, Week 8 (September 13, 2024), 11:55 PM

The aim of this assignment is to investigate and visualise data using R. It will test your ability to:

1.   Read data files and extract related data from those files;

2.   Clean and process data into the required formats;

3.   Perform exploratory data analysis and visualisation;

4.   Use basic tools for managing and processing data; and

5.   Communicate your findings in your report.

Assessment Details:

●   Assessment Type: Individual Assignment

●   Total marks: 15%

●   Due Date: Friday, Week 8 (September 13, 2024), 11:55 PM. Please note that we do not accept submissions after September 20, 2024 (i.e., 7 days after the due date).

Submission Details:

You will need to submit two separate files (PDF report and RMD file).

1.   A report in PDF containing (a) the code, (b) the answer, and (c) the explanation for each question.

(a) code: Please convert the RMD file, including your code, directly into the PDF file (Note: Please knit the RMD to HTML and print the HTML as PDF). If you want to use Word or other word-processing software to format your submission, please copy your code from the RMD file and paste it into Word (please do NOT take screenshots of your code).

(b) answer: Please make sure to include screenshots/images of the code outputs and written answers for each question.

(c) explanation: Please explain how you answered each question (i.e., explain your code or summarise your work for each question).

Marks will be assigned to reports based on their correctness and clarity. For example, higher marks will be awarded to reports that include graphs with appropriately labelled axes and sufficient comments for the code.

2.   The RMarkdown file: Please submit the RMarkdown file that contains your R code. Your file should contain all of the code, with proper comments. If your work uses libraries that have not been introduced in classes, please include instructions on how to install them.

Notes:

1.   Whenever a question asks for a certain value, your code should produce that value. For example, when a question asks for the number of rows contained in a table, your code should print out the number of rows. Reading the answer off manually by eye will not earn any marks.

2.   Please make sure that the text in your PDF can be selected and highlighted, so that the Turnitin score can be generated properly for your PDF file (we only need the Turnitin score for the PDF file, not the RMD file).
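As an illustration of Note 1, the following is a minimal sketch of printing values from code instead of reading them off by eye. The data frame `toy` and its columns are made up for this example and are not part of the assignment data.

```r
# Toy data frame; the column names and values are made up for illustration.
toy <- data.frame(
  id = 1:5,
  region = c("North", "South", "North", "East", "South")
)

# Compute the values in code so that the report shows how they were obtained.
n_rows <- nrow(toy)
n_regions <- length(unique(toy$region))

print(n_rows)
print(n_regions)
```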

Assignment Task:

With the ever-increasing amount of data collected in various Learning Management Systems (e.g., Moodle and Blackboard) about students' learning behaviour and performance in a course, researchers and educators have started to analyse the collected data to gain a better understanding of students' learning experience and to explore whether there are any improvements they can make.

The dataset for this assignment is the anonymised Open University Learning Analytics Dataset (OULAD). OULAD contains data about courses, students, and their interactions with seven selected courses (also called modules) in a Virtual Learning Environment (VLE). Specifically, this dataset consists of:

●   courses.csv, which contains the list of all available modules and their presentations;

●   assessments.csv, which contains information about assessments in module-presentations;

●   vle.csv, which contains information about the available materials in the VLE;

●   studentInfo.csv, which contains demographic information about the students together with their results;

●   studentRegistration.csv, which contains information about the time when the student registered for the module presentation;

●   studentAssessment.csv, which contains the results of students' assessments;

●   studentVle.csv, which contains information about each student’s interactions with the materials in the VLE.

A more detailed description of the dataset is available here. As a pilot step, your task in this assignment is to better understand students' demographic background and how well they performed in assessments. Therefore, you only need to focus on the following files for this assignment:

●   studentInfo.csv

●   assessments.csv

●   studentAssessment.csv

These three files have been modified for the task. Please download them from the Assessments page on Moodle.

Firstly, let’s focus on the file studentInfo.csv.

1.   How many unique “region” values are there in the data file?

2.   How many student feedback comments mention interacting or engaging and provide an overall score of 1 or 2?

3.   What are the dates of the earliest and latest feedback comments?

4.   For each code_module value, write code to calculate the fractions of students of different gender values (M and F), and then visualise the fractions specific to each code_module value.

5.   Compute the total number of students for each region and highest education, and store the result in a dataset named ‘temp’. Then, use a single R function/command to display the statistical information (i.e., Min, Max, and Mean) of the total number of students for each highest education in the ‘temp’ data (Note: You may use multiple functions/commands to prepare the pre-processed data table, but when you compute and display the statistical information, you need to use a single R function/command).

6.   Please investigate the relationship between the student feedback scores and the scores in studentAssessment.csv. Additionally, examine the relationship between the student feedback scores and other variables (Note: You may use variables from both studentAssessment.csv and assessments.csv to explore these relationships). Marks will be awarded based on the depth of your investigation (including the analyses and discussions conducted) and the strength of your reasoning (how insights and conclusions are drawn).
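The per-group fractions asked for in question 4 can be sketched on toy data as follows. This is only one possible approach, using base R's `table()` and `prop.table()`; the data frame `toy` and its values are made up for illustration and do not come from studentInfo.csv.

```r
# Toy data; values are made up and only mimic the column names in the brief.
toy <- data.frame(
  code_module = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB"),
  gender = c("M", "F", "M", "F", "F", "M", "F")
)

# Cross-tabulate module by gender, then convert counts to row-wise
# proportions (margin = 1), so each code_module row sums to 1.
gender_counts <- table(toy$code_module, toy$gender)
gender_frac <- prop.table(gender_counts, margin = 1)

print(gender_frac)
```

A grouped bar chart of `gender_frac`, with labelled axes and a legend, would be one way to visualise the result.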

Next, let's focus on the file studentAssessment.csv.

7.   Write code to merge the “code_module” and “code_presentation” columns from assessments.csv into studentAssessment.csv. Then, generate the ‘date_submitted’ column (e.g., 2014-01-23) from the ‘dates_since_submitted’ column and draw a chart to show student average scores against the date submitted. Note: The start of the module presentation is 2014-01-01.

8.   Write code to keep only the assessment data relevant to students from the code_module of AAA and code_presentation of 2013J, then write code to add a new column named “score_range” and fill it with one of the following values based on the corresponding “score” value:

○   [0, 60)

○   [60, 70)

○   [70, 80)

○   [80, 100]

For each id_assessment value, draw a chart to plot the number of students of different score_range values. What do you observe?

9.   Now, using the original datasets, please draw any charts or create any summary tables that haven't been investigated in the previous questions and discuss them. Marks will be awarded based on the depth of your investigation (including the analyses and discussions conducted) and the strength of your reasoning (how insights and conclusions are drawn).
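The column operations described in questions 7 and 8 can be sketched on toy data as shown below. This is a minimal sketch, not a model answer: the data values and the `id_student` column are made up, and it assumes that ‘dates_since_submitted’ counts days from the presentation start date (so that an offset of 22 gives 2014-01-23). The half-open score ranges are built with base R's `cut()`.

```r
# Toy stand-ins for the two files; all values are made up for illustration.
assessments <- data.frame(
  id_assessment = c(1, 2),
  code_module = c("AAA", "AAA"),
  code_presentation = c("2013J", "2013J")
)
student_assessment <- data.frame(
  id_assessment = c(1, 1, 2, 2),
  id_student = c(11, 12, 11, 12),
  dates_since_submitted = c(22, 22, 50, 53),
  score = c(59, 60, 80, 100)
)

# Bring the module and presentation columns across using the shared key.
merged <- merge(
  student_assessment,
  assessments[, c("id_assessment", "code_module", "code_presentation")],
  by = "id_assessment"
)

# Assumption: the offset is a number of days after the presentation start.
start_date <- as.Date("2014-01-01")
merged$date_submitted <- start_date + merged$dates_since_submitted

# right = FALSE makes the bins [a, b); include.lowest = TRUE then closes
# the last bin on the right, so a score of 100 falls in [80, 100].
merged$score_range <- cut(
  merged$score,
  breaks = c(0, 60, 70, 80, 100),
  labels = c("[0, 60)", "[60, 70)", "[70, 80)", "[80, 100]"),
  right = FALSE,
  include.lowest = TRUE
)

print(merged[, c("id_assessment", "score", "date_submitted", "score_range")])

# Number of students per assessment and score range.
print(table(merged$id_assessment, merged$score_range))
```

Setting `right = FALSE` matters at the boundaries: with the default `right = TRUE`, a score of exactly 60 would land in the lower bin instead of [60, 70).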