CCC8015 Generative Artificial Intelligence
Individual Assignment
Due: 23:59 Friday 28 Mar 2025
Important notes
● Point Allocation: This assignment carries a total of 100 points, which will be awarded based on the criteria outlined in the grading rubric. It accounts for 14% of your final score.
● Use of Visual Aids: You are encouraged to enhance your report with diagrams or charts where appropriate. Visual aids should be used to complement and clarify the concepts discussed in your text.
● Citation Requirement: If you refer to or incorporate external information, proper citation is mandatory. This includes any direct quotes, paraphrased information, or data and statistics you include in your report.
● Generative AI Use: The use of generative AI tools is permitted and even encouraged to assist with the creation of your report. However, it is imperative that any content generated by AI is clearly indicated as such in your submission. This transparency is necessary to maintain academic honesty and will be taken into consideration during grading.
● Plagiarism Policy: Originality in your work is crucial. Plagiarism, which includes copying someone else's work without credit, submitting someone else's work as your own, or using generative AI tools to create content without disclosure, will lead to a failing grade for this assignment.
● Similarity Report: The acceptable similarity threshold is 25%. Assignments surpassing this threshold may be flagged for plagiarism. However, if the system primarily detects quotes from questions and references, such matches should be disregarded. We specifically evaluate the similarity of the 'content' to ensure it falls below the 25% threshold.
● Grading Rubric: Assignments will be evaluated based on a clear demonstration of subject mastery, critical thinking and originality, logical organization, writing quality, effective use of visual aids and references, and adherence to assignment guidelines.
● Late Submissions: CCC8015 applies a tiered deduction scheme. Submissions up to 3 days late incur a 10% deduction, submissions more than 3 days and up to one week late incur a 30% deduction, and submissions more than one week late receive no points.
● For students with special educational needs (SEN), there will be a one-week extension of the submission deadline (with an additional 20% time allowance).
Deliverable
Assignment Deliverable:
Submission Guidelines: (Report + Google Colab PDF document)
1. Your report should include tables, figures, and a reference list.
2. A title page is not required.
3. Submit a Google Colab PDF document demonstrating your code implementation and data visualizations.
When preparing your assignment, please ensure the following:
1. Answer all the questions
2. The total word count should not exceed 1500 words, excluding the questions, appendix, and references.
3. The 1500-word limit should be distributed across all the questions as needed. There is no requirement for an equal allocation of words to each question.
4. Upload your assignment in either PDF or Word format. Other file formats are not permitted.
5. When submitting your assignment via Turnitin, download the document first and then upload the downloaded file. This will prevent any issues with accessing your assignment for grading.
6. Do not include the questions in your assignment to avoid potential high similarity scores in Turnitin.
7. You are encouraged to use ChatGPT to assist you in completing your assignment.
8. Formatting guidelines:
○ Font: Times New Roman
○ Size: 12
○ Color: Black
○ Spacing: No specific requirements
Background
This assignment aims to leverage generative AI for data analysis. Imagine you are a novice in the field of data science, but you have diligently recorded your daily expenses over the past two years. As a result, you have amassed a dataset detailing your daily expenditures.
Your task is to use generative AI for in-depth data analysis, including visualization and insight discovery, and to recommend improvements to your financial planning for the next year based on your expenditure patterns over the past two years.
In your assignment, you should:
● Analyze and present the data in Google Colab using Python. Communicate your findings and interpretations through written explanations and graphs, and provide recommendations based on those findings. Include the graphs and findings in your report.
Data
Dataset name: ‘Family_expenditure_dataset_CCC8015.csv’
Definition of each column
1. Rental Expense: Expenditure related to renting an apartment, with payments due on the last day of each month.
2. Transportation: Includes daily transportation expenses as well as long-distance transportation costs.
3. Food: Expenses encompassing payments for restaurants or any food consumption.
4. Water Expense: Costs associated with water usage in the apartment.
5. Electric Expense: Expenditure on electricity usage within the apartment.
6. Clothing: Expenses for purchasing clothes, occurring in specific months only.
7. Entertainment: Costs related to travel or entertainment activities such as movies or theme parks.
8. Sport: Expenses for exercise-related activities, including fees for courts and equipment.
9. Investment: Allocation of funds for family investments on a monthly basis.
Instructions for loading the CSV to Google Colab:
Step 1: Download the dataset 'Family_expenditure_dataset_CCC8015.csv'.
Step 2: Click on the 'Files' icon and then upload the dataset to the files section.
Step 3: Execute the code in the code cells.
import pandas as pd
# Load the CSV file into a Pandas DataFrame.
df = pd.read_csv('Family_expenditure_dataset_CCC8015.csv')
# Display the first few rows of the DataFrame.
df.head()
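After loading, it can help to confirm that the dates and amounts were read with the expected types before plotting anything. The sketch below is only an illustration under stated assumptions: it uses a tiny in-memory stand-in instead of the real file, and the column name 'Date' and all values are hypothetical, so check the actual header of your CSV first.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV. The 'Date' column name and
# every value here are invented for illustration only.
sample_csv = io.StringIO(
    "Date,Food,Water Expense\n"
    "2023-01-01,120.5,0\n"
    "2023-01-02,98.0,0\n"
    "2023-01-31,110.0,230.0\n"
)

# parse_dates asks pandas to convert the named column to datetime values.
df_sample = pd.read_csv(sample_csv, parse_dates=["Date"])

print(df_sample.dtypes)        # data type of each column
print(df_sample.isna().sum())  # number of missing values per column
```

With the real dataset, the same two checks can be run on df once you know which column holds the dates.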
Question 1: [25 Marks]
Task 1: Create a pie chart to visualize the proportion of expenses. Generate the code and successfully run it in Google Colab. Display the graph in your Google Colab. [15 Marks]
Task 2: Include the generated pie chart in the report and analyze the data. Offer interpretations or insights. [10 Marks]
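One possible way to approach the pie chart is to sum each expense column over the whole period and plot the totals. The following is a minimal sketch, not a model answer: df_demo is a made-up stand-in with three of the dataset's columns and invented values, and you would replace it with the df loaded from the CSV.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in data with invented values; replace with the real df.
df_demo = pd.DataFrame({
    "Food": [100.0, 120.0, 80.0],
    "Transportation": [30.0, 20.0, 50.0],
    "Entertainment": [0.0, 100.0, 0.0],
})

# Sum every numeric column to get the total spent per category.
totals = df_demo.select_dtypes("number").sum()

fig, ax = plt.subplots(figsize=(7, 7))
# autopct prints each slice's percentage share on the chart.
ax.pie(totals, labels=totals.index, autopct="%1.1f%%", startangle=90)
ax.set_title("Share of total expenses by category")
plt.show()
```

Whether to include every column (for example 'Investment') in the totals is a judgment call that is worth stating and justifying in the report.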
Question 2: [25 Marks]
Task 1: Conduct a monthly time series analysis (line chart) of the 'Water Expense' column. Generate the code and successfully run it in Google Colab. Display the graph in your Google Colab. [15 Marks]
Task 2: Include the time series graph in the report and analyze the data trends. Offer interpretations or insights. [10 Marks]
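A monthly time series is typically built by grouping the daily records by calendar month and then plotting one value per month. The sketch below shows that idea with pandas resampling. It rests on assumptions: the date column is called 'Date' (hypothetical) and the stand-in data is invented, with a constant daily amount so the monthly totals are easy to verify by hand.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented stand-in: three months of daily records at a constant amount.
# The 'Date' column name is an assumption about the real dataset.
dates = pd.date_range("2023-01-01", "2023-03-31", freq="D")
df_demo = pd.DataFrame({"Date": dates, "Water Expense": 2.0})

# Index by date, then total the column within each calendar month.
# "MS" labels each group by the first day of its month.
monthly = df_demo.set_index("Date")["Water Expense"].resample("MS").sum()

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(monthly.index, monthly.values, marker="o")
ax.set_title("Monthly Water Expense")
ax.set_xlabel("Month")
ax.set_ylabel("Total expense")
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
plt.show()
```

The same pattern would apply to any other expense column by changing the column name; whether a monthly sum or a monthly mean is the better summary depends on how the expense is recorded.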
Question 3: [25 Marks]
Task 1: Conduct a monthly time series analysis (line chart) of the 'Electric Expense' column. Generate the code and successfully run it in Google Colab. Display the graph in your Google Colab. [15 Marks]
Task 2: Include the time series graph in the report and analyze the data trends. Offer interpretations or insights. [10 Marks]
Hints: When using ChatGPT to generate code (assuming you have already imported the CSV into Google Colab):
1. AI may assist you in inputting your data (refer to steps 2 and 3). Exclude the code from steps 2 and 3 when running it.
2. The code from steps 4 and 5 should only be included once in Question 2. Remove the code if it is generated again by AI for other questions.
Question 4: [25 Marks]
Task 1: Ask ChatGPT to explain the following code and answer the following questions:
4.1: Explain the code snippet provided below. In this question, you should clarify what the input and output are. [10 Marks]
4.2: What is the purpose of the first three lines of the code? Why do we need to set these libraries with shorter terms, for example, pandas as pd? Is the code executable if the first three lines are not included? [15 Marks]
***The green words marked with # in the code are comments for human readability. They are intended as hints for answering the questions and are not meant for accessing the dataset. Please note that you are not required to include code to load or access the dataset in this question.***
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load the dataset
# Assuming 'df' contains your dataset with columns 'Transportation' and 'Entertainment'
# Create a scatter plot to visualize the correlation
plt.figure(figsize=(8, 6))
sns.scatterplot(x='Transportation', y='Entertainment', data=df)
plt.title('Scatter Plot of Transportation vs Entertainment')
plt.xlabel('Transportation')
plt.ylabel('Entertainment')
plt.show()