Assignment 2 - Video Presentation (35%)
Principal Component Analysis
TASK
For your video presentation, you must demonstrate your PCA analysis on the continuous features of the WACY-COM dataset and interpret the results. Submit the recording via the Panopto link on Canvas. Please ensure you follow the instructions carefully.
This assessment is due before midnight on Friday, 4 April 2025 (Week 6).
Perform PCA and Visualise Data
(i) First, copy the code below into an R script. Enter your student ID into the command set.seed(.) and run the whole script. The code will create a sub-sample of 400 observations that is unique to you.
#You may need to change/include the path of your working directory
#Import the dataset into R Studio.
dat <- read.csv("WACY-COM.csv", na.strings=NA, stringsAsFactors=TRUE)
set.seed(Enter your student ID here)
#Randomly select 400 rows
selected.rows <- sample(1:nrow(dat),size=400,replace=FALSE)
#Your sub-sample of 400 observations
mydata <- dat[selected.rows,]
dim(mydata) #check the dimension of your sub-sample
(ii) Extract only the continuous features and the APT feature from the WACY-COM dataset and store them as a data frame/tibble. Refer to Assignment 1 for the feature description if needed.
(iii) Clean the extracted data based on the feedback received from Assignment 1.
(iv) Remove the incomplete cases (rows with missing values) so that the data can be used for PCA in R.
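As an illustration of parts (ii) and (iv), the following is a minimal sketch on a toy data frame rather than the real WACY-COM sub-sample. All column names except APT are hypothetical, and the sketch assumes that every numeric column is continuous, which you should check against the Assignment 1 feature description (an integer-coded category would also pass is.numeric). The cleaning in part (iii) depends on your own feedback and is not shown.

```r
# Toy stand-in for the sub-sample; column names other than APT are hypothetical.
mydata <- data.frame(
  Feature.A = c(1.2, NA, 3.4, 4.1, 5.0),
  Feature.B = c(10, 20, 30, NA, 50),
  Category  = factor(c("x", "y", "x", "y", "x")),  # a non-continuous feature
  APT       = factor(c("Yes", "No", "No", "Yes", "No"))
)

# Part (ii): keep the numeric columns, plus APT for colour-coding later.
is.num <- sapply(mydata, is.numeric)
pca.data <- mydata[, c(names(mydata)[is.num], "APT")]

# Part (iv): drop every row that has at least one missing value.
pca.data <- pca.data[complete.cases(pca.data), ]
dim(pca.data)  # rows and columns remaining after the clean-up
```

na.omit(pca.data) would remove the same rows; complete.cases(.) is used here only because it makes the row filter explicit.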
(v) Perform PCA using prcomp(.) in R, but only on the numeric features (i.e. ignore APT in this step).
- Explain why you believe the data should or should not be scaled, i.e. standardised, when performing PCA.
- Display and describe the individual and cumulative proportions of variance (3 decimal places) explained by each of the principal components.
- Outline how many principal components are adequate to explain at least 50% of the variability in your data.
- Display and interpret the coefficients (or loadings) to 3 decimal places for PC1, PC2 and PC3. Describe which features (based on the loadings) are the key drivers for each of these three principal components.
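The steps in part (v) can be sketched as follows. This is an illustrative example on simulated data, not the WACY-COM dataset: the feature names F1 to F4 and the seed are assumptions made for the sketch. One feature is deliberately put on a much larger scale to show why standardising matters, since an unscaled PCA is dominated by whichever feature has the largest variance.

```r
set.seed(1)  # toy data only; not the student-ID seed from part (i)
n <- 100
x1 <- rnorm(n)
toy <- data.frame(
  F1 = x1,
  F2 = 1000 * (x1 + rnorm(n, sd = 0.5)),  # much larger scale than F1
  F3 = rnorm(n),
  F4 = rnorm(n, sd = 10)
)

# Without scaling, the large-scale feature dominates PC1.
unscaled <- prcomp(toy, center = TRUE, scale. = FALSE)
unscaled.pc1 <- unscaled$sdev[1]^2 / sum(unscaled$sdev^2)
print(round(unscaled.pc1, 3))

# scale. = TRUE standardises each feature to unit variance before PCA.
mypca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Individual and cumulative proportions of variance, to 3 decimal places.
prop.var <- mypca$sdev^2 / sum(mypca$sdev^2)
cum.prop <- cumsum(prop.var)
print(round(rbind(Proportion = prop.var, Cumulative = cum.prop), 3))

# Smallest number of PCs whose cumulative proportion reaches 50%.
n.pc <- which(cum.prop >= 0.5)[1]
print(n.pc)

# Loadings for PC1 to PC3, to 3 decimal places.
print(round(mypca$rotation[, 1:3], 3))
```

summary(mypca) reports the same proportions; they are computed from mypca$sdev here so that the 50% check uses unrounded values.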
(vi) Create and display the biplot for PC1 vs. PC2 to visualise the PCA results in the first two dimensions. Colour-code the points based on the APT feature. Explain the biplot by commenting on the PCA plot and the loadings plot individually, and then both plots combined (see Slides 28-29 of Module 3 notes). Finally, comment on and justify which (if any) features can help distinguish APT activity.
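One possible way to build the biplot in part (vi) with base R graphics is sketched below. It uses simulated data, and the feature names, the APT levels "Yes" and "No", the colours, and the arrow scaling factor are all assumptions made for illustration. The Module 3 notes may use a different plotting approach, which you are free to follow instead.

```r
set.seed(2)  # toy data only
n <- 60
APT <- factor(rep(c("Yes", "No"), each = n / 2))
shift <- ifelse(APT == "Yes", 2, 0)  # hypothetical group difference
toy <- data.frame(
  F1 = rnorm(n) + shift,
  F2 = rnorm(n) + shift,
  F3 = rnorm(n)
)
mypca <- prcomp(toy, center = TRUE, scale. = TRUE)

scores <- mypca$x[, 1:2]         # PCA plot: one row per observation
loads  <- mypca$rotation[, 1:2]  # loadings plot: one row per feature

# PCA plot: points coloured by APT.
cols <- c("No" = "steelblue", "Yes" = "firebrick")
plot(scores, col = cols[as.character(APT)], pch = 19,
     xlab = "PC1", ylab = "PC2", main = "Biplot: PC1 vs PC2")

# Loadings overlaid as arrows, rescaled so they are visible on the score axes.
arrow.scale <- 0.8 * max(abs(scores))
arrows(0, 0, loads[, 1] * arrow.scale, loads[, 2] * arrow.scale, length = 0.1)
text(loads * arrow.scale * 1.1, labels = rownames(loads))
legend("topright", legend = names(cols), col = cols, pch = 19, title = "APT")
```

The arrow rescaling only affects how the loadings are drawn; the directions and relative lengths of the arrows, which are what the interpretation relies on, are unchanged.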
(vii) Based on the results from parts (v) and (vi), describe
- whether PC1 or PC2 (choose one) best assists in classifying APT. Hint: Project all points in the PCA plot onto the PC1 axis (i.e. consider the PC1 scores only) and assess whether there is a clear separation between known and unknown APT actors. Then, project onto the PC2 axis (i.e. consider the PC2 scores only) and evaluate whether the separation is better than in PC1. You can access the PCA scores for PC1 and PC2 via mypca$x, assuming mypca contains your PCA results from prcomp(.).
- the key features in this dimension that can drive this process (Hint: based on your decision above, examine the loadings from part (v) of your chosen PC and choose those whose absolute loading (i.e. disregard the sign) is greater than 0.3).
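The two hints in part (vii) can be sketched in code as follows, again on simulated data with hypothetical feature names and APT levels. The separation measure used here (the gap between group means divided by the overall standard deviation of the scores) is an assumption chosen for illustration and is not prescribed by this brief; a visual comparison of the projected scores is equally valid.

```r
set.seed(3)  # toy data only
n <- 80
APT <- factor(rep(c("Yes", "No"), each = n / 2))
shift <- ifelse(APT == "Yes", 3, 0)  # hypothetical group difference
toy <- data.frame(
  F1 = rnorm(n) + shift,
  F2 = rnorm(n) + shift,
  F3 = rnorm(n),
  F4 = rnorm(n)
)
mypca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Project onto each axis: keep only the PC1 and PC2 scores.
pc.scores <- as.data.frame(mypca$x[, 1:2])

# Rough separation measure per axis: gap between the APT group means,
# in units of the overall standard deviation of the scores.
sep <- sapply(pc.scores, function(s) {
  m <- tapply(s, APT, mean)
  abs(unname(diff(m))) / sd(s)
})
print(round(sep, 3))
best.pc <- names(which.max(sep))

# Key features: absolute loading greater than 0.3 on the chosen PC.
chosen <- mypca$rotation[, best.pc]
key.features <- names(chosen)[abs(chosen) > 0.3]
print(best.pc)
print(key.features)
```

To inspect the separation visually instead, boxplot(mypca$x[, "PC1"] ~ APT) and the equivalent call for PC2 show the projected scores by group.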
Video Presentation Checklist
1. In your video presentation, you must
a. Run your code corresponding to parts (i) to (vii) above
b. Display the relevant output
c. Interpret the output
2. Your video presentation must include a camera shot of yourself in the video capture, unless there is an exceptional reason supported by a Learning Assessment Plan (LAP). 20% is automatically deducted from your final mark if this is not included in your video presentation. If you choose to record with another application, you must make sure that this feature is included.
3. Your video presentation must be between 4 and 5 minutes long.
Marking Rubrics
| Criteria | Fail (0-49%) | Pass (50-59%) | Credit (60-69%) | Distinction (70-79%) | High Distinction (80-100%) |
|---|---|---|---|---|---|
| Working Code (7%) | Code does not run or contains major flaws, preventing meaningful PCA analysis. Little to no documentation. | Code has significant errors or omissions that affect PCA output. Poor documentation and some redundancy. | Code has a few errors and/or does not fully achieve intended PCA and relevant analyses. Documentation is present but could be improved. | Code runs with minor issues but still performs PCA and relevant tasks correctly. Minimal redundancy and good documentation. | Code runs flawlessly, correctly performs PCA and relevant tasks, and produces meaningful outputs. No errors, redundant code, or inefficiencies. |
| Interpretation of results (18%) | Fails to interpret the PCA results meaningfully or provides incorrect conclusions. | Interpretation is vague, lacks depth, and/or has major inaccuracies or errors. | Provides a basic interpretation with some inaccuracies or missing key insights. | Provides a strong and mostly accurate interpretation of PCA results with minor omissions or inaccuracies. | Provides an in-depth, clear, and accurate interpretation of PCA results, including the significance of principal components and key loadings. Justifies conclusions with evidence. |
| Presentation skills (7%) | The presentation is unclear. The presenter made an attempt at expression, but the pace and tone need improvement to better engage the audience. | The presentation lacks structure. Presenter made a good attempt, but the expression, pace, and tone could be improved. | The presentation is understandable and delivered at a good pace. However, there is minimal confidence in the presentation style. | Clear and structured presentation with minor pacing or engagement issues. Presenter was fluent and displayed good confidence. | The presenter was dynamic, natural, and persuasive, with an appropriate tone. Delivery was clear, confident, and well-structured, with effective pacing and engagement that maintained a high level of confidence throughout. |
| Timing (3%) | Presentation is less than 2 minutes or more than 9 minutes. | Presentation is between 2 and 3 minutes, or between 8 and 9 minutes. | Presentation is between 3 and 4 minutes, or between 7 and 8 minutes. | Presentation is between 6 and 7 minutes. | Presentation is between 4 and 5 minutes. |