25883 AI-driven Compliance, Anomaly and Fraud Detection 2025
Assessment Task 1
Submission
This assessment must be completed in a group of up to two students. At the top of your Jupyter Notebook, include the full names and student IDs of both group members. Only one submission per group is required — please ensure that only one member uploads the assignment, not both.
Please submit your answers by midnight (11:59pm) on Friday, 11th April 2025 via Canvas only. A late penalty of 5% per day for submissions up to 7 calendar days late will be subtracted from the mark (a maximum of 35% penalty). Work submitted after 7 calendar days (on the 8th day or later) will not be marked and the assessment will attract a zero (0) mark.
Your submission should be a single Jupyter Notebook containing your code, visualisations, and explanations summarising your methodology, findings, and insights, written using Jupyter's Markdown cells. Clearly identify the parts of the project with section headings (e.g., # Question 1, # Question 2, etc.).
You do not need to upload any data files to Canvas. Your code should either:
• Download data directly from online sources (e.g., using yfinance), or
• Read from the external data files provided (e.g., the earnings call transcripts available on the subject GitHub page).
Make sure your code clearly shows how the data is loaded, so it can be run and reproduced without manual file uploads. Note: Code that does not compile or produces errors during execution may result in a significant loss of marks, so be sure to test your notebook before submission.
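To illustrate the two loading patterns, here is a minimal sketch. It assumes the yfinance package is installed and the network is reachable; the guard below simply skips the download when the package is missing, and the file-reading variant is shown as a comment.

```python
# Sketch of reproducible data loading -- no manual file uploads required.
# Assumes yfinance is installed and the network is reachable; the guard
# below only skips the download when the package itself is missing.
import importlib.util

if importlib.util.find_spec("yfinance") is not None:
    import yfinance as yf
    # Daily OHLCV data for GameStop (ticker GME), downloaded directly.
    prices = yf.download("GME", start="2020-01-01", end="2025-01-01",
                         progress=False)
else:
    prices = None  # yfinance not available in this environment

# Reading a provided text file instead (forward-slash form of the path
# given on the subject GitHub page):
# with open("data/EarningsCallTranscript_SingleCompany.txt",
#           encoding="utf-8") as f:
#     transcript = f.read()
```

Either pattern lets a marker re-run the notebook from a clean environment.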
Using GenAI to Support Your Coding
You are encouraged to use Generative AI tools (such as ChatGPT, Claude, or GitHub Copilot) to assist with your coding in this assignment. These tools can help explain unfamiliar code, suggest improvements, or help troubleshoot errors. You may also copy and paste the sample code provided in class or on GitHub into a GenAI tool to better understand how it works or adapt it to your own analysis.
Best practices for using GenAI include:
• Ask specific, well-formed questions (e.g., "How do I detect anomalies in time series using Isolation Forest?")
• Use GenAI to clarify errors or unfamiliar code blocks, rather than blindly copying outputs
• Test and validate any code suggestions before integrating them into your notebook
Always understand and explain the code you submit; your ability to interpret and justify your work will be part of your assessment. Remember, GenAI is a powerful support tool, not a replacement for your own reasoning, learning, and interpretation.
Marking
The final page of this document contains the assessment rubric, which outlines how your work will be evaluated. Submissions will be ranked, with the strongest projects placed at the top of the pile. This means your grade is relative to the quality and creativity of other submissions — so aim high and demonstrate your best work.
Good luck — I’m looking forward to seeing your ideas in action!
Question 1: Anomaly Detection in Financial Time Series
Objective: Your task is to design and implement an anomaly detection approach using Python and historical financial time-series data retrieved from the yfinance library. Focus on price, returns, and volume series, or any derived financial indicators (e.g., volatility). Your goal is to uncover unusual or abnormal patterns, such as structural breaks, outliers, regime shifts, or behaviour inconsistent with typical market dynamics.
This is an open-ended empirical task, and you are encouraged to be innovative. There is no single correct approach — submissions will be ranked relative to peers or peer groups based on originality, correctness, insights, and overall quality of presentation.
Instructions
• Use the yfinance package to download time-series data for GameStop from 1 Jan 2020 to 1 Jan 2025.
• Define what constitutes an “anomaly” in your context (e.g., price spikes, return outliers, structural breaks, volatility bursts, etc.).
• Select and apply appropriate anomaly detection techniques in Python, such as:
o Statistical methods (e.g., z-score, rolling quantiles, change-point detection)
o Clustering-based or distance-based approaches (e.g., k-means, DBSCAN)
o Machine learning models (e.g., Isolation Forest, One-Class SVM)
• Repeat the steps above for another asset of your choosing (e.g., a stock, index, or ETF).
• Explain and justify your method selection and implementation.
• Visualise and interpret the anomalies you detect. What do they reveal? Are they associated with market events or structural changes?
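To make the statistical option concrete, here is a sketch of one technique from the list above: a rolling z-score on daily returns. Synthetic data stands in for the yfinance download, and the 20-day window and 3-sigma threshold are arbitrary illustrative choices, not values prescribed by the assignment.

```python
# Rolling z-score anomaly detection on a synthetic return series.
# A real submission would compute `returns` from downloaded GME prices.
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.02, 500)   # placeholder for daily returns
returns[100] = 0.25                    # inject one obvious anomaly

window = 20
anomalies = []
for t in range(window, len(returns)):
    past = returns[t - window:t]       # trailing window, excludes today
    z = (returns[t] - past.mean()) / past.std()
    if abs(z) > 3:                     # flag returns > 3 sigma vs recent history
        anomalies.append(t)

print(anomalies)  # the injected spike at index 100 should be flagged
```

The same flagged indices can then be overlaid on the price chart to check whether they coincide with known market events.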
Question 2: NLP-based Analysis of Financial Text Data
Objective: Your task is to design and implement a Natural Language Processing (NLP) workflow to extract and analyse insights from a set of earnings call transcripts. The goal is to automatically summarise the text, detect patterns, irregularities, or strategic signals embedded in financial language, and explore their possible links to compliance issues, market impact, or irregular firm behaviour.
This is an open-ended and exploratory exercise — you are free to define your own approach, provided it is grounded in appropriate NLP methodology and produces insightful, reproducible results. Submissions will be ranked relative to peers or peer groups based on originality, correctness, insights, and overall clarity of presentation.
Instructions
• You are provided with a sample of earnings call transcripts in the file:
o \data\EarningsCallTranscript_SingleCompany.txt (available via the subject GitHub page).
o The transcript consists of two parts: the formal remarks, which are prepared by the senior team and highly scripted, and the Q&A section, which is less predictable and can catch the organisers by surprise.
• Define a problem or pattern of interest relevant to the objective. Example questions you might explore:
o Can you detect linguistic signals or sentiment shifts between the scripted and Q&A sections?
o Is there evidence of topic avoidance?
o Can topics, tone, or complexity of language signal risk, manipulation, or stress?
• Apply appropriate NLP techniques to extract and analyse insights. These may include:
o Text cleaning, tokenisation, and vectorisation (e.g., TF-IDF, embeddings)
o Sentiment analysis (e.g., lexicon-based or transformer models)
o Topic modelling (e.g., BERTopic, LDA)
o Semantic similarity and clustering
• Present your findings using clear visualisations and articulate the value of the insights.
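As a toy illustration of the scripted-versus-Q&A comparison, the sketch below scores tone with a tiny hand-made lexicon. The lexicon and the two snippets are invented for demonstration; a real analysis would use the provided transcript and an established finance lexicon (e.g., Loughran-McDonald) or a transformer model.

```python
# Lexicon-based tone comparison between two transcript sections.
# POSITIVE/NEGATIVE and the snippets are illustrative, not real data.
import re

POSITIVE = {"strong", "growth", "confident", "improved"}
NEGATIVE = {"decline", "uncertain", "risk", "headwinds"}

def tone_score(text):
    """(positive - negative) word count, normalised by total word count."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(len(words), 1)

scripted = "We delivered strong growth this quarter and remain confident."
qa = "Margins may decline; the outlook is uncertain given macro headwinds."

print(tone_score(scripted) > tone_score(qa))  # True: scripted reads more upbeat
```

A gap between the two scores is exactly the kind of sentiment shift the example questions above ask about; visualising the score sentence by sentence makes the shift easy to present.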
Empirical Assignment Rubric
Each criterion is assessed across four quartile bands: Excellent (Top Quartile), Proficient (Second Quartile), Satisfactory (Third Quartile), and Needs Improvement (Bottom Quartile).

1. Originality and Soundness of Approach
• Excellent (Top Quartile): Innovative and well-reasoned approach. Clearly defines the problem, justifies chosen methods, and may extend beyond taught material. Demonstrates strong understanding of the data and domain.
• Proficient (Second Quartile): Sound and appropriate approach. Clear problem definition with reasonable method choices. Mostly builds on techniques taught in class. Methods are appropriate with some depth or customisation.
• Satisfactory (Third Quartile): Approach is standard or partially justified. Problem framing may be vague or overly reliant on basic techniques without clear adaptation. Methods are appropriate but lack depth or customisation.
• Needs Improvement (Bottom Quartile): Weak or unclear approach. Poor alignment between problem and method. Lacks justification or shows misunderstanding of key concepts.

2. Correctness and Clarity of Implementation
• Excellent (Top Quartile): Code is correct, well-structured, readable, and fully reproducible. Methods are implemented as intended with good use of programming practices.
• Proficient (Second Quartile): Code is mostly correct and functional with minor issues. Implementation is understandable and logically structured.
• Satisfactory (Third Quartile): Code runs but contains inefficiencies or inconsistencies. Some parts may be difficult to follow or not well explained.
• Needs Improvement (Bottom Quartile): Code is incorrect, does not run properly, or lacks clear structure and documentation. Major conceptual or technical errors present.

3. Insightfulness of Findings
• Excellent (Top Quartile): Provides rich, critical analysis of the results. Interprets findings clearly and connects them to broader financial or compliance context. Demonstrates depth of thought.
• Proficient (Second Quartile): Interprets results appropriately. Connects findings to context but lacks depth in critical reflection.
• Satisfactory (Third Quartile): Basic interpretation of results. Insights are shallow or descriptive with minimal contextual linkage.
• Needs Improvement (Bottom Quartile): Findings are poorly interpreted or missing. Analysis lacks relevance, depth, or rigour.

4. Coherence and Quality of Presentation
• Excellent (Top Quartile): Discourse and notebook are well-organised, visually clear, and easy to follow. Visualisations are well-designed and support the narrative effectively.
• Proficient (Second Quartile): Structure is mostly clear and logical. Visuals are helpful but could be more polished or better integrated.
• Satisfactory (Third Quartile): Structure is uneven or unclear. Visuals may be present but lack clarity, context, or labelling.
• Needs Improvement (Bottom Quartile): Poorly organised or hard to follow. Visuals missing, confusing, or irrelevant. Overall presentation quality is low.