25883 AI-driven Compliance, Anomaly and Fraud Detection 2025
Assessment Task 1
Submission
This assessment must be completed in a group of up to two students. At the top of your Jupyter Notebook, include the full names and student IDs of both group members. Only one submission per group is required — please ensure that only one member uploads the assignment, not both.
Please submit your answers by midnight (11:59pm) on Friday, 11th April 2025 via Canvas only. A late penalty of 5% per day for submissions up to 7 calendar days late will be subtracted from the mark (a maximum of 35% penalty). Work submitted after 7 calendar days (on the 8th day or later) will not be marked and the assessment will attract a zero (0) mark.
Your submission should be a single Jupyter Notebook containing your code, visualisations, and explanations summarising your methodology, findings, and insights, written using Jupyter's Markdown cells. Clearly identify the parts of the project with section headings (e.g., # Question 1, # Question 2, etc.).
You do not need to upload any data files to Canvas. Your code should either:
• Download data directly from online sources (e.g., using yfinance), or
• Read from the external data files provided (e.g., the earnings call transcripts available on the subject GitHub page).
Make sure your code clearly shows how the data is loaded, so it can be run and reproduced without manual file uploads. Note: Code that does not compile or produces errors during execution may result in a significant loss of marks, so be sure to test your notebook before submission.
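To illustrate the two loading patterns, here is a minimal sketch. It assumes the yfinance package is installed and the network is reachable; the guard below simply skips the download when the package is missing, and the file-reading variant is shown as a comment.

```python
# Sketch of reproducible data loading -- no manual file uploads required.
# Assumes yfinance is installed and the network is reachable; the guard
# below only skips the download when the package itself is missing.
import importlib.util

if importlib.util.find_spec("yfinance") is not None:
    import yfinance as yf
    # Daily OHLCV data for GameStop (ticker GME), downloaded directly.
    prices = yf.download("GME", start="2020-01-01", end="2025-01-01",
                         progress=False)
else:
    prices = None  # yfinance not available in this environment

# Reading a provided text file instead (forward-slash form of the path
# given on the subject GitHub page):
# with open("data/EarningsCallTranscript_SingleCompany.txt",
#           encoding="utf-8") as f:
#     transcript = f.read()
```

Either pattern lets a marker re-run the notebook from a clean environment.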
Using GenAI to Support Your Coding
You are encouraged to use Generative AI tools (such as ChatGPT, Claude, or GitHub Copilot) to assist with your coding in this assignment. These tools can help explain unfamiliar code, suggest improvements, or help troubleshoot errors. You may also copy and paste the sample code provided in class or on GitHub into a GenAI tool to better understand how it works or adapt it to your own analysis.
Best practices for using GenAI include:
• Ask specific, well-formed questions (e.g., "How do I detect anomalies in time series using Isolation Forest?")
• Use GenAI to clarify errors or unfamiliar code blocks, rather than blindly copying outputs
• Test and validate any code suggestions before integrating them into your notebook
Always understand and explain the code you submit; your ability to interpret and justify your work will be part of your assessment. Remember, GenAI is a powerful support tool, not a replacement for your own reasoning, learning, and interpretation.
Marking
The final page of this document contains the assessment rubric, which outlines how your work will be evaluated. Submissions will be ranked, with the strongest projects placed at the top of the pile. This means your grade is relative to the quality and creativity of other submissions — so aim high and demonstrate your best work.
Good luck — I’m looking forward to seeing your ideas in action!
Question 1: Anomaly Detection in Financial Time Series
Objective: Your task is to design and implement an anomaly detection approach using Python and historical financial time-series data retrieved from the yfinance library. Focus on price, returns, and volume series, or any derived financial indicators (e.g., volatility). Your goal is to uncover unusual or abnormal patterns, such as structural breaks, outliers, regime shifts, or behaviour inconsistent with typical market dynamics.
This is an open-ended empirical task, and you are encouraged to be innovative. There is no single correct approach — submissions will be ranked relative to peers or peer groups based on originality, correctness, insights, and overall quality of presentation.
Instructions
• Use the yfinance package to download time-series data for GameStop from 1 Jan 2020 to 1 Jan 2025.
• Define what constitutes an “anomaly” in your context (e.g., price spikes, return outliers, structural breaks, volatility bursts, etc.).
• Select and apply appropriate anomaly detection techniques in Python, such as:
o Statistical methods (e.g., z-score, rolling quantiles, change-point detection)
o Clustering-based or distance-based approaches (e.g., k-means, DBSCAN)
o Machine learning models (e.g., Isolation Forest, One-Class SVM)
• Repeat the steps above for another asset of your choosing (e.g., a stock, index, or ETF).
• Explain and justify your method selection and implementation.
• Visualise and interpret the anomalies you detect. What do they reveal? Are they associated with market events or structural changes?
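To make the statistical option concrete, here is a sketch of one technique from the list above: a rolling z-score on daily returns. Synthetic data stands in for the yfinance download, and the 20-day window and 3-sigma threshold are arbitrary illustrative choices, not values prescribed by the assignment.

```python
# Rolling z-score anomaly detection on a synthetic return series.
# A real submission would compute `returns` from downloaded GME prices.
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.02, 500)   # placeholder for daily returns
returns[100] = 0.25                    # inject one obvious anomaly

window = 20
anomalies = []
for t in range(window, len(returns)):
    past = returns[t - window:t]       # trailing window, excludes today
    z = (returns[t] - past.mean()) / past.std()
    if abs(z) > 3:                     # flag returns > 3 sigma vs recent history
        anomalies.append(t)

print(anomalies)  # the injected spike at index 100 should be flagged
```

The same flagged indices can then be overlaid on the price chart to check whether they coincide with known market events.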
Question 2: NLP-based Analysis of Financial Text Data
Objective: Your task is to design and implement a Natural Language Processing (NLP) workflow to extract and analyse insights from a set of earnings call transcripts. The goal is to automatically summarise the text, detect patterns, irregularities, or strategic signals embedded in financial language, and explore their possible links to compliance issues, market impact, or irregular firm behaviour.
This is an open-ended and exploratory exercise — you are free to define your own approach, provided it is grounded in appropriate NLP methodology and produces insightful, reproducible results. Submissions will be ranked relative to peers or peer groups based on originality, correctness, insights, and overall clarity of presentation.
Instructions
• You are provided with a sample of earnings call transcripts in the file:
o \data\EarningsCallTranscript_SingleCompany.txt (available via the subject GitHub page).
o The transcript consists of two parts: the formal remarks, which are prepared by the senior team and highly scripted, and the Q&A section, which is less predictable and can catch the organisers by surprise.
• Define a problem or pattern of interest relevant to the objective. Example questions you might explore:
o Can you detect linguistic signals or sentiment shifts between the scripted and Q&A sections?
o Is there evidence of topic avoidance?
o Can topics, tone, or complexity of language signal risk, manipulation, or stress?
• Apply appropriate NLP techniques to extract and analyse insights. These may include:
o Text cleaning, tokenisation, and vectorisation (e.g., TF-IDF, embeddings)
o Sentiment analysis (e.g., lexicon-based or transformer models)
o Topic modelling (e.g., BERTopic, LDA)
o Semantic similarity and clustering
• Present your findings using clear visualisations and articulate the value of the insights.
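As a toy illustration of the scripted-versus-Q&A comparison, the sketch below scores tone with a tiny hand-made lexicon. The lexicon and the two snippets are invented for demonstration; a real analysis would use the provided transcript and an established finance lexicon (e.g., Loughran-McDonald) or a transformer model.

```python
# Lexicon-based tone comparison between two transcript sections.
# POSITIVE/NEGATIVE and the snippets are illustrative, not real data.
import re

POSITIVE = {"strong", "growth", "confident", "improved"}
NEGATIVE = {"decline", "uncertain", "risk", "headwinds"}

def tone_score(text):
    """(positive - negative) word count, normalised by total word count."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(len(words), 1)

scripted = "We delivered strong growth this quarter and remain confident."
qa = "Margins may decline; the outlook is uncertain given macro headwinds."

print(tone_score(scripted) > tone_score(qa))  # True: scripted reads more upbeat
```

A gap between the two scores is exactly the kind of sentiment shift the example questions above ask about; visualising the score sentence by sentence makes the shift easy to present.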
Empirical Assignment Rubric
Each criterion is assessed across four quartile bands: Excellent (Top Quartile), Proficient (Second Quartile), Satisfactory (Third Quartile), and Needs Improvement (Bottom Quartile).

1. Originality and Soundness of Approach
• Excellent (Top Quartile): Innovative and well-reasoned approach. Clearly defines the problem, justifies chosen methods, and may extend beyond taught material. Demonstrates strong understanding of the data and domain.
• Proficient (Second Quartile): Sound and appropriate approach. Clear problem definition with reasonable method choices. Mostly builds on techniques taught in class. Methods are appropriate with some depth or customisation.
• Satisfactory (Third Quartile): Approach is standard or partially justified. Problem framing may be vague or overly reliant on basic techniques without clear adaptation. Methods are appropriate but lack depth or customisation.
• Needs Improvement (Bottom Quartile): Weak or unclear approach. Poor alignment between problem and method. Lacks justification or shows misunderstanding of key concepts.

2. Correctness and Clarity of Implementation
• Excellent (Top Quartile): Code is correct, well-structured, readable, and fully reproducible. Methods are implemented as intended with good use of programming practices.
• Proficient (Second Quartile): Code is mostly correct and functional with minor issues. Implementation is understandable and logically structured.
• Satisfactory (Third Quartile): Code runs but contains inefficiencies or inconsistencies. Some parts may be difficult to follow or not well explained.
• Needs Improvement (Bottom Quartile): Code is incorrect, does not run properly, or lacks clear structure and documentation. Major conceptual or technical errors present.

3. Insightfulness of Findings
• Excellent (Top Quartile): Provides rich, critical analysis of the results. Interprets findings clearly and connects them to broader financial or compliance context. Demonstrates depth of thought.
• Proficient (Second Quartile): Interprets results appropriately. Connects findings to context but lacks depth in critical reflection.
• Satisfactory (Third Quartile): Basic interpretation of results. Insights are shallow or descriptive with minimal contextual linkage.
• Needs Improvement (Bottom Quartile): Findings are poorly interpreted or missing. Analysis lacks relevance, depth, or rigour.

4. Coherence and Quality of Presentation
• Excellent (Top Quartile): Discourse and notebook are well-organised, visually clear, and easy to follow. Visualisations are well-designed and support the narrative effectively.
• Proficient (Second Quartile): Structure is mostly clear and logical. Visuals are helpful but could be more polished or better integrated.
• Satisfactory (Third Quartile): Structure is uneven or unclear. Visuals may be present but lack clarity, context, or labelling.
• Needs Improvement (Bottom Quartile): Poorly organised or hard to follow. Visuals missing, confusing, or irrelevant. Overall presentation quality is low.