25883 AI-driven Compliance, Anomaly and Fraud Detection   2025

Assessment Task 1

Submission

This assessment must be completed in a group of up to two students. At the top of your Jupyter Notebook, include the full names and student IDs of all group members. Only one submission per group is required; please ensure that only one member uploads the assignment.

Please submit your answers by midnight (11:59pm) on Friday, 11th April 2025 via Canvas only. A late penalty of 5% per day for submissions up to 7 calendar days late will be subtracted from the mark (a maximum of 35% penalty). Work submitted after 7 calendar days (on the 8th day or later) will not be marked and the assessment will attract a zero (0) mark.

Your submission should constitute a single Jupyter Notebook containing your code, visualisations, and explanations that summarise your methodology, findings, and insights using Jupyter's Markdown cells. Clearly identify the parts of the project by sectioning (e.g., using Markdown headings such as # Question 1, # Question 2, etc.).

You do not need to upload any data files to Canvas. Your code should either:

•    Download data directly from online sources (e.g., using yfinance), or

•    Read from the external data files provided (e.g., the earnings call transcripts available on the subject GitHub page).

Make sure your code clearly shows how the data is loaded, so it can be run and reproduced without manual file uploads. Note: Code that does not compile or produces errors during execution may result in a significant loss of marks, so be sure to test your notebook before submission.
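For illustration, a minimal sketch of the two loading options might look like the code below. The ticker symbol GME (GameStop) and the relative file path are assumptions; adjust them to the asset and folder layout you actually use.

```python
# Minimal sketch of the two supported loading options (ticker and path are
# illustrative assumptions; adapt them to your own analysis).
import yfinance as yf

# Option 1: download data directly from an online source
prices = yf.download("GME", start="2020-01-01", end="2025-01-01")
print(prices.head())

# Option 2: read a provided external data file from the subject GitHub page
with open("data/EarningsCallTranscript_SingleCompany.txt", encoding="utf-8") as f:
    transcript = f.read()
print(transcript[:300])  # preview the first 300 characters
```

Either way, the notebook can then be rerun end to end without any manual file uploads.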

Using GenAI to Support Your Coding

You are encouraged to use Generative AI tools (such as ChatGPT, Claude, or GitHub Copilot) to assist with your coding in this assignment. These tools can help explain unfamiliar code, suggest improvements, or help troubleshoot errors. You may also copy and paste the sample code provided in class or on GitHub into a GenAI tool to better understand how it works or adapt it to your own analysis.

Best practices for using GenAI include:

•    Ask specific, well-formed questions (e.g., "How do I detect anomalies in time series using Isolation Forest?")

•    Use GenAI to clarify errors or unfamiliar code blocks, rather than blindly copying outputs

•    Test and validate any code suggestions before integrating them into your notebook

Always understand and explain the code you submit—your ability to interpret and justify your work will be part of your assessment. Remember, GenAI is a powerful support tool, not a replacement for your own reasoning, learning, and interpretation.

Marking

The final page of this document contains the assessment rubric, which outlines how your work will be evaluated. Submissions will be ranked, with the strongest projects placed at the top of the pile. This means your grade is relative to the quality and creativity of other submissions — so aim high and demonstrate your best work.

Good luck — I’m looking forward to seeing your ideas in action!

Question 1: Anomaly Detection in Financial Time Series

Objective: Your task is to design and implement an anomaly detection approach using Python and historical financial time-series data retrieved from the yfinance library. Focus on price, returns, and volume series, or any derived financial indicators (e.g., volatility). Your goal is to uncover unusual or abnormal patterns, such as structural breaks, outliers, regime shifts, or behaviour inconsistent with typical market dynamics.

This is an open-ended empirical task, and you are encouraged to be innovative. There is no single correct approach; submissions will be ranked relative to peers or peer groups based on originality, correctness, insights, and overall quality of presentation.

Instructions

•    Use the yfinance package to download time-series data for GameStop from 1 Jan 2020 to 1 Jan 2025 (a brief illustrative sketch of this workflow appears after this list).

•    Define what constitutes an “anomaly” in your context (e.g., price spikes, return outliers, structural breaks, volatility bursts, etc.).

•    Select and apply appropriate anomaly detection techniques in Python, such as:

o Statistical methods (e.g., z-score, rolling quantiles, change-point detection)

o Clustering-based or distance-based approaches (e.g., k-means, DBSCAN)

o  Machine learning models (e.g., Isolation Forest, One-Class SVM)

•    Repeat the steps above for another asset of your choosing (e.g., stocks, indices, ETFs).

•    Explain and justify your method selection and implementation.

•    Visualise and interpret the anomalies you detect. What do they reveal? Are they associated with market events or structural changes?
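As a starting point only, the sketch below combines a simple statistical method (a rolling z-score on daily log returns) with a machine learning model (Isolation Forest on returns and volume) for the GameStop series. The 30-day window, the |z| > 3 threshold, the 1% contamination rate, and the ticker GME are illustrative assumptions rather than required choices, and the same steps can be repeated for your second asset.

```python
# Illustrative anomaly detection sketch; window, threshold, and contamination
# values are assumptions to be tuned and justified in your own analysis.
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

px = yf.download("GME", start="2020-01-01", end="2025-01-01")
close = px["Close"].squeeze()     # handle both flat and multi-level column layouts
volume = px["Volume"].squeeze()
ret = np.log(close).diff().dropna()           # daily log returns

# Statistical method: rolling z-score on returns
z = (ret - ret.rolling(30).mean()) / ret.rolling(30).std()
stat_anomalies = ret[z.abs() > 3]             # flag days with |z| > 3

# Machine learning method: Isolation Forest on returns and volume
features = pd.DataFrame({"ret": ret, "volume": volume.reindex(ret.index)}).dropna()
iso = IsolationForest(contamination=0.01, random_state=0)
features["flag"] = iso.fit_predict(features)  # -1 marks an anomaly
ml_anomalies = features[features["flag"] == -1]

print(f"{len(stat_anomalies)} z-score anomalies, {len(ml_anomalies)} Isolation Forest anomalies")

# Simple visual check: mark z-score anomalies on the price series
ax = close.plot(figsize=(10, 4), title="GME close price with z-score anomalies")
ax.scatter(stat_anomalies.index, close.reindex(stat_anomalies.index), color="red", zorder=3)
plt.show()
```

Whatever methods you choose, the justification, the parameter choices, and the interpretation of the flagged dates (e.g., whether they coincide with known market events) matter more than the specific libraries used.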

Question 2: NLP-based Analysis of Financial Text Data

Objective: Your task is to design and implement a Natural Language Processing (NLP) workflow to extract and analyse insights from a set of earnings call transcripts. The goal is to automate summarisation of the text, detect patterns, irregularities, or strategic signals embedded in financial language, and explore their possible links to compliance issues, market impact, or irregular firm behaviour.

This is an open-ended and exploratory exercise — you are free to define your own approach, provided it is grounded in appropriate NLP methodology and produces insightful, reproducible results. Submissions will be ranked relative to peers or peer groups based on originality, correctness, insights, and overall clarity of presentation.

Instructions

•    You are provided with a sample of earnings call transcripts in the file:

o   \data\EarningsCallTranscript_SingleCompany.txt (available via the subject GitHub page).

o The transcript consists of two parts: the formal remarks, which are prepared by the senior team and highly scripted, and the Q&A section, which is unscripted and can be surprising to the organisers.

•    Define a problem or pattern of interest relevant to the objective. Example questions you might explore:

o Can you detect linguistic signals or sentiment shifts between the scripted and Q&A sections?

o  Is there evidence of topic avoidance?

o Can topics, tone, or complexity of language signal risk, manipulation, or stress?

•    Apply appropriate NLP techniques to extract and analyse insights (a minimal sketch follows this list). These may include:

o Text cleaning, tokenisation, and vectorisation (e.g., TF-IDF, embeddings)

o Sentiment analysis (e.g., lexicon-based or transformer models)

o Topic modelling (e.g., BERTopic, LDA)

o Semantic similarity and clustering

•    Present your findings using clear visualisations and articulate the value of the insights.
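The sketch below is one possible starting point, not a prescribed solution: it splits the transcript into the scripted and Q&A parts, compares lexicon-based sentiment between them, and surfaces distinctive Q&A terms with TF-IDF. The file path, the section marker used for the split, and the NLTK/scikit-learn tooling are assumptions that should be checked against the actual transcript and your environment.

```python
# Illustrative NLP sketch; the split marker and file path are assumptions
# that must be verified against the transcript provided on GitHub.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("vader_lexicon", quiet=True)   # lexicon used by VADER

with open("data/EarningsCallTranscript_SingleCompany.txt", encoding="utf-8") as f:
    text = f.read()

# Split into scripted remarks and Q&A; inspect the file to confirm the marker
marker = "Question-and-Answer"
scripted, qa = (text.split(marker, 1) + [""])[:2]

# Lexicon-based sentiment comparison (sentence-level scoring would be finer-grained)
sia = SentimentIntensityAnalyzer()
print("Scripted sentiment:", sia.polarity_scores(scripted))
print("Q&A sentiment:     ", sia.polarity_scores(qa))

# TF-IDF to surface terms that distinguish the Q&A section
vec = TfidfVectorizer(stop_words="english", max_features=2000)
tfidf = vec.fit_transform([scripted, qa]).toarray()
terms = vec.get_feature_names_out()
top_qa_terms = [terms[i] for i in tfidf[1].argsort()[::-1][:10]]
print("Terms weighted most heavily in the Q&A section:", top_qa_terms)
```

From here you could move on to topic modelling, embeddings, or transformer-based sentiment, and build visualisations that connect the linguistic patterns to the compliance or market-impact questions you define.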

Empirical Assignment Rubric

1. Originality and Soundness of Approach

•    Excellent (Top Quartile): Innovative and well-reasoned approach. Clearly defines the problem, justifies chosen methods, and may extend beyond taught material. Demonstrates strong understanding of the data and domain.

•    Proficient (Second Quartile): Sound and appropriate approach. Clear problem definition with reasonable method choices. Mostly builds on techniques taught in class. Methods are appropriate with some depth or customisation.

•    Satisfactory (Third Quartile): Approach is standard or partially justified. Problem framing may be vague or overly reliant on basic techniques without clear adaptation. Methods are appropriate but lack depth or customisation.

•    Needs Improvement (Bottom Quartile): Weak or unclear approach. Poor alignment between problem and method. Lacks justification or shows misunderstanding of key concepts.

2. Correctness and Clarity of Implementation

•    Excellent (Top Quartile): Code is correct, well-structured, readable, and fully reproducible. Methods are implemented as intended with good use of programming practices.

•    Proficient (Second Quartile): Code is mostly correct and functional with minor issues. Implementation is understandable and logically structured.

•    Satisfactory (Third Quartile): Code runs but contains inefficiencies or inconsistencies. Some parts may be difficult to follow or not well explained.

•    Needs Improvement (Bottom Quartile): Code is incorrect, does not run properly, or lacks clear structure and documentation. Major conceptual or technical errors present.

3. Insightfulness of Findings

•    Excellent (Top Quartile): Provides rich, critical analysis of the results. Interprets findings clearly and connects them to broader financial or compliance context. Demonstrates depth of thought.

•    Proficient (Second Quartile): Interprets results appropriately. Connects findings to context but lacks depth in critical reflection.

•    Satisfactory (Third Quartile): Basic interpretation of results. Insights are shallow or descriptive with minimal contextual linkage.

•    Needs Improvement (Bottom Quartile): Findings are poorly interpreted or missing. Analysis lacks relevance, depth, or rigour.

4. Coherence and Quality of Presentation

•    Excellent (Top Quartile): Discourse and notebook are well-organised, visually clear, and easy to follow. Visualisations are well-designed and support the narrative effectively.

•    Proficient (Second Quartile): Structure is mostly clear and logical. Visuals are helpful but could be more polished or better integrated.

•    Satisfactory (Third Quartile): Structure is uneven or unclear. Visuals may be present but lack clarity, context, or labelling.

•    Needs Improvement (Bottom Quartile): Poorly organised or hard to follow. Visuals missing, confusing, or irrelevant. Overall presentation quality is low.