2025-03-03

MFIN7034 Problem Set 3 – Risk Analysis

Version: 2025/02/25

Due Date: 2025/03/04 23:55:00 UTC+8

This problem set aims to provide some experience applying machine learning methods in risk analysis. The dataset “credit_risk.csv” is available to you on Moodle. Your main task is to establish machine learning models that predict the default label using available information (covariates).

A table of variable explanations is provided here:

Variable Name	Note	Explanation
age	Age of borrower	Age in years
edu	Education level	0: below high school, 1: high school, 2: college, 3: master, 4: above master
gender	Gender	0: female, 1: male
housing	Housing ownership	0: does not own, 1: owns
income	Income	Monthly income level
job_occupation	Job type	0: unemployed/temporarily employed, 1: employed, 2: manager/senior worker
past_bad_credit	Historical default label	0: non-default, 1: default
married	Marital status	0: unmarried, 1: married
default_label	Default indicator	0: non-default, 1: default

Submission format: an .ipynb notebook with runnable code showing all steps, plus a PDF report. The final report should contain results generated by your program. Write in simple, presentable, coherent English and use clean graphs. Proper visualization and clear interpretation and discussion (for example, explaining why a factor can predict default, or the logic behind your pursuit of a higher AUC) will also be graded.

1. Machine Learning Trials (60 Marks)

The first part of this problem set contains three practical tasks for machine learning algorithm applications:

1.1 Logistic Model (25 Marks)

Run a logistic regression of the default label on the available variables. Besides the original variables, also try adding interaction terms and/or non-linear transformations (polynomials, log transformations, dummy variables, etc.) to the model. Summarize your results. Obtain the predicted values from the regression above, compute and plot the ROC curve, and compute the AUC value. Explain your main results. Also compare the AUC across the different model specifications and briefly discuss the outcomes.
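The steps in 1.1 can be sketched roughly as follows. This is only an illustrative outline, not a reference solution: the data is a synthetic stand-in generated in the code (the real "credit_risk.csv" is not used), the data-generating process is arbitrary, and the helper names `make_synthetic_credit_data`, `add_features` and `fit_and_score` as well as the three extra features are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_credit_data(n=2000, seed=0):
    """Synthetic stand-in for credit_risk.csv (NOT the real data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.integers(20, 65, n),
        "edu": rng.integers(0, 5, n),
        "gender": rng.integers(0, 2, n),
        "housing": rng.integers(0, 2, n),
        "income": rng.lognormal(mean=8.5, sigma=0.5, size=n),
        "job_occupation": rng.integers(0, 3, n),
        "past_bad_credit": rng.integers(0, 2, n),
        "married": rng.integers(0, 2, n),
    })
    # Arbitrary relationship, chosen only so that the label is learnable.
    logit = (-1.0 + 1.2 * df["past_bad_credit"]
             - 0.8 * (np.log(df["income"]) - 8.5) - 0.3 * df["housing"])
    prob = 1.0 / (1.0 + np.exp(-logit))
    df["default_label"] = (rng.random(n) < prob).astype(int)
    return df


def add_features(df):
    """Hypothetical extra terms: a log transform, a polynomial, an interaction."""
    out = df.copy()
    out["log_income"] = np.log(out["income"])
    out["age_sq"] = out["age"] ** 2
    out["bad_credit_x_housing"] = out["past_bad_credit"] * out["housing"]
    return out


def fit_and_score(X, y, seed=0):
    """Fit a standardized logistic regression; return held-out AUC and ROC points."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    scores = model.predict_proba(X_te)[:, 1]  # predicted default probability
    fpr, tpr, _ = roc_curve(y_te, scores)
    return roc_auc_score(y_te, scores), fpr, tpr


df = make_synthetic_credit_data()
y = df["default_label"]
auc_base, fpr_base, tpr_base = fit_and_score(df.drop(columns="default_label"), y)
auc_ext, fpr_ext, tpr_ext = fit_and_score(
    add_features(df).drop(columns="default_label"), y)
print(f"baseline AUC: {auc_base:.3f}, extended AUC: {auc_ext:.3f}")
```

The returned `fpr` and `tpr` arrays can be passed to a plotting call such as matplotlib's `plt.plot(fpr, tpr)` to draw the ROC curve for each specification on the same axes.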

1.2 SVM/Random Forest (15 Marks)

You might wonder whether non-linearity in the model specification can help. Try either an SVM or a Random Forest (choose one). Then report the key parameters of your model, the AUC value, and the ROC plot as your main result.
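A minimal sketch of the Random Forest option, assuming scikit-learn and using generic synthetic data from `make_classification` in place of the real dataset. The hyper-parameter values are placeholders for illustration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data standing in for the credit dataset.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Placeholder hyper-parameters; in practice these would be tuned.
rf = RandomForestClassifier(n_estimators=300, max_depth=6,
                            min_samples_leaf=20, random_state=0)
rf.fit(X_tr, y_tr)

scores = rf.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, scores)
fpr, tpr, thresholds = roc_curve(y_te, scores)  # points for the ROC plot

# Key parameters to report alongside the AUC.
params = rf.get_params()
key_params = {k: params[k] for k in ("n_estimators", "max_depth", "min_samples_leaf")}
print(key_params, round(auc, 3))
```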

1.3 LightGBM (20 Marks)

LightGBM has been one of the most popular gradient boosting algorithms since its release. It is widely used on Kaggle and also performs well in real-world production settings. Try the LightGBM method. Describe the procedure in detail, including data preprocessing, model specification, feature selection and hyper-parameter tuning. Report the AUC value and plot the ROC curve. Compare this model’s performance with the outcomes of the previous two questions.

2. Deeper Explorations (40 Marks)

Think deeper, ask further, and explore more:

2.1 Data Preprocessing (15 Marks)

Explain, step by step, the purpose of each data preprocessing procedure you apply for the Logistic model and for the LightGBM model, respectively. Note that the procedures should match your code in Questions 1.1 and 1.3. An example answer would be in the following format:

For Logistic model:

…: …;

Standardization: In order to …

…: …

For LightGBM model: …
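To illustrate how preprocessing can differ between the two models, here is a small sketch under stated assumptions. The four rows of data are made up, the column names come from the variable table, and the specific choices (standardizing numeric columns and one-hot encoding `edu` and `job_occupation` for the logistic model, casting them to pandas `category` dtype for LightGBM) are one plausible option rather than the required answer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny made-up sample using column names from the variable table.
df = pd.DataFrame({
    "age": [25, 40, 33, 58],
    "income": [3000.0, 8000.0, 5200.0, 12000.0],
    "edu": [1, 2, 2, 4],
    "job_occupation": [0, 1, 1, 2],
    "housing": [0, 1, 0, 1],
})
numeric_cols = ["age", "income"]
categorical_cols = ["edu", "job_occupation"]

# Logistic model: scale numeric inputs so coefficients are comparable, and
# expand multi-level codes into dummies (dropping one level per variable).
logit_prep = ColumnTransformer(
    [("num", StandardScaler(), numeric_cols),
     ("cat", OneHotEncoder(drop="first"), categorical_cols)],
    remainder="passthrough",   # binary 0/1 columns are passed through unchanged
    sparse_threshold=0,        # always return a dense array
)
X_logit = logit_prep.fit_transform(df)

# LightGBM: tree splits do not depend on feature scale, so no standardization;
# marking columns as 'category' lets LightGBM treat them as categorical.
X_lgbm = df.copy()
for col in categorical_cols:
    X_lgbm[col] = X_lgbm[col].astype("category")

print(X_logit.shape)
print(X_lgbm.dtypes)
```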

2.2 Feature Importance Analysis (15 Marks)

For each model you use in Questions 1.1, 1.2 and 1.3, name one model-dependent method that measures the importance of the input features. Then use that method to output the importance ranking of the top 5 features. You will produce a table like the following example:

	1st	2nd	3rd	4th	5th
Logistic	age	edu	…	…	…
SVM/Random Forest (the one you used)	…	…	…	…	…
LightGBM	age	edu	…	…	…
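As a hedged illustration of model-dependent rankings, the sketch below uses the absolute value of standardized coefficients for the logistic model and impurity-based `feature_importances_` for a random forest. These are two common choices, not necessarily the ones expected here. The data is synthetic with generic feature names, so the resulting ranking carries no meaning for the real dataset; `top_k` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data with generic names (x0..x7), not the real credit features.
names = [f"x{i}" for i in range(8)]
X, y = make_classification(n_samples=2000, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)


def top_k(feature_names, scores, k=5):
    """Return the k feature names with the largest scores, in descending order."""
    order = np.argsort(scores)[::-1][:k]
    return [feature_names[i] for i in order]


# Logistic: standardize first so coefficient magnitudes are comparable.
X_std = StandardScaler().fit_transform(X)
logit = LogisticRegression(max_iter=1000).fit(X_std, y)
logit_top5 = top_k(names, np.abs(logit.coef_[0]))

# Random forest: mean decrease in impurity, exposed as feature_importances_.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
rf_top5 = top_k(names, rf.feature_importances_)

print("Logistic     :", logit_top5)
print("Random Forest:", rf_top5)
```

A fitted `LGBMClassifier` exposes a `feature_importances_` attribute as well, with the `importance_type` constructor argument selecting split counts or total gain, so the same `top_k` helper could be reused for the LightGBM row.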

2.3 Go Deeper towards Feature Importance Analysis! (10 Marks)

Do you think there is a method that can be applied to all four models above (i.e., Logistic regression, SVM, random forest, LightGBM)? Please discuss your ideas. This question will be marked very generously, so if your answer is yes, give it a try and show what you get!
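One candidate worth considering is permutation importance, which only needs a fitted model and a scoring function and so does not depend on the model family. The sketch below is an assumption-laden illustration: it uses scikit-learn's `permutation_importance` on synthetic data and shows only two of the four model types. Other model-agnostic methods exist and may be equally valid answers.

```python
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; feature indices 0..7 have no real-world meaning.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "Logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

rankings = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # Shuffle one feature at a time on held-out data and record the AUC drop.
    result = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                                    n_repeats=5, random_state=0)
    rankings[name] = result.importances_mean.argsort()[::-1][:5].tolist()

for name, top5 in rankings.items():
    print(name, "top-5 feature indices:", top5)
```

Because the same call works for any estimator with a scoring interface, a random forest or LightGBM classifier could be added to the `models` dictionary without changing the loop.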
