Smart Industry Operations 2024-2025
Individual (Repair) Assignment
Bias in healthcare operations
Context of the Assignment
Diagnosing severe schizophrenia is important for individuals, families, and healthcare systems. It allows the condition to be addressed effectively and improves outcomes. There is evidence that clinicians may overemphasise psychotic symptoms or underemphasise depressive symptoms in Black African Americans in the US. This can be attributed to diagnostic bias. Misdiagnosis may have harmful effects, leading to disparities in healthcare provision.
The aim of this assignment is to study the extent to which diagnostic cases can be correctly categorised as schizophrenia or depression based on a range of potential diagnostic attributes. It is also of interest to identify whether there might be diagnostic disparities across sensitive characteristics, such as gender (sex) and race. Such diagnostic disparities may lead to disparities in healthcare provision, resulting in inequalities in healthcare services. Diagnostic disparities may mask how sensitive attributes intersect with other important factors, such as family expenditure on healthcare. If bias already exists within electronic patient records, it may eventually be amplified when machine learning models are operationalised, despite the models being agnostic of the presence of such disparities.
In the case considered, a healthcare provider eager to increase efficiency may regard a machine learning model as sufficiently accurate for operationalisation if its overall accuracy exceeds 90%. Your task is to analyse the case data, build machine learning models, and make concrete recommendations, based on the evidence you obtain, about operationalising your machine learning models.
Case Data
We consider gender and race as two sensitive features in a dataset that contains electronic healthcare data of people who have been diagnosed with a condition, but for whom it is uncertain whether the diagnosis should be schizophrenia or depression.
The dataset contains records with the following attributes:
Diagnosis: affective disorder (0) or schizophrenia (1). This is the output attribute. The rest are input attributes and include:
Sensitive features:
Sex: takes value Male or Female
Race: takes values Asian, Hispanic, Black, White
Psychosocial features:
Delay: denoting delay in seeking care (takes values Yes/No)
Housing: takes values Stable or Unstable, denoting the housing status of the individual
In the attributes below, a clinician may be from different disciplines or may refer to rating by different types of clinicians; for simplicity here a single type of clinician is mentioned.
Anhedonia: clinical assessment indicating inability to experience enjoyment in activities typically perceived as enjoyable or fulfilling; rated by a clinician.
Dep_Mood: clinical assessment indicating persistent state of sadness, low energy, or emotional heaviness; rated by a clinician.
Sleep: average hours of sleep per day; rated by the patient.
Tired: whether the patient feels tired or not; rated by the patient.
Appetite: the extent to which the patient has good appetite; rated by the patient.
Rumination: the extent to which the patient is trapped in the same thoughts; rated by a clinician.
Concentration: the ability of a patient to concentrate; rated by a clinician.
Psychomotor: the extent of abnormalities in how a patient’s movements follow a thought/mental process; assessed via standardised tests by a clinician.
Delusion: the extent to which false beliefs feature in a patient’s thinking; rated by a clinician.
Suspicious: the extent to which a patient is unreasonably over-suspicious and distrusts others; rated by a clinician.
Withdrawal: the extent to which a patient is in a state of disengagement with social interaction and activities; rated by a clinician.
Passive: the extent to which a patient feels a lack of control over own thoughts; rated by a clinician.
Tension: the extent to which a patient is in a state of unease, strain or agitation; rated by a clinician.
Unusual_Thought: the extent to which a patient has thoughts, beliefs or perceptions that significantly deviate from what is typically expected in a given social or cultural context; rated by a clinician.
The provided datasets are:
A. diagnosis_train.csv
A dataset populated with all of the above information. You will use this for your analysis and for training classification models.
B. diagnosis_predict.csv
This is a similar dataset to A, but in this one you have no access to the outcome. You will use this to classify unknown cases as schizophrenia or depression.
Assignment Questions
A.1. Exploratory Data Analysis (15% of Repair Assignment mark)
In this part you are expected to:
A1.1. Explore the variables, their types, and their basic statistics.
A1.2. Analyse the data further regarding data distributions, ranges of values, existence of outliers, and correlations between attributes, as well as between input attributes and Diagnosis. What are your observations? Additionally, to what extent is the dataset balanced across the categories of the sex and race sensitive attributes? To what extent is the diagnosis balanced within each category of the sex and race sensitive attributes?
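The balance questions in A1.2 can be illustrated with a minimal, library-free sketch. The toy records and the helper name group_balance are hypothetical; only the column names Sex, Race and Diagnosis come from this brief. In a notebook, a pandas cross-tabulation over the same columns would give an equivalent view.

```python
from collections import Counter

# Hypothetical toy records standing in for rows of diagnosis_train.csv;
# only the column names Sex, Race and Diagnosis are taken from the brief.
records = [
    {"Sex": "Male", "Race": "Black", "Diagnosis": 1},
    {"Sex": "Male", "Race": "Black", "Diagnosis": 1},
    {"Sex": "Male", "Race": "Black", "Diagnosis": 0},
    {"Sex": "Female", "Race": "White", "Diagnosis": 0},
    {"Sex": "Female", "Race": "White", "Diagnosis": 0},
    {"Sex": "Female", "Race": "Asian", "Diagnosis": 1},
]


def group_balance(rows):
    """Return {(sex, race): (n_records, share_with_diagnosis_1)}."""
    totals = Counter()
    positives = Counter()
    for row in rows:
        key = (row["Sex"], row["Race"])
        totals[key] += 1
        positives[key] += row["Diagnosis"]  # Diagnosis is coded 0 or 1
    return {key: (n, positives[key] / n) for key, n in totals.items()}


balance = group_balance(records)
for (sex, race), (n, share) in sorted(balance.items()):
    print(f"{sex:6s} {race:8s} n={n}  share with schizophrenia diagnosis={share:.2f}")
```

Group sizes that differ widely, or schizophrenia shares that differ widely between groups, are the kind of imbalance this question asks you to report.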
A.2. Classification (40% of Repair Assignment mark)
In this part you are expected to develop classifier models. You will have to consider how best to use your training data (diagnosis_train.csv) and you are asked to apply the developed models to the “diagnosis_predict.csv” data at the end and produce diagnosis predictions for them.
A2.1. Apply a decision tree classifier, choosing different hyperparameters (as a minimum, different tree depths), to diagnosis_train.csv. Motivate your solution and analysis in relation to overfitting and generalisation. Report and analyse performance using different performance metrics. Analyse your findings. Do you observe any difference in performance across the categories of the sensitive attributes (e.g. sex, race)? Finally, choose a developed model and apply it to the diagnosis_predict.csv data to produce your diagnosis predictions.
A2.2. Apply a random forest classifier, choosing different hyperparameters (as a minimum, different numbers of estimators and tree depths). Motivate your solution and analysis in relation to overfitting and generalisation. Report and analyse performance using different performance metrics. Analyse your findings. Do you observe any difference in performance across the categories of the sensitive attributes (e.g. sex, race)? Finally, choose a developed model and apply it to the diagnosis_predict.csv data to produce your predictions.
A2.3. Make a comparative analysis across all classifier experiments. Make a reasoned choice of classifier and motivate it with reference to the evidence obtained from performance metrics.
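As an illustration of what varying tree depth in relation to overfitting can look like, the sketch below uses scikit-learn on synthetic stand-in data. The data, the split proportions and the depth values are assumptions made for the example only, not a prescribed solution; in your notebook the features and labels would come from diagnosis_train.csv.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): replace with features and labels
# prepared from diagnosis_train.csv.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

results = {}
for depth in (1, 3, 5, 10, None):  # None lets the tree grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc = accuracy_score(y_train, tree.predict(X_train))
    val_acc = accuracy_score(y_val, tree.predict(X_val))
    results[depth] = (train_acc, val_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}  validation={val_acc:.3f}")
```

A widening gap between training and validation accuracy as depth grows is one common sign of overfitting. Accuracy is used here only for brevity, since the question asks for several performance metrics.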
A.3. Bias Analysis and Management (35% of Repair Assignment mark)
In this part you are expected to further analyse the data and the results you obtained regarding potential bias. Specifically, answer the following questions:
A3.1. Consider your results above. For which combination of sensitive attributes (sex, race) did you observe the largest diagnostic disparity (meaning largest difference between precision and recall)? And for which combination did you observe the smallest diagnostic disparity?
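One way to read the disparity measure in A3.1 is as the absolute difference between precision and recall, computed separately for each (sex, race) combination. The sketch below follows that reading. The helper names and the toy labels are hypothetical, and the choice of the absolute value is an interpretation of the wording above.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (Diagnosis == 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def disparity_by_group(groups, y_true, y_pred):
    """Return {group: |precision - recall|} for each sensitive-attribute group."""
    split = {}
    for g, t, p in zip(groups, y_true, y_pred):
        true_list, pred_list = split.setdefault(g, ([], []))
        true_list.append(t)
        pred_list.append(p)
    gaps = {}
    for g, (true_list, pred_list) in split.items():
        precision, recall = precision_recall(true_list, pred_list)
        gaps[g] = abs(precision - recall)
    return gaps


# Hypothetical validation labels and predictions for two groups.
groups = [("Male", "Black")] * 4 + [("Female", "White")] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]

gaps = disparity_by_group(groups, y_true, y_pred)
largest = max(gaps, key=gaps.get)
print(gaps, "largest disparity:", largest)
```

In this toy example the first group is over-diagnosed (recall 1.0, precision 0.5), so its gap is the larger of the two.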
A3.2. Now choose the combination with the largest diagnostic disparity. Out of your training data, retain only the data corresponding to this combination of sensitive attributes. Build the same types of model as in A2.1 and A2.2 using only these data and perform a similar analysis as in A2.1 and A2.2. What are your observations and how do you interpret your results?
A3.3. Apply resampling of the data records for the selected combination of sensitive attributes to balance precision and recall. Perform the same machine learning as in A3.2. Report and analyse results. What are the observed differences with respect to diagnostic disparity?
A3.4. Without applying resampling, can you think of and apply an alternative method to reduce the diagnostic disparity?
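Random oversampling of the less frequent diagnosis class is one possible form of resampling, sketched below without external libraries. The function name oversample_minority, the toy subgroup and the fixed seed are assumptions for illustration; other resampling schemes are equally admissible here.

```python
import random


def oversample_minority(rows, label_key="Diagnosis", seed=0):
    """Duplicate randomly chosen rows of the rarer classes until all classes
    have as many rows as the most frequent class."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(label_rows) for label_rows in by_label.values())
    balanced = []
    for label_rows in by_label.values():
        balanced.extend(label_rows)
        # Sample with replacement to fill the gap (no-op for the majority class).
        balanced.extend(rng.choices(label_rows, k=target - len(label_rows)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical training rows for one (sex, race) combination:
# five schizophrenia cases (1) and two affective-disorder cases (0).
subgroup = [{"Diagnosis": 1, "id": i} for i in range(5)]
subgroup += [{"Diagnosis": 0, "id": i} for i in range(5, 7)]

balanced = oversample_minority(subgroup)
counts = {
    label: sum(1 for r in balanced if r["Diagnosis"] == label) for label in (0, 1)
}
print(counts)
```

Resampling is normally applied to the training split only, so that the validation data still reflect the original class distribution.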
A.4. Overall comparisons and analysis (10% of Repair Assignment mark)
In this part you are expected to:
Discuss the obtained results comparatively. Highlight only what you see as most interesting regarding the obtained performance and/or aspects of data imbalance and fairness, motivating your analysis on the basis of the obtained evidence. What would be your concluding recommendations?
Further Instructions
In this assignment you will address the questions provided. The assignment is delivered as a Jupyter Notebook. Jupyter Notebooks do not have a pre-specified length. However, a good Notebook should be both sufficiently explanatory and relatively compact. It should include insightful motivation, analysis and interpretations, grounded in evidence from the data, the processing of the data you performed, and the results you have obtained.
You should submit this assignment via email to:
[email protected]
You should receive confirmation of receipt within 24 hours. If you do not, this might indicate that your assignment was not received, in which case please submit again or enquire about it.
Submission deadline: 29 January 23:59