DATA 311 - Data Ethics Assignment 2: Allocative Bias in Healthcare

Scott Wehrwein

Spring 2025

Learning Objectives

By the end of this activity, you will be able to:

  1. Identify the developer, the consumer, and the subjects of a deployed prediction algorithm.
  2. Explain how the choice of target variable produced racial bias in the healthcare risk-scoring algorithm studied by Obermeyer et al.
  3. Analyze data from an allocation decision for evidence of allocative bias.

Collaboration and Deadlines

You will complete Part A individually and submit it before class on Monday 5/4. Parts B and C will be done in pairs, beginning in class, with any remaining tasks completed by the following Monday, 5/11.

Introduction: Allocation and Bias

Automated prediction systems are often used to make or inform decisions that affect people’s lives. Often these decisions relate to the allocation of resources: who is approved for a loan, who is invited to a job interview, or who receives medical care.

When these algorithms make systematically different allocative recommendations for different groups of people, awarding resources to some while denying those same resources to others for arbitrary reasons, we may have an instance of allocative harm or allocative bias.[1]

In this activity we’ll explore a famous case of allocative bias in the distribution of medical care, documented in the journal article Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations by Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan (Science, 2019).

In this study, the authors examined a healthcare recommendation system that assigned a medical risk score to patients; patients with higher scores were considered to face greater future risk to their health. Patients who received very high risk scores were recommended for an intensive care-management program intended to improve their outcomes. Here, it’s important to remember:

If a patient is indeed at high medical risk, then it is a good outcome for them to receive a high risk score and be recommended for intensive care.
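To make the decision rule concrete, here is a minimal sketch of how a percentile-based cutoff like this might be applied. The DataFrame and column names below are hypothetical placeholders, but the cutoffs match those described in the paper: scores above the 97th percentile trigger automatic identification for the program, and scores above the 55th percentile flag the patient for possible referral by their physician.

    import pandas as pd

    # Hypothetical patient data; in the real system, the risk score is
    # produced by a proprietary predictive model.
    patients = pd.DataFrame({
        "patient_id": [1, 2, 3, 4, 5],
        "risk_score": [0.12, 0.48, 0.61, 0.95, 0.99],
    })

    # Percentile cutoffs as described in the paper.
    auto_enroll_cut = patients["risk_score"].quantile(0.97)
    referral_cut = patients["risk_score"].quantile(0.55)

    # Apply the thresholds to get the two allocation decisions.
    patients["auto_enroll"] = patients["risk_score"] > auto_enroll_cut
    patients["flag_for_referral"] = patients["risk_score"] > referral_cut

    print(patients)

Note that the rule itself never looks at race; as the paper documents, the bias enters through what the score is trained to predict, not through the cutoff.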

Part A: Read The Article

Take ~30 minutes to read through the article. A PDF version of the article is hosted on the FTC’s website. You can focus on the first four pages of the text, especially the abstract, the description of the data set, Fig. 1 and the surrounding text, and the section “Mechanism of bias.”

In a text editor of your choice that can export to PDF, answer these questions:

  1. What entity is the developer of the algorithm studied in the article? What entity is the consumer (paying user) of the algorithm? What entity or entities are the subject of (person(s) affected by) the algorithm?
  2. What happens to a patient who receives a very high risk score?
  3. What is the target variable that the algorithm is trained to predict?
  4. On average, how many chronic illnesses are required for a white patient to be defaulted into the high-risk management program? How many are required for a Black patient?
  5. Does a Black patient with a given risk score from the algorithm tend to have more or fewer chronic illnesses than a white patient with the same risk score? (See the code sketch after this list.)
  6. Does a Black patient with a given risk score from the algorithm tend to have higher or lower medical costs than a white patient with the same risk score?
  7. Please respond to the following statement: “The algorithm studied in this article can’t be racially biased because it is equally accurate in predicting medical costs for Black and white patients.” Explain your reasoning.
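Questions 5–7 all hinge on one kind of comparison: hold the risk score fixed and compare the two groups on some outcome, much as Fig. 1 of the paper does. Below is a minimal sketch of that computation. The DataFrame df and its columns (race, risk_score, chronic_conditions, cost) are hypothetical placeholders for illustration, not the actual variable names used in the study’s data or in the course notebook.

    import pandas as pd

    # df is a hypothetical DataFrame with one row per patient and columns:
    #   race               -- e.g., "Black" or "White"
    #   risk_score         -- the algorithm's risk score
    #   chronic_conditions -- number of active chronic illnesses
    #   cost               -- realized medical costs

    def compare_by_score(df, outcome, n_bins=10):
        """Mean of `outcome` by race within each risk-score percentile bin."""
        binned = df.assign(
            score_bin=pd.qcut(df["risk_score"], q=n_bins, labels=False)
        )
        return (
            binned.groupby(["score_bin", "race"])[outcome]
            .mean()
            .unstack("race")
        )

    # Question 5: chronic illness burden at a given score.
    # compare_by_score(df, "chronic_conditions")

    # Question 6: medical costs at a given score.
    # compare_by_score(df, "cost")

If the per-bin means of chronic_conditions differ systematically by race, the score is not calibrated with respect to health, even if it predicts cost equally well for both groups; that distinction is the crux of question 7.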

Submit your answers in PDF format to the Data Ethics 2 - Reading and Questions assignment on Canvas.

Parts B and C

Download the allocative_bias.ipynb notebook and complete Parts B and C. Details of these tasks are given in the notebook.

Submit your completed notebook to the Data Ethics 2 - Activity assignment on Canvas.

Rubric

Your answers to the questions in Part A are scored out of 4 points and, as usual, will be graded on effort, thoughtfulness, and clarity.

Your solutions for Parts B and C are scored out of 6 points, one for each task in the notebook.

Acknowledgement

This assignment was adapted from an assignment generously shared by Phil Chodrow.


  [1] See this talk for a good primer on bias in ML and why it’s such a difficult problem: https://www.youtube.com/watch?v=fMym_BKWQzk