Overcoming Bias in Artificial Intelligence, Machine Learning
Artificial intelligence is often seen as the silver bullet for the healthcare industry’s numerous problems. Machine learning technologies have been shown to read radiology scans more quickly and accurately, identify high-risk patients, and reduce providers’ administrative burden.
But recent studies have revealed the inherent bias perpetuated by using these algorithms in clinical practice, raising concerns that these technologies are more harmful than helpful.
Bias creeps in when developers use proxy measures for various health outcomes. End-users are often unaware of this and, as a result, can unintentionally give biased recommendations to patients.
Philip Thomas, PhD, MS, assistant professor in the College of Information and Computer Sciences at the University of Massachusetts Amherst, said the responsibility for fixing this problem lies in the hands of developers, not end-users.
“We’re proposing the framework. We’re not saying use our algorithm as is,” he explained. “We’re encouraging other machine learning researchers to provide these interfaces that make it easier for medical researchers to apply the algorithms and give an example of how that could happen.”
Practicing what he preaches, Thomas’ team recently created an algorithm for diabetes management that attempted to balance gender fairness with accuracy.
“We ran five or six different definitions of unsafe behaviors. We went to the literature, found several common measures of hypoglycemia severity and asked what a researcher would want to restrict,” he said. “We then showed how our algorithm could enforce safety constraints with respect to these different measures of hypoglycemia severity.”
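One way to picture such a safety constraint is as a function that returns a value at or below zero exactly when the behavior is acceptable. The following is an illustrative sketch only; the specific measure, the 70 mg/dL threshold, and the 5 percent limit are assumptions for the example, not the definitions Thomas’ team used.

```python
def hypoglycemia_constraint(glucose_readings, threshold_mg_dl=70, max_fraction=0.05):
    """Express a hypothetical hypoglycemia-severity measure as a safety
    constraint: the fraction of glucose readings below the hypoglycemic
    threshold must not exceed max_fraction.

    Returns a value <= 0 when the constraint is satisfied, > 0 when it
    is violated -- the sign convention a constraint-enforcing algorithm
    could check against.
    """
    low = sum(1 for g in glucose_readings if g < threshold_mg_dl)
    return low / len(glucose_readings) - max_fraction
```

Different measures of severity from the literature would simply become different constraint functions of this same shape.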
With the responsibility resting on developers’ shoulders, according to Thomas, they should seek expert advice and understand which measures matter most to providers once these algorithms reach clinical practice.
“We’re not promoting how to balance accuracy versus discrimination. We’re not saying what the right definitions of fair or safe are. Our goal is to let the person that’s an expert in that field decide,” Thomas furthered.
Safety and fairness are not the same thing, Thomas pointed out. But end-users need the ability to adjust their goals and definitions of appropriate care for individual patients at a given point of care.
“It’s important to make it easy for the end-user to put high probability constraints on these algorithms. Leveraging some of the technologies that we built in the reinforcement learning setting for giving high probability guarantees of improvement, we were able to create these algorithms,” he elaborated.
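A rough illustration of what a high-probability constraint looks like in code: a minimal sketch using a one-sided Hoeffding confidence bound on the sample mean of a constraint function. The algorithms Thomas describes use more involved machinery, and the assumption here that constraint values lie in [-1, 1] is ours, for the example.

```python
import math

def constraint_holds(g_samples, delta=0.05):
    """Check whether a safety/fairness constraint g <= 0 can be certified
    with probability at least 1 - delta.

    g_samples: per-example values of the constraint function, assumed to
    lie in [-1, 1] (an illustrative scaling). Uses a one-sided Hoeffding
    bound: with probability 1 - delta, the true mean is no more than
    sqrt(2 * ln(1/delta) / n) above the sample mean.
    """
    n = len(g_samples)
    mean = sum(g_samples) / n
    upper = mean + math.sqrt(2.0 * math.log(1.0 / delta) / n)
    # Certify the constraint only if even the upper bound is non-positive.
    return upper <= 0.0
```

Note that with too few samples the bound is loose and the check fails even for clearly safe behavior, which mirrors the point below about needing sufficient high-quality data.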
Developers need high-quality data to ensure algorithms have this capability. Artificial intelligence built on low-quality data risks producing undesirable and incorrect results.
“Our algorithms require the data it trains on to resemble the data that it will see when running the algorithm. If you train on data that’s really noisy and has junk in it, you need to have that same noise and junk at test time,” Thomas noted. “It will then be able to enforce the definition of fairness.”
Simply removing bias from the data set is not the solution, Thomas emphasized.
“If you remove bias from your data at test time when you’re deploying the solution, it will make the system unfair,” he continued. “You’d have to re-train the algorithm on data that came from real world distributions with the same errors. Otherwise, your algorithm will essentially be figuring out how to counteract any bias that’s in the training dataset.”
Developers have to balance accuracy with the amount of data available to them. Cleaning the training dataset of any bias is actually unfair to diverse patient populations. If there is far more data for one group of individuals than for another, the algorithm’s accuracy will suffer for the group with less information, making it difficult to give equally accurate outputs for each group.
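That kind of disparity is straightforward to measure directly. A minimal, generic sketch, not tied to any particular model:

```python
def per_group_accuracy(y_true, y_pred, groups):
    """Compute classification accuracy separately for each group.

    A large gap between groups is the disparity described above: the
    under-represented group tends to be served less accurately.
    """
    totals, correct = {}, {}
    for yt, yp, g in zip(y_true, y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (yt == yp)
    return {g: correct[g] / totals[g] for g in totals}
```

The gap between the returned per-group values is one candidate fairness measure an end-user could choose to constrain.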
“If you only focus on accuracy, you’re not guaranteed to be fair, even when the data is perfectly synthetically generated,” Thomas elaborated. “Safety is not necessarily the same thing as fairness.”
More data is needed to generate results that are equally accurate and fair. Until then, developers must pay particular attention to the type of data their tools are learning from and provide capabilities that allow providers to specify which aspects of care are most important to their particular patients.
“When we create the interface to let the user convey what these undesirable behaviors are, it looks the same for safety constraints and fairness constraints,” Thomas argued. “It’s all just undesirable behavior.”
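A hedged sketch of that interface idea, with illustrative names rather than Thomas’ actual API: safety constraints and fairness constraints enter as the same kind of function, and the algorithm either satisfies all of them or reports that no acceptable solution was found.

```python
def train_with_constraints(candidates, score, constraints, data):
    """Pick the highest-scoring candidate model that violates no
    user-supplied constraint.

    Each constraint is a function of (candidate, data) that returns a
    value <= 0 when satisfied -- the same shape whether it encodes a
    safety limit or a fairness limit. Returns None when no candidate
    satisfies every constraint (the 'no solution found' outcome).
    """
    feasible = [c for c in candidates
                if all(g(c, data) <= 0 for g in constraints)]
    if not feasible:
        return None
    return max(feasible, key=lambda c: score(c, data))
```

In use, the constraints list could mix, say, a hypoglycemia-severity limit and a per-group accuracy-gap limit; the training code treats both identically, as undesirable behavior to rule out.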
“I view undesirable behavior and bias as the same thing,” he furthered.
Addressing these undesirable behaviors, whether safety concerns, bias, or inaccuracy, will require developers to work with end-users and give them the capability to tweak the algorithm as needed for a variety of scenarios. Thomas’ work on diabetes has demonstrated that this method can work, but it must be more widely adopted across the industry if bias is truly to be eliminated from clinical diagnoses.