Study reveals why AI models that analyze medical images can be biased

Artificial intelligence models often play a role in medical diagnoses, especially when it comes to analyzing images such as X-rays. However, studies have found that these models don’t always perform well across all demographic groups, usually faring worse on women and people of color.

These models have also been shown to develop some surprising abilities. In 2022, MIT researchers reported that AI models can make accurate predictions about a patient’s race from their chest X-rays — something that the most skilled radiologists can’t do.

That research team has now found that the models that are most accurate at making demographic predictions also show the biggest “fairness gaps” — that is, discrepancies in their ability to accurately diagnose images of people of different races or genders. The findings suggest that these models may be using “demographic shortcuts” when making their diagnostic evaluations, which lead to incorrect results for women, Black people, and other groups, the researchers say.

“It’s well-established that high-capacity machine-learning models are good predictors of human demographics such as self-reported race or sex or age. This paper re-demonstrates that capacity, and then links that capacity to the lack of performance across different groups, which has never been done,” says Marzyeh Ghassemi, an MIT associate professor of electrical engineering and computer science, a member of MIT’s Institute for Medical Engineering and Science, and the senior author of the study.

The researchers also found that they could retrain the models in a way that improves their fairness. However, their approaches to “debiasing” worked best when the models were tested on the same types of patients they were trained on, such as patients from the same hospital. When these models were applied to patients from different hospitals, the fairness gaps reappeared.

“I think the main takeaways are, first, you should thoroughly evaluate any external models on your own data because any fairness guarantees that model developers provide on their training data may not transfer to your population. Second, whenever sufficient data is available, you should train models on your own data,” says Haoran Zhang, an MIT graduate student and one of the lead authors of the new paper. MIT graduate student Yuzhe Yang is also a lead author of the paper, which appears today in Nature Medicine. Judy Gichoya, an associate professor of radiology and imaging sciences at Emory University School of Medicine, and Dina Katabi, the Thuan and Nicole Pham Professor of Electrical Engineering and Computer Science at MIT, are also authors of the paper.

Removing bias

As of May 2024, the FDA has approved 882 AI-enabled medical devices, with 671 of them designed to be used in radiology. Since 2022, when Ghassemi and her colleagues showed that these diagnostic models can accurately predict race, they and other researchers have shown that such models are also very good at predicting gender and age, even though the models are not trained on those tasks.

“Many popular machine learning models have superhuman demographic prediction capacity — radiologists cannot detect self-reported race from a chest X-ray,” Ghassemi says. “These are models that are good at predicting disease, but during training are learning to predict other things that may not be desirable.”

In this study, the researchers set out to explore why these models don’t work as well for certain groups. In particular, they wanted to see if the models were using demographic shortcuts to make predictions that ended up being less accurate for some groups. These shortcuts can arise in AI models when they use demographic attributes to determine whether a medical condition is present, instead of relying on other features of the images.

Using publicly available chest X-ray datasets from Beth Israel Deaconess Medical Center in Boston, the researchers trained models to predict whether patients had one of three different medical conditions: fluid buildup in the lungs, collapsed lung, or enlargement of the heart. Then, they tested the models on X-rays that were held out from the training data.
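
The article doesn’t include the training setup itself, but a minimal sketch of what it describes (a multi-label classifier for the three conditions, trained on one split and evaluated on held-out X-rays) might look like the following. The DenseNet architecture and the random tensors standing in for real chest X-rays are assumptions for illustration, not details from the paper.

```python
# Minimal sketch (not the authors' code): a three-label chest X-ray classifier
# trained on one split and evaluated on a held-out split. Random tensors stand
# in for real images; the three labels correspond to fluid buildup in the lungs,
# collapsed lung, and enlargement of the heart.
import torch
from torch import nn
from torchvision import models

NUM_LABELS = 3

model = models.densenet121(weights=None)  # architecture choice is an assumption
model.classifier = nn.Linear(model.classifier.in_features, NUM_LABELS)

# Synthetic stand-ins for training images/labels and a held-out test set.
x_train = torch.randn(32, 3, 224, 224)
y_train = torch.randint(0, 2, (32, NUM_LABELS)).float()
x_test = torch.randn(8, 3, 224, 224)
y_test = torch.randint(0, 2, (8, NUM_LABELS)).float()

criterion = nn.BCEWithLogitsLoss()  # one independent sigmoid output per condition
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for _ in range(2):  # a real run would iterate over many batches and epochs
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():
    test_probs = torch.sigmoid(model(x_test))  # predictions on held-out X-rays
print(test_probs.shape)  # torch.Size([8, 3]): one probability per condition per image
```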

Overall, the models performed well, but most of them displayed “fairness gaps” — that is, discrepancies between accuracy rates for men and women, and for white and Black patients.

The models were also able to predict the gender, race, and age of the X-ray subjects. Additionally, there was a significant correlation between each model’s accuracy in making demographic predictions and the size of its fairness gap. This suggests that the models may be using demographic categorizations as a shortcut to make their disease predictions.
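
To make these two quantities concrete: one simple way to measure a fairness gap is the difference in AUC between demographic groups, and the reported relationship can be summarized as a correlation, across models, between demographic-prediction performance and gap size. The numbers in the sketch below are synthetic and purely illustrative.

```python
# Illustrative only: a per-group AUC gap for one model, and the across-model
# correlation between demographic-prediction accuracy and fairness gap.
# All values below are synthetic, not results from the paper.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# One model's disease scores on a held-out set, with a group label per patient.
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(y_true * 0.3 + rng.normal(0.5, 0.25, size=1000), 0, 1)
group = rng.integers(0, 2, size=1000)  # e.g., 0 = male, 1 = female

auc_by_group = [roc_auc_score(y_true[group == g], y_score[group == g]) for g in (0, 1)]
fairness_gap = abs(auc_by_group[0] - auc_by_group[1])
print(f"fairness gap (AUC difference): {fairness_gap:.3f}")

# Across several hypothetical models: does demographic-prediction AUC track the gap?
demo_auc = np.array([0.72, 0.81, 0.88, 0.93])
gaps = np.array([0.02, 0.04, 0.06, 0.09])
print("correlation:", np.corrcoef(demo_auc, gaps)[0, 1])
```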

The researchers then tried to reduce the fairness gaps using two types of strategies. One set of models was trained to optimize “subgroup robustness,” meaning the models are rewarded for improving performance on the subgroup where they perform worst and penalized if their error rate for one group is higher than for the others.
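
The article doesn’t name the exact algorithm, but a common way to realize subgroup robustness is to optimize the worst-group loss, as in group distributionally robust optimization. The sketch below, with synthetic logits and group labels, illustrates that objective; it is an assumption about the family of methods, not the paper’s specific implementation.

```python
# Sketch of a subgroup-robustness objective: instead of averaging the loss over
# all patients, penalize the model by the loss of its worst-performing subgroup.
import torch
import torch.nn.functional as F

logits = torch.randn(64, requires_grad=True)  # stand-in for model outputs
labels = torch.randint(0, 2, (64,)).float()
groups = torch.randint(0, 4, (64,))  # e.g., four sex-by-race subgroups

per_sample = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
group_losses = torch.stack(
    [per_sample[groups == g].mean() for g in range(4) if (groups == g).any()]
)

loss = group_losses.max()  # optimize whichever subgroup currently does worst
loss.backward()
```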

In another set of models, the researchers forced them to remove any demographic information from the images, using “group adversarial” approaches. Both strategies worked fairly well, the researchers found.
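
Group adversarial debiasing is typically implemented with an auxiliary head that tries to predict the demographic attribute from the model’s learned features, while a gradient-reversal layer pushes the feature extractor to discard that information. The following is a generic sketch of that technique with a toy encoder, not the paper’s code.

```python
# Sketch of group-adversarial debiasing with a gradient-reversal layer: the
# adversary learns to predict the demographic group from shared features, while
# the reversed gradient pushes the encoder to remove that information.
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output  # flip the sign of the gradient on the way back

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU())  # toy feature extractor
disease_head = nn.Linear(32, 1)  # main diagnostic task
group_head = nn.Linear(32, 2)    # adversary: predict demographic group

x = torch.randn(16, 64)
disease = torch.randint(0, 2, (16,)).float()
group = torch.randint(0, 2, (16,))

feats = encoder(x)
task_loss = nn.functional.binary_cross_entropy_with_logits(
    disease_head(feats).squeeze(1), disease
)
adv_loss = nn.functional.cross_entropy(group_head(GradReverse.apply(feats)), group)

(task_loss + adv_loss).backward()  # the encoder receives reversed adversary gradients
```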

“For in-distribution data, you can use existing state-of-the-art methods to reduce fairness gaps without making significant trade-offs in overall performance,” Ghassemi says. “Subgroup robustness methods force models to be sensitive to mispredicting a specific group, and group adversarial methods try to remove group information completely.”

Not always fairer

However, those approaches only worked when the models were tested on data from the same types of patients that they were trained on — for example, only patients from the Beth Israel Deaconess Medical Center dataset.

When the researchers tested the models that had been “debiased” using the BIDMC data to analyze patients from five other hospital datasets, they found that the models’ overall accuracy remained high, but some of them exhibited large fairness gaps.

“If you debias the model in one set of patients, that fairness does not necessarily hold as you move to a new set of patients from a different hospital in a different location,” Zhang says.

This is worrisome because hospitals often use models developed on data from other hospitals, especially when they purchase an off-the-shelf model, the researchers say.

“We found that even state-of-the-art models which are optimally performant in data similar to their training sets are not optimal — that is, they do not make the best trade-off between overall and subgroup performance — in novel settings,” Ghassemi says. “Unfortunately, this is actually how a model is likely to be deployed. Most models are trained and validated with data from one hospital, or one source, and then deployed widely.”

The researchers found that the models that were debiased using group adversarial approaches showed slightly more fairness when tested on new patient groups than those debiased with subgroup robustness methods. They now plan to try to develop and test additional methods to see if they can create models that do a better job of making fair predictions on new datasets.

The findings suggest that hospitals that use these types of AI models should evaluate them on their own patient population before beginning to use them, to make sure they aren’t giving inaccurate results for certain groups.
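
In practice, such an evaluation can be as simple as scoring the model’s predictions on local patients and flagging any subgroup whose performance trails the overall number. The audit function below is an illustrative sketch; the AUC metric and the 0.05 threshold are arbitrary choices, not recommendations from the paper.

```python
# Illustrative pre-deployment audit: score an external model's predictions on
# local patients and flag subgroups whose AUC trails the overall AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

def audit_by_subgroup(y_true, y_score, subgroups, max_gap=0.05):
    """Return the overall AUC and any subgroups more than `max_gap` below it."""
    overall = roc_auc_score(y_true, y_score)
    flagged = {}
    for g in np.unique(subgroups):
        mask = subgroups == g
        if len(np.unique(y_true[mask])) < 2:  # AUC is undefined for a single class
            continue
        auc = roc_auc_score(y_true[mask], y_score[mask])
        if overall - auc > max_gap:
            flagged[g] = auc
    return overall, flagged

# Synthetic stand-in for a hospital's own validation data.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)
scores = np.clip(y * 0.3 + rng.normal(0.5, 0.25, 500), 0, 1)
demo = rng.choice(["group_a", "group_b"], 500)
print(audit_by_subgroup(y, scores, demo))
```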

The research was funded by a Google Research Scholar Award, the Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program, RSNA Health Disparities, the Lacuna Fund, the Gordon and Betty Moore Foundation, the National Institute of Biomedical Imaging and Bioengineering, and the National Heart, Lung, and Blood Institute.

The Friday Roundup – Back to Video Basics and Camera Tips

8 Simple Editing Techniques and Concepts To Make Better Videos

I thought I would kick off this week’s Friday Roundup with a bit of a “back to basics” tutorial. A lot of the editing tutorials that are kicking around these days seem to be covering a…

Leaning into the immune system’s complexity

At any given time, millions of T cells circulate throughout the human body, looking for potential invaders. Each of those T cells sports a different T cell receptor, which is specialized to recognize a foreign antigen.

To make it easier to understand how that army of T cells recognizes their targets, MIT Associate Professor Michael Birnbaum has developed tools that can be used to study huge numbers of these interactions at the same time.

Deciphering those interactions could eventually help researchers find new ways to reprogram T cells to target specific antigens, such as mutations found in a cancer patient’s tumor.

“T-cells are so diverse in terms of what they recognize and what they do, and there’s been incredible progress in understanding this on an example-by-example basis. Now, we want to be able to understand the entirety of this process with some of the same level of sophistication that we understand the individual pieces. And we think that once we have that understanding, then we can be much better at manipulating it to positively affect disease,” Birnbaum says.

This approach could lead to improvements in immunotherapy to treat cancer, as well as potential new treatments for autoimmune disorders such as type 1 diabetes, or infections such as HIV and Covid-19.

Tackling difficult problems

Birnbaum’s interest in immunology developed early, when he was a high school student in Philadelphia. His school offered a program allowing students to work in research labs in the area, so starting in tenth grade, he did research in an immunology lab at Fox Chase Cancer Center.

“I got exposed to some of the same things I study now, actually, and so that really set me on the path of realizing that this is what I wanted to do,” Birnbaum says.

As an undergraduate at Harvard University, he enrolled in a newly established major known as chemical and physical biology. During an introductory immunology course, Birnbaum was captivated by the complexity and beauty of the immune system. He went on to earn a PhD in immunology at Stanford University, where he began to study how T cells recognize their target antigens.

T cell receptors are protein complexes found on the surfaces of T cells. These receptors are made of gene segments that can be mixed and matched to form up to 10^15 different sequences. When a T cell receptor finds a foreign antigen that it recognizes, it signals the T cell to multiply and begin the process of eliminating the cells that display that antigen.

As a graduate student, Birnbaum worked on building tools to study interactions between antigens and T cells at large scales. After finishing his PhD, he spent a year doing a postdoc in a neuroscience lab at Stanford, but quickly realized he wanted to get back to immunology.

In 2016, Birnbaum was hired as a faculty member in MIT’s Department of Biological Engineering and the Koch Institute for Integrative Cancer Research. He was drawn to MIT, he says, by the willingness of scientists and engineers at the Institute to work together to take on difficult, important problems.

“There’s a fearlessness to how people were willing to do that,” he says. “And the community, particularly the immunology community here, was second to none, both in terms of its quality, but also in terms of how supportive it was.”

Billions of targets

At MIT, Birnbaum’s lab focuses on T cell-antigen interactions, with the hope of eventually being able to reprogram those interactions to help fight diseases such as cancer. In 2022, he reported a new technique for analyzing these interactions at large scales.

Until then, most existing tools for studying the immune system were designed to allow for the study of a large pool of antigens exposed to one T cell (or B cell), or a large pool of immune cells encountering a small number of antigens. Birnbaum’s new method uses engineered viruses to present many different antigens to huge populations of immune cells, allowing researchers to screen huge libraries of both antigens and immune cells at the same time.

“The immune system works with millions of unique T cell receptors in each of us, and billions of possible antigen targets,” Birnbaum says. “In order to be able to really understand the immune system at scale, we spend a lot of time trying to build tools that can work at similar scales.”

This approach could enable researchers to eventually screen thousands of antigens against an entire population of B cells and T cells from an individual, which could reveal why some people naturally fight off certain viruses, such as HIV, better than others.

Using this method, Birnbaum also hopes to develop ways to reprogram T cells inside a patient’s body. Currently, T cell reprogramming requires T cells to be removed from a patient, genetically altered, and then reinfused into the patient. All of these steps could be skipped if instead the T cells were reprogrammed using the same viruses that Birnbaum’s screening technology uses. A company called Kelonia, co-founded by Birnbaum, is also working toward this goal.

To model T cell interactions at even larger scales, Birnbaum is now working with collaborators around the world to use artificial intelligence to make computational predictions of T cell-antigen interactions. The research team, which Birnbaum is leading, includes 12 labs from five countries, funded by Cancer Grand Challenges. The researchers hope to build predictive models that may help them design engineered T cells that could help treat many different diseases.

“The program is put together with a focus on whether these types of predictions are possible, but if they are, it could lead to much better understanding of what immunotherapies may work with different people. It could lead to personalized vaccine design, and it could lead to personalized T cell therapy design,” Birnbaum says.

Innatera Secures $21M to Drive Neuromorphic AI to 1 Billion Devices by 2030

Innatera, the trailblazer in ultra-low power neuromorphic processors, has announced the successful closure of an oversubscribed Series A funding round, attracting a cumulative $21 million. This substantial investment includes the initial $16 million secured in March 2024 and an additional $5 million from new investors. Leading…

Go Live Anywhere with LiveU Solo Pro – Videoguys

Step 1: Select your LiveU Solo PRO Encoder
Solo PRO joins the LiveU Solo family of plug-and-play encoders for on-the-go live content, now with 4K and HEVC video quality and the reliability of 5G connectivity.

Step 2: SoloConnect Modem Kits Give You Cellular Connectivity
LiveU Solo Pro is compatible with the new SoloPro Modem Kits for North American and international streaming capabilities. The SoloPro Modem Kits are available in a 2-modem pack and a 4-modem package for those who require even more cellular connections.

The LiveU Solo PRO has two USB inputs for the 2 modems. No extra cables needed! Designed for robust performance, LiveU Net modems offer the highest reliability compared to other carrier-branded USB modems. Equipped with LiveU’s managed SIM cards, the modems inherently support multiple cellular carriers and have been successfully tested with all LiveU units.
These kits are intended for the LiveU Solo Pro with the North America or Traveler plans.

A complete bundle for your Solo PRO device with belt pack, modems, and cables. This bundle includes:

  • 4x LU Net 4G Modems
  • 4x SIM cards (U.S. version)
  • LiveU Solo PRO Belt Pack
  • 2x Y Cables
  • HDMI Extension Cable
  • HDMI Tension Clip
  • Getting started card
  • Note: the LiveU Solo PRO (HDMI/SDI) encoder must be purchased separately

Step 3: Activate your Unlimited Data Plans with LiveU using SoloConnect Services
The LiveU Solo PRO Connect 2 Modem Kit must be activated with a North American plan that includes the United States, Mexico, and Canada for $295/month. For international usage, there is also a Travelers plan for $520/month.

The LiveU Solo PRO Connect 4 Modem Kit must be activated with a North American plan that includes the United States, Mexico, and Canada for $435/month. For international usage, there is also a Travelers plan for $750/month.


LiveU Solo HDMI/SDI – Now Under $1,000!

Plus FREE SoloConnect 2 Modem Kit Bundle – $450 value!

Designed for you to support clients of all sizes and budgets.

Whether it’s a single or multi-cam production, a fixed per-unit cost gives you the flexibility to build a solution by paying only for the events you produce.

You stay in control of your own costs and can scale up or scale down as required, giving you a risk-free way to expand your production capacity. 

Multiple Bundles
Choose the right encoder: whether you go with the LU300S or the LU800, you’ll get the high-quality solution you need.

Massive Savings

It’s simple: pay when you use it. This cost-effective solution lets you save on every production.

More Events

The REMI Production solution allows you to produce more events, all from one centralized location.

For more information and case studies of the LiveU Lightweight Production Bundles, watch our Videoguys Live webinar

Dragon Age Cover Story And Shadow of the Erdtree Review | GI Show

In this week’s episode of The Game Informer Show, the crew discusses our recent trip to BioWare for our Dragon Age: The Veilguard cover story, our Elden Ring: Shadow of the Erdtree review, the PS5-bound multiplayer shooter Concord, a new battle royale from former League of Legends developers, the atmospheric horror title Still Wakes the Deep, Dustborn, Luigi’s Mansion 2 HD, and even more! It’s a packed show, y’all.


Follow us on social media: Alex Van Aken (@itsVanAken), Kyle Hilliard (@KyleMHilliard), Marcus Stewart (@MarcusStewart7), Wesley LeBlanc (@LeBlancWes)

The Game Informer Show is a weekly gaming podcast covering the latest video game news, industry topics, exclusive reveals, and reviews. Join us every Thursday to chat about your favorite games – past and present – with Game Informer staff, developers, and special guests from around the industry. Listen on Apple Podcasts, Spotify, or your favorite podcast app.

The Game Informer Show – Podcast Timestamps:

00:00:00 – Intro

00:02:42 – Cover Story: Dragon Age: The Veilguard

00:21:48 – Elden Ring Shadow of the Erdtree Review

00:42:20 – Concord Preview

00:59:04 – Supervive Preview

01:11:59 – The Plucky Squire

01:24:37 – Magic: The Gathering – Assassin’s Creed

01:35:01 – Still Wakes the Deep

01:45:52 – Dustborn Preview

01:55:06 – Luigi’s Mansion 2 HD Review

01:58:26 – Housekeeping

Redefining the CFO: Navigating the AI Revolution in Finance

A 2024 survey by Gartner indicates a striking trend: 71 percent of CFOs plan to increase their investments in AI by 10 percent or more compared to 2023. The rapid advancement of Artificial Intelligence (AI) is ushering in a new era for CFOs, presenting them with…

From Prompt Engineering to Few-Shot Learning: Enhancing AI Model Responses

Artificial Intelligence (AI) has witnessed rapid advancements over the past few years, particularly in Natural Language Processing (NLP). From chatbots that simulate human conversation to sophisticated models that can draft essays and compose poetry, AI’s capabilities have grown immensely. These advancements have been driven by significant…