Beating AI Bias Through Diversifying its Data

AI is revolutionizing many industries, including healthcare. However, a model is only as good as the data it is trained on, and biased data can lead to serious consequences, from unnecessary surgery to missed, treatable cancers. For example, an AI tool used by a dermatologist may have been trained on too few images of dark skin and therefore miss a crucial skin cancer diagnosis.

A new paper titled “Quality-Diversity Generative Sampling for Learning with Synthetic Data” by computer science researchers from the University of Southern California proposes a novel approach to mitigating bias in ML model training, specifically in image generation. According to Techxplore, the researchers used a family of algorithms called quality-diversity (QD) algorithms to create diverse synthetic datasets that strategically “plug the gaps” in real-world training data.

Allen Chang, the paper’s lead author, said: “I think it is our responsibility as computer scientists to better protect all communities, including minority or less frequent groups, in the systems we design. We hope that quality-diversity optimization can help to generate fair synthetic data for broad impacts in medical applications and other types of AI systems.”

AI has been used to create synthetic data before, but this can be problematic: a biased generator produces biased data, which in turn further biases the downstream models trained on it, creating a vicious cycle. QD algorithms, by contrast, are designed to generate a diverse set of solutions to a problem, and here the researchers applied them to the problem of creating diverse synthetic datasets.
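
To make the idea concrete, here is a minimal sketch of a quality-diversity sampling loop in the spirit of MAP-Elites, which keeps the best sample found so far for every combination of attribute bins. The `generate`, `measure`, and `quality` functions are assumed placeholders for illustration, not the paper's actual models or interface.

```python
from typing import Callable, Dict, Tuple

def qd_sample(
    generate: Callable[[], object],                 # draws one synthetic image from a generative model
    measure: Callable[[object], Tuple[int, ...]],   # maps an image to discrete attribute bins
    quality: Callable[[object], float],             # scores how realistic/useful the image is
    n_iters: int = 10_000,
) -> Dict[Tuple[int, ...], object]:
    """Keep the highest-quality image found for each combination of attribute bins."""
    archive: Dict[Tuple[int, ...], object] = {}
    scores: Dict[Tuple[int, ...], float] = {}
    for _ in range(n_iters):
        img = generate()
        cell = measure(img)          # e.g. (skin_tone_bin, age_bin, ...)
        q = quality(img)
        # Replace a cell's occupant only if the new sample is higher quality.
        if cell not in archive or q > scores[cell]:
            archive[cell] = img
            scores[cell] = q
    return archive
```

Because the archive holds roughly one high-quality example per attribute combination, sampling from it yields a dataset that covers combinations the real data may have missed.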

Using this approach, the researchers generated a diverse dataset of around 50,000 images in 17 hours, which they then evaluated on up to four measures of diversity: skin tone, gender presentation, age, and hair length.

Chang explains: “This is a promising direction for augmenting models with bias-aware sampling, which we hope can help AI systems perform accurately for all users.”

The method particularly increases the representation of intersectional groups, that is, groups defined by a combination of attributes, such as dark-skinned people who wear glasses.
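
As a rough illustration of what "intersectional coverage" means, the toy snippet below counts how many samples fall into each combination of attributes; the attribute names and label format are assumptions made for the example, not the paper's evaluation code.

```python
from collections import Counter

def intersectional_coverage(samples, attributes=("skin_tone", "wears_glasses")):
    """Count how many samples fall into each combination of the given attributes."""
    return Counter(tuple(sample[attr] for attr in attributes) for sample in samples)

# Toy dataset with two attributes, i.e. four possible intersectional groups.
samples = [
    {"skin_tone": "dark", "wears_glasses": True},
    {"skin_tone": "light", "wears_glasses": False},
    {"skin_tone": "dark", "wears_glasses": True},
]
print(intersectional_coverage(samples))
# Counter({('dark', True): 2, ('light', False): 1})
# The empty combinations ('dark', False) and ('light', True) are exactly the
# kind of gaps a diversity-aware sampler would try to fill with synthetic images.
```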

This work shows for the first time that generative models can use quality diversity to repair biased classifiers, and it is a first step toward enabling biased models to “self-repair” by iteratively generating synthetic data and retraining on it.
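
The "self-repair" loop can be pictured roughly as follows; every function name here is a placeholder introduced for illustration, since the article describes the idea only at a conceptual level.

```python
def self_repair(classifier, real_data, generate_balanced, train, evaluate_bias, rounds=3):
    """Alternate diversity-aware generation with retraining until measured bias is low."""
    for _ in range(rounds):
        # Generate synthetic examples targeting groups the current model handles poorly.
        synthetic = generate_balanced(classifier)
        # Retrain on the combination of real and synthetic data.
        classifier = train(classifier, list(real_data) + list(synthetic))
        # Stop early once the gap in per-group error is small enough.
        if evaluate_bias(classifier) < 0.01:
            break
    return classifier
```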