Making AI Systems Less Socially Biased



Doctoral students at Oregon State University, working with Adobe researchers, have developed a new, cost-effective training technique that makes AI systems less socially biased.

They call the new method FairDeDup, short for fair deduplication. Deduplication is the process of removing redundant information from the data used to train AI systems, which lowers the high computing costs of training.

As the researchers explain, AI systems are usually trained on datasets taken from the internet, which often contain biases present in society. When those biases are codified in trained AI models, they can perpetuate unfair ideas and behavior.

Understanding how deduplication affects bias prevalence makes it possible to manage negative effects. Eric Slyman of the OSU College of Engineering explains that while the team’s prior work has shown that removing redundant data can enable accurate AI training with fewer resources, they found the process can also exacerbate the harmful social biases AI often learns.

According to Techxplore, FairDeDup works by thinning datasets of image captions collected from the web through a process known as pruning, which means choosing a subset of the data that is representative of the whole dataset. When done in a content-aware manner, pruning allows informed decisions about which parts of the data stay and which go.

“FairDeDup removes redundant data while incorporating controllable, human-defined dimensions of diversity to mitigate biases… Our approach enables AI training that is not only cost-effective and accurate but also more fair,” said Slyman.
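To make the idea concrete, here is a minimal, hypothetical sketch in Python of what fairness-aware pruning could look like: cluster the embeddings of image-caption pairs, then keep only a few representatives per cluster, preferring samples that cover different values of a human-defined diversity attribute. The clustering method, the similarity ranking, the `keep_per_cluster` parameter, and the `diversity_labels` attribute are all illustrative assumptions for this example, not details of the authors' actual FairDeDup implementation.

```python
# Illustrative sketch only: a simplified, hypothetical take on fairness-aware
# dataset pruning. It is NOT the FairDeDup implementation; the clustering step
# and the diversity attribute below are assumptions made for this example.
import numpy as np
from sklearn.cluster import KMeans

def prune_with_diversity(embeddings, diversity_labels, n_clusters=10, keep_per_cluster=2):
    """Keep a small, representative subset of samples per embedding cluster,
    preferring samples that cover different values of a human-defined
    diversity attribute (e.g. a coarse 'region' or 'occupation' tag)."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    kept = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        if len(idx) == 0:
            continue
        centroid = embeddings[idx].mean(axis=0)
        # Rank cluster members by closeness to the centroid (most "representative" first).
        order = idx[np.argsort(np.linalg.norm(embeddings[idx] - centroid, axis=1))]
        seen_attrs, chosen = set(), []
        # First pass: prefer representatives whose attribute value is not yet covered.
        for i in order:
            if diversity_labels[i] not in seen_attrs:
                chosen.append(i)
                seen_attrs.add(diversity_labels[i])
            if len(chosen) == keep_per_cluster:
                break
        # Second pass: fill any remaining slots with the closest leftover samples.
        for i in order:
            if len(chosen) == keep_per_cluster:
                break
            if i not in chosen:
                chosen.append(i)
        kept.extend(chosen)
    return np.array(sorted(kept))

# Toy usage: 200 random "image-caption embeddings" with a binary diversity attribute.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 64))
attr = rng.integers(0, 2, size=200)
subset = prune_with_diversity(emb, attr, n_clusters=20, keep_per_cluster=2)
print(f"kept {len(subset)} of {len(emb)} samples")
```

The key design choice the sketch tries to convey is that deduplication does not have to be blind: when two samples are nearly redundant, the one that adds coverage along a user-chosen diversity dimension can be kept instead of an arbitrary duplicate.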

The researchers aimed to lessen biases regarding occupation, race, gender, age, geography, culture, and more. Addressing these biases during dataset pruning can create more “socially just” AI systems.

“Our work doesn’t force AI into following our own prescribed notion of fairness but rather creates a pathway to nudge AI to act fairly when contextualized within some settings and user bases in which it’s deployed. We let people define what is fair in their setting instead of the internet or other large-scale datasets deciding that,” concluded Slyman.