Automated Fake News Detection

Under the looming threat of fake news, misinformation, and disinformation online, many scientists have worked to develop automated “fake news” detection systems, which usually depend on machine learning. However, experts advise caution before deploying such systems.

Professor Dorit Nevo of Rensselaer Polytechnic Institute recently published research exploring the mistakes made by such detection tools. Her team found challenges in bias and generalizability stemming from the models’ training and design, as well as from the unpredictability of news content.

Models are trained and evaluated using a set of labels referred to as “ground truth,” but the people generating those labels may themselves be uncertain whether a news item is real or fake. This can perpetuate biases. “One consumer may view content as biased that another may think is true,” said Nevo. “Similarly, one model may flag content as unreliable, and another will not. A developer may consider one model the best, but another developer may disagree. We think a clear understanding of these issues must be attained before a model may be considered trustworthy.”
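The labeling disagreement Nevo describes can be measured before any model is trained. As an illustrative sketch (not part of the study, and with entirely hypothetical labels), the snippet below computes Cohen’s kappa between two annotators labeling the same articles; a low score is a warning that the “ground truth” itself is contested:

```python
def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (range -1 to 1)."""
    n = len(labels_a)
    # observed agreement: fraction of items both annotators labeled the same
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # expected agreement if both labeled at random with their own base rates
    categories = set(labels_a) | set(labels_b)
    p_e = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# hypothetical labels from two annotators for four news articles
annotator_1 = ["fake", "real", "fake", "fake"]
annotator_2 = ["fake", "real", "real", "fake"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5, only moderate agreement
```

Training on either annotator’s labels alone would bake that disagreement into the model as if it were settled fact.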

According to Techxplore, the research team analyzed 140,000 news articles from a single month in 2021 and examined the issues that arose from automated content moderation, reaching three main conclusions: it matters who chooses the ground truth; operationalizing tasks for automation can perpetuate bias; and ignoring or simplifying the application context reduces research validity.

In addition, models must be continually reevaluated: their performance may degrade over time, and the “ground truth” itself may become uncertain, so experts must explore new approaches to establishing it.
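A minimal sketch of the kind of ongoing reevaluation this implies (the function, numbers, and threshold below are illustrative assumptions, not part of the study): track the model’s rolling accuracy on freshly labeled news items and flag when it drops too far below the baseline measured at deployment.

```python
def needs_retraining(recent_accuracies, baseline_accuracy, tolerance=0.05):
    # illustrative drift check: compare rolling accuracy on newly labeled
    # items against the accuracy measured when the model was deployed
    rolling = sum(recent_accuracies) / len(recent_accuracies)
    return rolling < baseline_accuracy - tolerance

# model validated at 90% accuracy; recent weekly spot checks have slipped
print(needs_retraining([0.80, 0.78, 0.79], baseline_accuracy=0.90))  # True
print(needs_retraining([0.90, 0.89, 0.91], baseline_accuracy=0.90))  # False
```

The spot checks themselves require newly labeled data, which is exactly where the ground-truth uncertainty above resurfaces.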

Inaccurate fake news detection can have severe implications, and it is probable that a single model will never be a one-size-fits-all solution. The researchers suggest that pairing media literacy with a model’s suggestions would offer the most reliability, or applying a model to only one news topic rather than trying to train it on every topic at once.

They conclude that a strong solution may be reached by combining several weak, limited solutions.
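One way to read that conclusion in code: combine several deliberately narrow detectors by majority vote. Everything below is a hypothetical toy, not the researchers’ system; each heuristic is weak on purpose, and the ensemble flags an article only when most of them agree.

```python
def keyword_detector(text):
    # weak heuristic: sensationalist vocabulary
    return any(w in text.lower() for w in ("shocking", "miracle", "secret"))

def caps_detector(text):
    # weak heuristic: high share of all-caps words
    words = text.split()
    caps = sum(1 for w in words if len(w) > 2 and w.isupper())
    return caps / max(len(words), 1) > 0.3

def punctuation_detector(text):
    # weak heuristic: excessive exclamation marks
    return text.count("!") >= 2

def majority_vote(text, detectors):
    # a "strong" flag only when most of the weak detectors agree
    votes = [d(text) for d in detectors]
    return sum(votes) > len(votes) / 2

DETECTORS = [keyword_detector, caps_detector, punctuation_detector]
print(majority_vote("SHOCKING miracle cure!! Doctors HATE it!", DETECTORS))  # True
print(majority_vote("City council approves the annual budget.", DETECTORS))  # False
```

No single heuristic here is trustworthy on its own, which mirrors the paper’s point: the strength comes from the combination, not from any one limited model.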