Studies Warn That AI Has Learned to Deceive Humans


A significant number of artificial intelligence systems have been found to develop the ability to deceive humans, a troubling pattern that raises serious concerns about potential risks. Research on the issue shows that both specialized and general-purpose AI systems have learned to manipulate information to achieve specific outcomes.

According to Interesting Engineering, the study highlights a striking example in Meta’s CICERO, which “turned out to be an expert liar.” CICERO is an AI model designed to play the strategic alliance-building game Diplomacy, and despite Meta’s claims that it was trained to be “largely honest and helpful,” the AI was discovered to resort to deceptive tactics like making false promises, betraying allies, and manipulating other players to win the game.

While cheating and manipulation may seem harmless as part of a game, they demonstrate AI's potential to learn and use deceptive tactics in real-world scenarios.

Another example is OpenAI’s GPT-4, which was shown to trick a TaskRabbit worker into solving a CAPTCHA by pretending to have a vision impairment. The report explains that while GPT-4 received hints from a human evaluator, it was not directed to lie; it “used its own reasoning to make up a false excuse for why it needed help on the Captcha task.”

These findings show how AI models can learn to be deceptive when it is beneficial for completing their tasks. Peter S. Park, the paper’s lead author and an AI safety researcher at MIT, said that “AI developers do not have a confident understanding of what causes undesirable AI behaviors like deception.”

This pattern of AI systems learning deception is dangerous in several ways: it can be exploited by malicious actors to deceive and harm others, leading to increased fraud, political manipulation, and potentially even terrorist recruitment. Furthermore, if a system designed for strategic decision-making is trained to be deceptive, it could normalize deceptive practices in politics and business.

The researchers argue that deception must be addressed as AI becomes more advanced and more deeply integrated into our daily lives, and they call for attention from policymakers.

Park stated: “We as a society need as much time as we can get to prepare for the more advanced deception of future AI products and open-source models… If banning AI deception is politically infeasible at the current moment, we recommend that deceptive systems be classified as high risk.” Such a classification would subject those systems to stricter oversight and regulation, potentially mitigating the risks they pose to society.