A new report by computer scientists from the National Institute of Standards and Technology (NIST) presents new kinds of cyberattacks that can “poison” AI systems.

AI systems are being integrated into more and more aspects of our lives, from driving vehicles to helping doctors diagnose illnesses to interacting with customers as online chatbots. To perform these tasks, the models are trained on vast amounts of data, which in turn helps the AI predict how to respond in a given situation.

One major issue highlighted by the report is the possible corruption of that data, both during an AI system’s training period and afterward, while the AI continues to refine its behaviors by interacting with the physical world. Corrupted data can cause the AI to malfunction or fail outright.

According to Techxplore, the report presents four major types of attacks, and then classifies them according to criteria like the attacker’s goals and objectives, capabilities, and knowledge.

  • Evasion attacks occur after an AI system is deployed and attempt to alter an input to change how the system responds to it (for example, adding markings to stop signs so that an autonomous vehicle misinterprets them as speed limit signs).
  • Poisoning attacks occur in the training phase by introducing corrupted data (for example, inserting many instances of inappropriate language into conversation records so that a chatbot treats them as common enough to use in its own customer interactions).
  • Privacy attacks occur during deployment and attempt to learn sensitive information about the AI or the data it was trained on in order to misuse it. Malicious actors can ask a chatbot legitimate questions and then use the answers to reverse engineer the model and infer its sources. Adding undesired examples to those online sources could make the AI behave inappropriately, and making the AI unlearn those specific examples after the fact can be very difficult.
  • Abuse attacks involve inserting incorrect information into a source (such as a webpage or online document) that an AI then absorbs. Unlike poisoning attacks, abuse attacks feed the AI incorrect information from a legitimate but compromised source in order to repurpose the AI system’s intended use.
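To make the evasion category concrete, here is a minimal sketch of how a small, bounded change to an input can flip a model's decision. It is not taken from the report: the linear classifier, its weights, the feature values, and the names `score`, `classify`, and `evade` are all assumptions chosen for illustration, and a real vision model is far more complex.

```python
# Toy linear classifier (hypothetical): score = w . x + b.
# A positive score is read as "stop", anything else as "speed_limit".
WEIGHTS = [2.0, -1.0, 0.5]   # made-up weights for illustration
BIAS = -0.25


def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def classify(x):
    return "stop" if score(x) > 0 else "speed_limit"


def evade(x, epsilon):
    """Move each feature by at most epsilon in the direction that lowers the score."""
    def sign(w):
        return 1 if w > 0 else -1 if w < 0 else 0
    return [xi - epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


original = [0.5, 0.3, 0.2]                # hypothetical features of a stop sign
adversarial = evade(original, epsilon=0.2)

print(classify(original), classify(adversarial))
```

With these made-up numbers, no feature moves by more than 0.2, yet the predicted label changes from "stop" to "speed_limit". The attacker in this sketch knows the weights; the report classifies attacks partly by how much such knowledge the attacker has.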

Co-author Alina Oprea, a professor at Northeastern University, further explains that most of these attacks are fairly easy to mount and require minimal knowledge of the AI system and limited adversarial capabilities. “Poisoning attacks, for example, can be mounted by controlling a few dozen training samples, which would be a very small percentage of the entire training set,” she adds.
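The point about a few dozen samples can be illustrated with a deliberately fragile toy model. The sketch below is an assumption-laden example, not the report's method: the one-dimensional data, the class-mean classifier, the labels, and the helper names `train_threshold` and `predict` are all hypothetical, and production models are generally harder to shift this way.

```python
from statistics import mean


def train_threshold(samples):
    """Fit a 1-D classifier: the midpoint between the two class means."""
    benign = [x for x, label in samples if label == "benign"]
    malicious = [x for x, label in samples if label == "malicious"]
    return (mean(benign) + mean(malicious)) / 2


def predict(threshold, x):
    return "benign" if x < threshold else "malicious"


# Hypothetical clean training set: 1,000 samples per class.
clean = [(1.0, "benign")] * 1000 + [(3.0, "malicious")] * 1000

# The attacker controls only 40 samples: extreme values labeled "benign".
poison = [(60.0, "benign")] * 40
poisoned = clean + poison

clean_threshold = train_threshold(clean)
poisoned_threshold = train_threshold(poisoned)
poison_fraction = len(poison) / len(poisoned)

print(predict(clean_threshold, 3.0), predict(poisoned_threshold, 3.0))
print(f"poisoned share of training set: {poison_fraction:.1%}")
```

In this sketch the 40 injected samples make up about 2% of the training set, yet they drag the decision threshold far enough that an input at 3.0, previously classified as malicious, is now classified as benign.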

Although the report breaks down each attack class and provides mitigation approaches, it acknowledges that the defenses AI experts have devised against adversarial attacks so far are incomplete at best. Nevertheless, awareness of these limitations is important for developers and organizations looking to deploy and use AI technology.