
AI researchers are highlighting the need to teach the technology to forget.

Scientists in this emerging field of research are pointing to an important tool for mitigating the risks of AI – "machine unlearning" – and finding new ways to make deep neural networks (DNNs) forget data that "poses a risk to society."

According to Techxplore, re-training AI programs to "forget" data is a very expensive and difficult task, since modern DNNs based on large language models (like ChatGPT or Bard) are trained using massive resources over weeks or even months. Furthermore, each training run requires tens of gigawatt-hours of energy, with some runs consuming enough energy to power thousands of households for a year.

"Machine unlearning" is a growing field of research that could potentially remove problematic data from DNNs quickly and cheaply, using fewer resources, while continuing to ensure high accuracy.

Professor Peter Triantafillou from the Department of Computer Science at the University of Warwick recently argued that, given the complexity of DNNs and the enormity of the datasets they are trained on, DNNs may be harmful to society.

He said that DNNs may cause harm when trained on data containing biases, thereby propagating negative stereotypes, or on data with "erroneous annotations" (such as incorrectly labeled items).

Another problem, according to Triantafillou, is that DNNs can be trained on data that violates individuals' privacy, posing a huge challenge to mega-tech companies facing legislation aimed at safeguarding the "right to be forgotten" – the right of any individual to request that their data be deleted from any dataset and AI program.

"Machine unlearning" is a new field of research that is becoming an important tool for mitigating AI and information-security risks. Triantafillou explains that his research team has derived a new "machine unlearning" algorithm that ensures DNNs can forget dodgy data without compromising overall AI performance.

“The algorithm can be introduced to the DNN, causing it to specifically forget the data we need it to, without having to re-train it entirely from scratch again. It’s the only work that differentiated the needs, requirements, and metrics for success among the three different types of data needed to be forgotten: biases, erroneous annotations, and issues of privacy.”
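The article does not describe the Warwick algorithm's internals, but the general idea of making a model forget a subset of its training data without retraining from scratch can be sketched with a common approximate-unlearning technique: gradient ascent on the data to be forgotten, combined with gradient descent on the data to retain. The example below is an illustrative toy only – a tiny logistic-regression model, synthetic data, and an assumed "forget set" of erroneously labeled points – not the method the researchers published.

```python
import numpy as np

# Toy sketch of approximate machine unlearning (NOT the Warwick algorithm):
# a logistic model is trained on data whose first 20 points carry flipped
# ("erroneous") labels, then those points are unlearned via gradient ascent
# while gradient descent on the retained data preserves accuracy.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def grad(w, X, y):
    # Gradient of the mean logistic loss with respect to the weights.
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def loss(w, X, y):
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Synthetic linearly separable data.
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = (X @ w_true > 0).astype(float)

# The first 20 points get erroneous (flipped) labels - the data to forget.
forget, retain = X[:20], X[20:]
y_forget_bad, y_retain = 1.0 - y[:20], y[20:]
y_train = np.concatenate([y_forget_bad, y_retain])

# 1) Ordinary training on everything, bad labels included.
w = np.zeros(5)
for _ in range(500):
    w -= 0.5 * grad(w, X, y_train)

loss_forget_before = loss(w, forget, y_forget_bad)

# 2) Unlearning: ascend on the forget set (push its loss up), descend on
#    the retain set (keep accuracy) - no retraining from scratch.
for _ in range(100):
    w += 0.1 * grad(w, forget, y_forget_bad)
    w -= 0.1 * grad(w, retain, y_retain)

loss_forget_after = loss(w, forget, y_forget_bad)
retain_acc = np.mean((sigmoid(retain @ w) > 0.5) == y_retain)
```

After the unlearning phase, the model's loss on the erroneously labeled points rises (it no longer "remembers" them) while accuracy on the retained data stays high, which is the trade-off the article describes: targeted forgetting without compromising overall performance.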