
Computer vision has made great strides over recent years. Machines can now recognise objects nearly as well as humans. However, the similarity does not end there. Researchers found that just like their human counterparts, computers are susceptible to optical illusions. This raises serious security concerns, while at the same time opening new avenues for computer vision research.

Jason Yosinski, a graduate student at Cornell, and colleagues from the University of Wyoming's Evolving Artificial Intelligence Laboratory have managed to compose images that, to the human eye, look like white noise, but in which computers identify objects with a high degree of certainty. They presented their findings at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) in Boston last June.

“We think our results are important for two reasons,” said Yosinski. “First, they highlight the extent to which computer vision systems based on modern supervised machine learning may be fooled, which has security implications in many areas. Second, the methods used in the paper provide an important debugging tool to discover exactly which artifacts the networks are learning.”

Computers can be “trained” to recognise objects by being shown many labelled photos, from which learning algorithms construct a model the computer can then use for object recognition. Great advances have been made in the field in recent years using systems called Deep Neural Networks (DNNs), which simulate synapse-like connections whose strengths are adjusted each time they are activated. DNNs stack several layers of simulated neurons, each adding a level of abstraction: first the computer recognises that the picture contains a four-legged animal, then that it is a dog, and finally that it is a Labrador.
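
The training loop described above can be sketched in miniature. This is not the paper's network: it is a minimal sketch, assuming a toy dataset of flattened 8x8 "images" and a single hidden layer, where a real DNN would use many layers and millions of labelled photos. All names and shapes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset standing in for labelled photos: 8x8 "images" flattened
# to 64 values; class-1 images are simply brighter on average.
n, d = 200, 64
X = rng.normal(0.0, 1.0, size=(n, d))
y = (rng.random(n) < 0.5).astype(int)
X[y == 1] += 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

# One hidden layer of simulated neurons; a real DNN stacks many such
# layers, each re-representing the image at a higher level of
# abstraction (edges -> parts -> object).
W1 = rng.normal(0, 0.1, size=(d, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, size=(16, 1)); b2 = np.zeros(1)

lr = 0.5
for _ in range(500):
    h = np.tanh(X @ W1 + b1)             # hidden representation
    p = sigmoid(h @ W2 + b2).ravel()     # P(class 1 | image)
    g = (p - y)[:, None] / n             # cross-entropy gradient
    gh = (g @ W2.T) * (1.0 - h ** 2)     # backpropagate into layer 1
    W2 -= lr * (h.T @ g); b2 -= lr * g.sum(0)
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(0)

h = np.tanh(X @ W1 + b1)
p = sigmoid(h @ W2 + b2).ravel()
accuracy = float(((p > 0.5).astype(int) == y).mean())
```

After training, `accuracy` on this easily separable toy data should be high; the point is only the shape of the loop — show labelled examples, compare the prediction to the label, nudge the connection strengths.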

But despite the similarity, computers recognise images differently from the way we do. “We realized that the neural nets did not encode knowledge necessary to produce an image of a fire truck, only the knowledge necessary to tell fire trucks apart from other classes,” Yosinski explained. A computer might call any array of square shapes a keyboard.

With this knowledge in hand, the team “mutated” images, amplifying the features the computers were using to recognise them. Whenever the machine recognised the object with more certainty in the mutated image, they kept it and discarded the previous iteration. By enhancing the features that matter for computer recognition while losing those humans rely on, they were able to produce images that humans cannot recognise, but that the computers classified as objects.
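
The mutate-and-keep loop can be sketched as follows. This is a hedged stand-in, not the paper's method: the researchers used evolutionary algorithms against real deep networks, whereas here the "classifier" is just a fixed linear scorer, and the class name in the comment is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64                                  # a toy 8x8 "image", flattened
w = rng.normal(size=d)                  # stand-in classifier weights

def confidence(img):
    # Stand-in for the network's confidence that `img` is, say, a fire truck.
    return 1.0 / (1.0 + np.exp(-(img @ w)))

img = rng.normal(scale=0.1, size=d)     # start from near-white noise
best = confidence(img)
start = best

for _ in range(2000):
    candidate = img + rng.normal(scale=0.05, size=d)  # random mutation
    c = confidence(candidate)
    if c > best:                        # keep the mutation only if the
        img, best = candidate, c        # classifier is more certain,
                                        # discarding the previous image
```

After the loop, `best` has climbed toward 1.0, yet `img` is still noise to a human eye — the mutations only ever pushed on the features the classifier cares about.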

“The research shows that it is possible to ‘fool’ a deep learning system so it learns something that is not true but that you want it to learn,” said Fred Schneider, the Samuel B. Eckert Professor of Computer Science. “This potentially provides the basis for malfeasants to cause automated systems to give carefully crafted wrong answers to certain questions. Many systems on the Web are using deep learning to analyze and draw inferences from large sets of data. DNN might be used by a Web advertiser to decide what ad to show you on Facebook or by an intelligence agency to decide if a particular activity is suspicious.”

The researchers then tried to “retrain” the system by labelling illusory images as such. This produced some improvement, but not a great deal, and the retrained networks could still be fooled by freshly generated images. If machines are to be trusted with security tasks, research into this new arena is vitally important.
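
The retraining step amounts to enlarging the training set so the fooling images get their own label. As a minimal sketch — assuming made-up toy data and a stand-in logistic classifier, not the paper's networks:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 64
real = rng.normal(1.0, 1.0, size=(100, d))     # stand-in "real" images
fooling = rng.normal(0.0, 1.0, size=(100, d))  # noise that fooled the net

# Enlarge the training set: the fooling images get their own label.
X = np.vstack([real, fooling])
y = np.array([0] * 100 + [1] * 100)            # 1 = the new "fooling" class

# Retrain a stand-in (logistic) classifier on the relabelled data.
w, b, lr = np.zeros(d), 0.0, 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30.0, 30.0)))
    g = (p - y) / len(y)
    w -= lr * (X.T @ g)
    b -= lr * g.sum()

p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30.0, 30.0)))
acc = float(((p > 0.5) == y).mean())
```

The retrained model separates these particular fooling images from the real ones, but — as the researchers found — nothing stops a new round of mutation against the retrained weights from producing fresh illusions.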

“The field of image recognition has been revolutionized in the last few years,” Yosinski said. “[Machine learning researchers] now have a lot of stuff that works, but what we don’t have, what we still need, is a better understanding of what’s really going on inside these neural networks.”