
Vision and image understanding are no longer the purview of biological creatures alone. In recent years, computers have become astonishingly good at understanding what they're seeing. Microsoft and Google both showcased systems last year that are much better at recognising objects in images than humans are, the MIT Technology Review reports.

Behind these successes lies "a technique called deep learning, which involves passing data through networks of roughly simulated neurons to train them to filter future data." In deep learning, the neural networks are arranged in hierarchical layers. Data passes through them in a particular order, and each layer "learns," or becomes specialised in, a particular type of processing, identifying particular visual features.
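The layered idea can be sketched in a few lines of Python. This is a hypothetical minimal example, not any production system: each weight matrix stands in for one layer, data flows through the layers in order, and a non-linearity between layers is what lets the stack specialise.

```python
import numpy as np

def relu(x):
    # Non-linearity applied after each layer; without it, stacked
    # layers would collapse into a single linear transformation.
    return np.maximum(0.0, x)

def forward(image_vector, layers):
    # Data passes through the layers in a fixed order; each weight
    # matrix filters its input into a higher-level representation.
    activation = image_vector
    for weights in layers:
        activation = relu(weights @ activation)
    return activation

# Hypothetical sizes: a 16-value input reduced to 4 high-level features.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 16)), rng.standard_normal((4, 8))]
features = forward(rng.standard_normal(16), layers)
print(features.shape)  # (4,)
```

In a trained network, the early layers would respond to simple patterns such as edges, and later layers to combinations of them.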

Convolutional nets, the neural networks used for visual processing, were inspired by studies of, and are in effect attempts to replicate, the neural structures in the visual cortex of animals.
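At the heart of a convolutional net is the convolution itself: a small filter slid across the image, responding wherever the local patch matches its pattern, much as cells in the visual cortex respond to oriented edges. A toy sketch of one such filter (the kernel and image here are illustrative, not from any real network):

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image; each output value measures how
    # strongly the local patch matches the kernel's pattern.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge kernel: responds where brightness rises left to right.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

# Toy image: dark left half, bright right half.
image = np.zeros((4, 4))
image[:, 2:] = 1.0

response = convolve2d(image, kernel)
```

The response is strongest exactly along the dark-to-bright boundary; in a real convolutional net, the kernels are not hand-written like this but learned from data.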

“These networks are a huge leap over traditional computer vision methods, since they learn directly from the data they are fed,” says Matthew Zeiler, CEO of Clarifai, a provider of image recognition services.

In the past, programmers had to invent the algorithms that would seek out visual features in photos. The results were just not very good.

Zeiler’s software, based on methods he developed as a grad student at New York University, worked through millions of images to train the network. The software is tasked with identifying distinct objects, from mugs to cars to buildings. Different features in images activate different layers of the neural network, which in turn reinforces each layer’s role in identifying that particular feature. In effect, millions of passes through the image stack teach the software to perform the task better.
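The repeated-pass idea can be shown with a deliberately tiny stand-in: a single-layer logistic classifier on synthetic data, not Zeiler's actual system. Each pass nudges the weights so that features which correctly predict the label are reinforced.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for labelled images: 2-feature inputs, binary labels.
X = rng.standard_normal((200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w = np.zeros(2)
b = 0.0
lr = 0.1

def predict(X, w, b):
    # Logistic output: probability the input belongs to the target class.
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

for _ in range(500):  # many passes over the data gradually refine the weights
    p = predict(X, w, b)
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

accuracy = np.mean((predict(X, w, b) > 0.5) == (y == 1))
```

A deep network does the same thing at vastly larger scale, with millions of weights adjusted across many layers on every pass.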