New Development to Revolutionize Video Analytics

Image analysis technology will need to become better at understanding human intentions if it is to be employed in a wide range of applications. Computers still struggle to predict people’s next likely actions from their current behavior. A research team has now developed a detector that can successfully pick out where human actions will occur in videos, in almost real time.

Lead researcher Hongyuan Zhu, a computer scientist at A*STAR’s Institute for Infocomm Research in Singapore, said driverless cars must be able to detect police officers and interpret their actions (such as a raised hand signaling traffic to stop) quickly and accurately in order to drive safely. Autonomous systems could also be trained to identify suspicious activities such as fighting, theft, or dropping dangerous items, and alert security officers.

Computers can accurately detect objects in static images thanks to deep learning techniques, which use artificial neural networks to process complex image information. But videos with moving objects are more challenging. “Understanding human actions in videos is a necessary step to build smarter and friendlier machines,” says Zhu.

Previous methods for locating potential human actions in videos did not use deep-learning frameworks and were slow and prone to error.

To overcome this, the team’s “YoTube” detector combines two types of neural networks in parallel: a static neural network, which has already proven accurate at processing still images, and a recurrent neural network, typically used for processing sequential data such as speech. “Our method is the first to bring detection and tracking together in one deep learning pipeline,” says Zhu.
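The article does not describe YoTube’s internals beyond this two-stream idea, but the general pattern can be sketched. In the toy code below, a per-frame linear scorer stands in for the static (image) network, a minimal tanh RNN cell stands in for the recurrent network, and the two streams run over the same clip in parallel before their per-frame scores are fused. All dimensions, weights, and function names are illustrative assumptions, not the actual YoTube architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT, HID = 8, 4  # hypothetical per-frame feature size and RNN hidden size

# Static stream: scores each frame independently, ignoring temporal
# context (stand-in for a CNN trained on still images).
W_static = rng.normal(size=(FEAT,))

def static_scores(frames):
    return frames @ W_static  # shape: (num_frames,)

# Recurrent stream: a minimal tanh RNN cell unrolled over the clip
# (stand-in for the recurrent network that models temporal change).
W_in = rng.normal(size=(FEAT, HID))
W_h = rng.normal(size=(HID, HID))
W_out = rng.normal(size=(HID,))

def recurrent_scores(frames):
    h = np.zeros(HID)
    scores = []
    for x in frames:
        h = np.tanh(x @ W_in + h @ W_h)  # carry state across frames
        scores.append(h @ W_out)
    return np.array(scores)

def fused_scores(frames):
    """Run both streams on the same clip and average their scores."""
    return 0.5 * (static_scores(frames) + recurrent_scores(frames))

# Usage: rank the frames of a short clip by fused action score.
clip = rng.normal(size=(6, FEAT))  # 6 frames of toy features
scores = fused_scores(clip)
best_frame = int(np.argmax(scores))  # frame most likely to hold action
```

The design point the sketch illustrates is complementarity: the static stream is strong on appearance in a single frame, while the recurrent stream accumulates evidence across frames, so fusing them can flag action regions that neither stream ranks highly on its own.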

The team tested its new detector on more than 3,000 videos routinely used in computer vision experiments. According to the A*STAR institute’s website, it outperformed state-of-the-art detectors at correctly picking out potential human actions by approximately 20 percent on videos of general everyday activities and around 6 percent on sports videos.

The detector occasionally makes mistakes if the people in the video are small, or if there are many people in the background. Nonetheless, Zhu says, “We’ve demonstrated that we can detect most potential human action regions in an almost real-time manner.”