Algorithm Requires Access to Face Datasets

For computer vision and facial recognition systems to work reliably, they need training datasets that approximate real-world conditions. Advances in computer vision facilitate the tracking and re-identification of people across security camera networks. However, the primary datasets available for algorithm training and performance evaluation are limited: so far, researchers have had access to only a small number of image datasets, many of which are heavily populated with still pictures of fair-skinned men. This limitation reduces the accuracy of the technology and creates a disconnect between the data researchers use and the kind of video data that would exist in actual video networks operated by public safety and law enforcement entities.

Moreover, cameras’ scope and angle, as well as the lighting or weather during a given recording, often make it difficult for law enforcement to track or re-identify people from security camera footage as they try to reconstruct crimes, protect critical infrastructure and secure special events, transportation facilities, military forces, etc.

IARPA, the US Intelligence Advanced Research Projects Activity, has launched an effort to collect research data from multi-camera video networks in support of computer vision research. The agency is interested in an annotated video collection of 960 hours, recorded over multiple days with varying illumination from a network of at least 20 cameras with varying positions, views, resolutions and frame rates, including both overlapping and non-overlapping fields of view.

The data should be captured in urban and semi-urban environments with multiple intersections, building entrances and exits, and pedestrian foot traffic, as well as signs, vehicles, trees and other obstructions.

The data should involve a minimum of 5,000 pedestrians and at least 200 subject volunteers given instructions on how to behave and/or where to go in the camera network.
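For readers who want the numeric requirements in one place, the short Python sketch below checks a candidate collection against the figures quoted above. It is an illustration only, not an IARPA tool: the `CollectionSummary` class, its field names and the `unmet_requirements` function are hypothetical, and treating 960 hours as a minimum and "multiple days" as at least two days are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the figures quoted in the article.
MIN_HOURS = 960          # assumption: treated here as a minimum
MIN_DAYS = 2             # assumption: "multiple days" read as two or more
MIN_CAMERAS = 20
MIN_PEDESTRIANS = 5000
MIN_VOLUNTEERS = 200


@dataclass
class CollectionSummary:
    """Hypothetical summary of a candidate video collection."""
    annotated_hours: float
    camera_count: int
    collection_days: int
    pedestrian_count: int
    volunteer_count: int
    has_overlapping_views: bool
    has_non_overlapping_views: bool


def unmet_requirements(s: CollectionSummary) -> List[str]:
    """Return a label for every quoted requirement the summary fails."""
    checks = [
        (s.annotated_hours >= MIN_HOURS, "960 hours of annotated video"),
        (s.collection_days >= MIN_DAYS, "data collected over multiple days"),
        (s.camera_count >= MIN_CAMERAS, "at least 20 cameras"),
        (s.pedestrian_count >= MIN_PEDESTRIANS, "minimum of 5,000 pedestrians"),
        (s.volunteer_count >= MIN_VOLUNTEERS, "at least 200 subject volunteers"),
        (s.has_overlapping_views, "overlapping fields of view"),
        (s.has_non_overlapping_views, "non-overlapping fields of view"),
    ]
    return [label for passed, label in checks if not passed]


if __name__ == "__main__":
    candidate = CollectionSummary(
        annotated_hours=100,
        camera_count=12,
        collection_days=1,
        pedestrian_count=5000,
        volunteer_count=200,
        has_overlapping_views=True,
        has_non_overlapping_views=False,
    )
    for gap in unmet_requirements(candidate):
        print("Missing:", gap)
```

Qualitative conditions, such as the urban setting or varying illumination, are left out because they cannot be reduced to a single number.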

The dataset must be approved for human-subject research and made available to the general research community under a data release process that has privacy, legal and policy approval.

According to the IARPA website, carrying out the data collection may require partnering with a state, local, or municipal government agency or an outside third-party organization to use cameras or collection spaces under their jurisdiction.