Cities generate enormous amounts of traffic video every day, yet only a fraction of it is ever reviewed. For transportation agencies, the challenge isn’t a lack of data—it’s the inability to process it fast enough to identify problems before they lead to serious crashes. A new AI-driven system developed at NYU Tandon aims to close that gap by automatically detecting collisions and near-misses in existing footage, offering a scalable way to improve road safety without installing new sensors or expanding staff.
According to TechXplore, the system, called SeeUnsafe, combines image understanding and natural-language reasoning to analyze long-form traffic video. Instead of relying on manually labeled datasets or custom training for each location, the model uses large, pre-trained multimodal AI systems that can interpret both visual scenes and textual descriptions. This allows it to classify events—such as a vehicle narrowly missing a pedestrian or two cars colliding—using context rather than simple motion cues.
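To make the idea concrete, here is a minimal sketch of prompt-based event classification with a vision-language model. This is not SeeUnsafe's actual code or API; the prompt wording, label set, and `query_model` callable are all illustrative assumptions, with the model stubbed out so the snippet is self-contained.

```python
# Hypothetical sketch of prompt-based traffic-event classification.
# `query_model` stands in for any multimodal (vision + language) API;
# here it is stubbed so the example runs without a real model.

LABELS = ("collision", "near-miss", "normal")

PROMPT = (
    "You are a traffic-safety analyst. Given the video frames, classify "
    "the event as one of: collision, near-miss, normal. "
    "Answer with the label only."
)

def parse_label(response: str) -> str:
    """Map a free-text model response onto one of the three event labels."""
    text = response.strip().lower()
    for label in LABELS:
        if label in text:
            return label
    return "normal"  # conservative fallback for unparseable answers

def classify_clip(frames, query_model) -> str:
    """Send the frames plus the classification prompt to a multimodal model."""
    return parse_label(query_model(PROMPT, frames))

# Example with a stubbed model that always reports a near-miss:
stub = lambda prompt, frames: "The cyclist swerved; this is a near-miss."
print(classify_clip([], stub))  # near-miss
```

The key design point the article highlights is visible even in this toy version: the "training" lives in the prompt and the pre-trained model, not in location-specific labeled data.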
In testing, SeeUnsafe correctly distinguished collisions, near-misses, and normal traffic roughly 77% of the time, and identified which road users were involved with up to 87.5% accuracy. These insights enable agencies to pinpoint dangerous intersections, problematic road layouts, or recurring risky behaviors before serious incidents occur. For cities with thousands of cameras, automating this analysis can transform how safety interventions are planned and deployed.
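Pinpointing dangerous intersections from such per-clip labels can be as simple as counting risky events per location. The sketch below assumes a hypothetical stream of `(intersection_id, label)` pairs, not any format SeeUnsafe actually emits.

```python
# Hypothetical sketch: rank intersections by accumulated collisions
# and near-misses once clips have been labeled.
from collections import Counter

def rank_intersections(events):
    """events: iterable of (intersection_id, label) pairs from a classifier."""
    risky = Counter(
        loc for loc, label in events if label in ("collision", "near-miss")
    )
    return [loc for loc, _ in risky.most_common()]

events = [
    ("5th & Main", "near-miss"),
    ("5th & Main", "collision"),
    ("Oak Ave", "normal"),
    ("Oak Ave", "near-miss"),
]
print(rank_intersections(events))  # ['5th & Main', 'Oak Ave']
```

In practice an agency would weight collisions above near-misses and normalize by traffic volume, but the principle of turning thousands of camera hours into a ranked intervention list is the same.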
Beyond civilian traffic management, similar systems could support security and defense operations, where real-time video feeds must be monitored for anomalous behavior across large areas. Automated interpretation of footage could assist in perimeter protection, convoy monitoring, or rapid threat detection—tasks that currently depend on human operators who can be quickly overwhelmed by volume.
SeeUnsafe also generates natural-language reports explaining its decisions, noting relevant conditions like weather, lighting, traffic density, and the specific movements that contributed to an event. While the system still struggles in low-light environments and depends on consistent object tracking, it demonstrates how AI can begin to “understand” the dynamics of safety-critical situations.
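A report of that kind could be assembled from the structured attributes the article lists (weather, lighting, involved road users, movements). The field names and template below are illustrative assumptions, not SeeUnsafe's actual schema or output format.

```python
# Hypothetical sketch: rendering a structured detection as the kind of
# natural-language explanation the article describes.
def explain_event(event: dict) -> str:
    return (
        f"Detected a {event['label']} between {event['actor_a']} and "
        f"{event['actor_b']} under {event['lighting']} lighting and "
        f"{event['weather']} conditions; contributing movement: "
        f"{event['movement']}."
    )

report = explain_event({
    "label": "near-miss",
    "actor_a": "a left-turning vehicle",
    "actor_b": "a pedestrian in the crosswalk",
    "lighting": "daylight",
    "weather": "clear",
    "movement": "the vehicle turned without yielding",
})
print(report)
```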
Looking ahead, researchers expect the approach to extend to in-vehicle cameras, enabling proactive risk detection from the driver’s perspective. As urban systems grow more complex, tools like SeeUnsafe could give both civil agencies and security organizations a way to turn massive video archives into actionable safety intelligence.
The research was published here.