A research team from the University of Illinois Urbana-Champaign, along with collaborators from Columbia University and UT Austin, has introduced a new system that allows robots to learn complex tool-use skills simply by watching video clips. The approach, called “Tool-as-Interface,” marks a departure from traditional robotics, which relies heavily on manual programming or sensor-intensive training setups.
The system enables robots to observe tasks—such as hammering, scooping, or flipping food—and reproduce them using only visual input from two camera angles. The method removes the need for motion capture suits, specialized tools, or remote human control.
According to TechXplore, at the core of the framework is a visual model called MASt3R, which converts two frames from ordinary videos into a 3D reconstruction of the scene. Using a technique known as 3D Gaussian splatting, the system then generates multiple synthetic viewpoints, allowing the robot to analyze the task from different angles.
To focus the robot’s learning on the interaction between the tool and its environment, the human is digitally removed from the scene using a segmentation model known as Grounded-SAM. This tool-centric view allows the system to understand the function and motion of the tool itself, rather than mimicking the human operator. As a result, learned skills are more easily transferred between different robotic platforms with varying hardware configurations.
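The pipeline described above — two frames in, a 3D reconstruction, synthetic viewpoints, and a human-free, tool-centric view out — can be sketched at a high level. This is a minimal illustrative outline only: every function below is a placeholder stub, and the real system's components (MASt3R, 3D Gaussian splatting, Grounded-SAM) have their own APIs that differ from these hypothetical names.

```python
# Hypothetical sketch of the "Tool-as-Interface" data pipeline.
# All functions are placeholder stubs standing in for the real models;
# none of these names come from the actual codebase.

def reconstruct_scene(frame_a, frame_b):
    """Stand-in for MASt3R: turns two ordinary video frames
    into a 3D reconstruction of the scene."""
    return {"frames": (frame_a, frame_b)}

def render_novel_views(scene, n_views=4):
    """Stand-in for 3D Gaussian splatting: synthesizes extra
    viewpoints so the task can be analyzed from different angles."""
    return [f"view_{i}" for i in range(n_views)]

def remove_human(views):
    """Stand-in for Grounded-SAM: segments out and removes the
    human operator, leaving only the tool and its environment."""
    return [v + "_tool_only" for v in views]

def tool_centric_training_data(frame_a, frame_b):
    """End-to-end: frames -> 3D scene -> novel views -> tool-only views."""
    scene = reconstruct_scene(frame_a, frame_b)
    views = render_novel_views(scene)
    return remove_human(views)
```

The key design idea the sketch mirrors is the ordering: the human is removed *after* novel views are rendered, so the learning signal that remains is purely the tool's motion relative to the scene.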
The research team tested the method on five distinct tasks, including hammering nails, scooping meatballs, and kicking a soccer ball. The robots performed these actions reliably, achieving success rates 71% higher than traditional teleoperation-based training while reducing training time by 77%.
While promising, the system does have some limitations. It currently assumes the tool is rigidly fixed to the robot’s gripper, and it can occasionally misestimate camera poses when reconstructing viewpoints. Still, the team sees this as a key step toward enabling robots to learn from widely available video content such as online tutorials or home recordings.
The research was recognized with a Best Paper Award at ICRA 2025 and is available as a preprint on arXiv.