Scientists at the Massachusetts Institute of Technology (MIT) have developed a robotic dog that can map its surroundings and carry out assigned tasks by combining artificial intelligence (AI) and computer vision.
Robotic dog in action. (Video: Live Science).
Researchers at MIT have developed a method called “Clio,” which enables a robot to quickly map an environment using a camera mounted on its body and to identify the parts of the scene most relevant to the task assigned to it through voice commands. The study was published in IEEE Robotics and Automation Letters on October 10.
Clio leverages the theory of the “information bottleneck,” in which information is compressed so that a neural network (a set of layered algorithms designed to mimic how the human brain processes information) selectively picks out and stores only the relevant segments. A robot equipped with this system processes what it perceives selectively, focusing on its task while disregarding everything else.
For example, if the environment contains a stack of books and the task is to retrieve the green book, all information about the scene is filtered so that only a cluster of segments representing the green book remains, said Dominic Maggio, a co-author of the study and a graduate student at MIT. “All other unrelated segments are grouped into a cluster that can be easily discarded.”
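The filtering idea above can be illustrated with a short sketch: hypothetical scene segments are scored against a task embedding, and only sufficiently relevant ones are kept while the rest fall into a discardable cluster. The toy vectors, segment names, and threshold are invented for this example and do not reflect Clio's actual representation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical embeddings: in a real system these would come from a
# vision-language model; here they are toy 3-D vectors.
task_embedding = [0.9, 0.1, 0.0]  # "retrieve the green book"
segments = {
    "green book": [0.85, 0.15, 0.05],
    "red book":   [0.30, 0.80, 0.10],
    "desk":       [0.05, 0.10, 0.90],
}

def filter_segments(task, segs, threshold=0.8):
    """Keep segments whose similarity to the task clears the threshold;
    everything else is grouped into a single discardable cluster."""
    relevant, discard = {}, []
    for name, emb in segs.items():
        if cosine(task, emb) >= threshold:
            relevant[name] = emb
        else:
            discard.append(name)
    return relevant, discard

relevant, discard = filter_segments(task_embedding, segments)
```

With these toy values, only the "green book" segment survives the filter; the red book and the desk end up in the discard cluster.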
To demonstrate Clio’s functionality, the research team had the four-legged Spot robot from Boston Dynamics run Clio while exploring an office building and performing a series of tasks. Operating in real time, Clio created a virtual map showing only the objects relevant to its task, allowing Spot to achieve its goal.
Robotic dog. (Photo: Andy Ryan).
The robot can also see, understand, and follow commands. The researchers achieved this level of detail by combining computer vision with large language models (LLMs), the neural networks that serve as the backbone for many AI tools, systems, and services, trained to recognize all types of objects. Clio’s breakthrough is its detailed, real-time understanding of what it sees in relation to the specific task it has been assigned.
A core part of this is integrating a mapping tool into Clio, allowing it to break down a scene into many smaller segments. Then, a neural network selects segments that are semantically similar—meaning they serve the same purpose or form similar objects.
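The grouping step described above can be sketched as a greedy single-pass clustering of segment embeddings: each segment joins the first cluster it is sufficiently similar to, or starts a new one. The 2-D toy embeddings, names, and threshold here are assumptions for illustration, not the clustering method Clio actually uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def cluster_segments(segments, threshold=0.9):
    """Greedy single-pass clustering: a segment joins the first cluster
    whose representative it resembles closely enough, else forms a new one."""
    clusters = []  # list of (representative embedding, member names)
    for name, emb in segments.items():
        for rep, members in clusters:
            if cosine(emb, rep) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((emb, [name]))
    return [members for _, members in clusters]

# Toy embeddings: two book-spine patches that should merge, one chair patch.
segments = {
    "book spine A": [0.90, 0.10],
    "book spine B": [0.88, 0.12],
    "chair seat":   [0.10, 0.90],
}
groups = cluster_segments(segments)
```

With these values the two book-spine patches merge into one cluster, while the chair seat forms its own, mirroring how semantically similar segments are combined into objects.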
In the future, the research team plans to refine Clio to handle higher-level tasks. “We are still assigning Clio specific tasks, such as ‘find the deck of cards,’” Maggio said. “For search and rescue, you need to assign it higher-level tasks, like ‘find survivors’ or ‘restore power.’ So we aim to achieve a more human-like understanding of how to complete more complex tasks.”