The Massachusetts Institute of Technology (MIT) last week unveiled a new method for training robots that uses generative artificial intelligence (AI) models. The technique combines data from different domains and modalities and integrates it into a shared language that large language models (LLMs) can process. MIT researchers claim the method could lead to general-purpose robots capable of handling a wide range of tasks without each skill having to be trained individually.
MIT researchers develop AI-inspired technique to train robots
In an MIT News post, the institute explained the new method in detail. Teaching a robot even a single task is currently a difficult proposition, because it requires large amounts of simulation and real-world data; without that data, the robot does not learn how to operate in a given environment and struggles to adapt to it.
This means every new task requires a fresh dataset covering each simulated and real-world scenario, after which the robot goes through a training period in which its actions are optimized and errors and glitches are ironed out. As a result, robots are generally trained for one specific task at a time, and the multi-purpose robots of science fiction films remain out of reach in reality.
However, a new technique developed by researchers at MIT aims to overcome this challenge. In a paper posted on the preprint server arXiv (note: it has not been peer-reviewed), the scientists describe how generative AI can help tackle this problem.
To this end, data from different domains, such as simulations and real robots, and from different modalities, such as vision sensors and robotic-arm position encoders, is unified into a shared language that AI models can process. The researchers also developed a new architecture, called Heterogeneous Pretrained Transformers (HPT), to integrate this data.
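To give a sense of what a "shared language" can mean in practice, here is a minimal sketch in PyTorch of one plausible way to map a camera image and a robot arm's joint readings into a common sequence of tokens. The class names (VisionStem, ProprioStem), the embedding width, and the patch size are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: project heterogeneous robot inputs into a shared
# token space so one model can consume data from many robots and sensors.
# All names and sizes here are illustrative assumptions.

EMBED_DIM = 256  # shared token width for every modality (assumed)

class VisionStem(nn.Module):
    """Turns a camera image into a sequence of tokens via patch embedding."""
    def __init__(self, embed_dim=EMBED_DIM, patch=16):
        super().__init__()
        # A conv with stride == kernel size slices the image into patches,
        # each projected to the shared embedding width.
        self.proj = nn.Conv2d(3, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, img):                  # img: (B, 3, H, W)
        x = self.proj(img)                   # (B, D, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, D)

class ProprioStem(nn.Module):
    """Turns joint angles/forces from an arm encoder into a single token."""
    def __init__(self, num_joints=7, embed_dim=EMBED_DIM):
        super().__init__()
        self.proj = nn.Linear(num_joints, embed_dim)

    def forward(self, joints):                  # joints: (B, num_joints)
        return self.proj(joints).unsqueeze(1)   # (B, 1, D)

# Each robot/sensor combination gets its own small "stem"; the outputs
# all live in the same token space and can be concatenated freely.
vision, proprio = VisionStem(), ProprioStem()
img = torch.randn(2, 3, 224, 224)   # dummy camera frames
joints = torch.randn(2, 7)          # dummy readings from a 7-joint arm
tokens = torch.cat([vision(img), proprio(joints)], dim=1)
print(tokens.shape)                 # torch.Size([2, 197, 256])
```

The key design idea this illustrates is that once every input, whatever its source, is expressed as tokens of the same width, a single downstream model can be pretrained on all of it at once.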
Interestingly, the study’s lead author, Lirui Wang, a graduate student in electrical engineering and computer science (EECS), said that the technology was inspired by AI models such as OpenAI’s GPT-4.
At the middle of their system, the researchers placed a transformer, the same architecture that underpins LLMs such as GPT-4, which processes both vision and proprioception (the sense of one's own movement, force, and position) inputs.
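Building on the sketch above, the snippet below shows what such a shared transformer "trunk" with a small task-specific action head could look like. Again, the class name (PolicyTrunk), the layer counts, and the 7-dimensional action output are assumptions for illustration, not the researchers' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a transformer trunk reads the shared tokens from
# all modalities, and a small per-task head maps its output to actions.

EMBED_DIM, ACTION_DIM = 256, 7  # 7-DoF arm actions (assumed)

class PolicyTrunk(nn.Module):
    def __init__(self, embed_dim=EMBED_DIM, action_dim=ACTION_DIM):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=8, batch_first=True)
        # The trunk is shared across robots and tasks; only the input
        # stems and the output heads are swapped out per task.
        self.trunk = nn.TransformerEncoder(layer, num_layers=6)
        self.action_head = nn.Linear(embed_dim, action_dim)

    def forward(self, tokens):          # tokens: (B, T, D)
        features = self.trunk(tokens)   # attention mixes vision + proprio
        # Pool the token sequence and predict one action vector.
        return self.action_head(features.mean(dim=1))

policy = PolicyTrunk()
tokens = torch.randn(2, 197, EMBED_DIM)  # e.g., tokens from the stems above
actions = policy(tokens)
print(actions.shape)                     # torch.Size([2, 7])
```

In this setup, adapting to a new robot or task would mean training only a fresh stem and head against the already-pretrained trunk, which is what makes the small task-specific data requirement plausible.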
MIT researchers say the new method could make training robots faster and less expensive than traditional approaches, chiefly because far less task-specific data is needed to train a robot on different tasks. Additionally, the study found that the method outperformed training from scratch by more than 20 percent in both simulations and real-world experiments.