Google DeepMind, the company’s AI research wing, unveiled Project Astra for the first time at I/O this year. Now, more than six months later, the tech giant has announced new capabilities and improvements to the artificial intelligence (AI) agent. Based on the Gemini 2.0 AI model, Project Astra can now converse in multiple languages, tap into several Google platforms, and retain information for longer. The project is still in the testing phase, but the Mountain View-based tech giant said it is working to bring these capabilities to the Gemini app, its Gemini AI assistant, and even glasses-like form factors.
Google adds new capabilities to Project Astra
Project Astra is a general-purpose AI agent similar in functionality to OpenAI’s vision mode in ChatGPT or Meta’s Ray-Ban smart glasses. It integrates with camera hardware to process visual data, allowing it to see the user’s surroundings and answer questions about them. Additionally, the AI agent comes with a limited memory that allows it to recall visual information even when it is no longer being actively shown through the camera.
Google DeepMind explained in a blog post that the team has been working to improve the AI agent since its showcase in May. Now, powered by Gemini 2.0, Project Astra has received several upgrades. It can converse in multiple languages as well as in mixed-language conversations, and the company said it now has a better understanding of pronunciations and unusual words.
The company has also added tool use to Project Astra. The agent can now draw on Google Search, Lens, Maps, and Gemini to answer complex questions. For example, a user can show it a landmark and ask for directions home, and the AI agent can recognize the landmark and verbally guide the user there.
The AI agent’s memory function has also been upgraded. In May, Project Astra could retain only the last 45 seconds of visual information; this has now been expanded to 10 minutes of in-session memory. It can also remember past conversations to provide more personalized responses. Finally, Google claims the agent can now understand language with the latency of human conversation, making interactions with the tool feel more natural.