
Google DeepMind, the company’s AI research unit, first unveiled Project Astra at I/O this year. Now, more than six months later, the tech giant has announced new capabilities and improvements for the artificial intelligence (AI) agent. Powered by the Gemini 2.0 AI model, it can now converse in multiple languages, access several Google platforms, and retain information for longer. The tool is still in beta, but the Mountain View-based tech giant says it is working to bring Project Astra to the Gemini app, the Gemini AI assistant, and even form factors like glasses.
Google adds new features to Project Astra
Project Astra is a universal AI agent with capabilities similar to OpenAI’s vision-enabled voice mode or Meta’s Ray-Ban smart glasses. It can be integrated with camera hardware to view the user’s surroundings and process visual data to answer questions about them. In addition, the AI agent has a limited memory and can recall visual information even when it is no longer in the camera’s view.
Google DeepMind highlighted in a blog post that the team has been working to improve the AI agent since its May showcase. Now, with Gemini 2.0, Project Astra has received several upgrades. It can converse in multiple languages, including mixed-language conversations, and the company said it now has a better understanding of accents and uncommon words.
The company also introduced tool use in Project Astra. It can now leverage Google Search, Lens, Maps, and Gemini to answer complex questions. For example, a user can point the camera at a landmark and ask the AI agent for directions home; the agent can identify the landmark and verbally guide the user back.
The memory function of the AI agent has also been upgraded. Back in May, Project Astra only retained visual information from the preceding 45 seconds; that has now been extended to 10 minutes of in-session memory. Additionally, it can remember more past conversations to provide more personalised responses. Finally, Google claims that the agent can now understand language at roughly the latency of human conversation, making interactions with the tool feel more natural.