Microsoft Announces Magma Foundation Model That Can Complete Multimodal Agentic Tasks

Microsoft researchers announced a new foundation model on Wednesday that can perform agentic functions. The artificial intelligence (AI) model, called Magma, is pre-trained on large datasets spanning text, images, videos, and spatial formats. The Redmond-based tech giant says Magma is an extension of vision-language (VL) models that not only understands multimodal information, but can also plan and act on it. The model, which can power AI agents, can be used for a variety of tasks, including computer vision, user interface (UI) navigation, and robotic manipulation.

Microsoft Announces Magma Foundation Model

In a GitHub post, Microsoft researchers detailed the new Magma foundation model. Foundation models are large language models (LLMs) that are built from scratch and are not distilled from any other model. They usually serve as the baseline for other models in a series. Magma is unique in the sense that it has been pre-trained on a wide range of datasets.

The researchers pointed out that the base architecture behind Magma is the Llama 3 AI model. However, Magma also has the ability to plan and act in the visual-spatial world. This allows the model to not only generate output like a chatbot, but also carry out actions.

It can be used as a computer vision chatbot that provides information about the real world when paired with camera sensors. Magma can also be used to control the UI of a device. Even more interesting, it can control a robot, using its agentic functions to complete complex tasks.

One of the main reasons behind these capabilities is the diverse datasets, along with two technical components, Set-of-Mark and Trace-of-Mark, the researchers say. The former enables action grounding in images, videos, and spatial data by making the model predict numeric marks for buttons or robot arms in image space. The latter helps the model learn temporal video dynamics and predict future states before taking action. This allows the model to develop a strong spatial understanding.
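To make the mark-based grounding idea concrete, here is a minimal illustrative sketch in Python. It is not Magma's code or API: the `Box` class and the functions `assign_marks`, `ground_action`, and `mark_trace` are hypothetical names invented for this example, and the "predicted" mark is supplied by hand where a model would normally choose it. The sketch only shows how a numeric mark (in the spirit of Set-of-Mark) can be turned into a concrete click location, and how one mark's positions across frames form a trajectory (in the spirit of Trace-of-Mark).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A candidate region in image space (hypothetical structure)."""
    label: str
    x: int
    y: int
    w: int
    h: int

    def center(self):
        # Integer pixel coordinates of the box centre.
        return (self.x + self.w // 2, self.y + self.h // 2)


def assign_marks(boxes):
    """Number the candidate regions 1..N, so that a model only has to
    output a mark number instead of raw pixel coordinates."""
    return {i: box for i, box in enumerate(boxes, start=1)}


def ground_action(marks, predicted_mark):
    """Map a predicted mark number back to an executable UI action."""
    box = marks[predicted_mark]
    return {"action": "click", "target": box.label, "point": box.center()}


def mark_trace(frames, mark):
    """Collect the centre of one mark across video frames. Each frame is a
    dict of mark number -> Box; frames where the mark is missing are skipped."""
    return [frame[mark].center() for frame in frames if mark in frame]


# Assumed example data: two buttons on a screen.
buttons = [Box("Search", 10, 10, 100, 20), Box("Submit", 200, 50, 80, 30)]
marks = assign_marks(buttons)

# Pretend the model answered "2" when asked which element to press.
print(ground_action(marks, 2))

# Assumed example data: a robot gripper (mark 1) moving right over 3 frames.
frames = [{1: Box("gripper", 10 * t, 40, 20, 20)} for t in range(3)]
print(mark_trace(frames, 1))
```

In this toy setup the hard part, choosing the right mark from an image and an instruction, is exactly what the model is trained to do; the surrounding bookkeeping is deliberately trivial.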

Microsoft researchers also shared the model's benchmark scores based on internal testing. Magma achieved competitive results in all agentic evaluation tests, surpassing models from OpenAI, Alibaba, and Google. As of now, the company has not released Magma to the public.