
Microsoft researchers launched a new artificial intelligence (AI) model on Wednesday that can generate 3D gaming environments. The new AI model, known as the World and Human Action Model (WHAM) or Muse, was developed by the tech giant's Game Intelligence and Teachable AI Experiences (Tai X) research teams in collaboration with Xbox Games Studios' Ninja Theory. The company said the large language model (LLM) can help game designers during the ideation process by generating game visuals and controller actions, supporting creative work in game development.
Microsoft unveils Muse AI model
In a blog post, the Redmond-based tech giant detailed the Muse AI model. It is currently a research project, although the company has open-sourced the model's weights along with the WHAM Demonstrator (a concept prototype with a visual interface for interacting with the AI model). Developers can try the model on Azure AI Foundry. A paper detailing the technology behind the model was published in the journal Nature.
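As a rough illustration, here is a minimal sketch of fetching the open-sourced weights for local experimentation. The Hugging Face repository ID used below is an assumption for illustration; check Microsoft's official release for the exact location.

```python
# Minimal sketch: download the open-sourced WHAM/Muse weights for local use.
# Assumption: the weights are hosted on Hugging Face under "microsoft/wham";
# verify the exact repo ID against Microsoft's official release notes.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="microsoft/wham")
print(f"Model files downloaded to: {local_dir}")
```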
Training a model on such a complex domain is a difficult undertaking. Microsoft researchers collected a large amount of human gameplay data from Bleeding Edge, a 2020 game published by Ninja Theory. The LLM was trained on a billion image-action pairs, equivalent to roughly seven years of continuous human gameplay. The data is said to have been collected ethically and used only for research purposes.
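To put that scale in perspective, a quick back-of-the-envelope calculation (our own arithmetic, assuming continuous play; the implied capture rate is not a figure Microsoft has published) shows what those numbers mean:

```python
# Back-of-the-envelope check of the reported training scale. The pair count
# and "seven years" figure come from the article; the implied frames-per-second
# capture rate is derived here and is not an official Microsoft number.
pairs = 1_000_000_000                  # ~1 billion image-action pairs
seconds = 7 * 365 * 24 * 3600          # ~7 years of continuous gameplay
print(f"{seconds:,} seconds of play")             # 220,752,000
print(f"~{pairs / seconds:.1f} pairs per second") # ~4.5, i.e. a few frames/sec
```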
Scaling up model training was a major challenge, the researchers say. Initially, Muse was trained on a cluster of NVIDIA V100 GPUs, and training was later scaled to multiple NVIDIA H100 GPUs.
Coming to its functionality, the Muse AI model accepts both text prompts and visual inputs. Additionally, once a game environment has been generated, it can be extended further with controller actions. The AI responds to the user's actions by rendering new environments that remain consistent with the initial prompt and with the preceding gameplay.
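Conceptually, this interaction resembles an autoregressive world-model loop: the model conditions on all frames and controller inputs seen so far and predicts the next frame. The sketch below is purely illustrative; the `predict_next_frame` method and the `model` and `controller` objects are hypothetical stand-ins, not the published API.

```python
# Illustrative world-model interaction loop. The model and controller
# interfaces here are hypothetical placeholders, not Microsoft's actual API.
from dataclasses import dataclass, field

@dataclass
class Rollout:
    frames: list = field(default_factory=list)   # generated frames so far
    actions: list = field(default_factory=list)  # controller inputs so far

def play(model, initial_frames, controller):
    rollout = Rollout(frames=list(initial_frames))
    while controller.is_active():
        action = controller.read()  # e.g. current stick/button state
        # Condition on everything generated so far so that new frames stay
        # consistent with earlier gameplay, as the article describes.
        next_frame = model.predict_next_frame(
            rollout.frames, rollout.actions + [action]
        )
        rollout.frames.append(next_frame)
        rollout.actions.append(action)
    return rollout
```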
Because it is a unique AI model, typical benchmarks cannot properly evaluate its capabilities. The researchers stressed that they internally tested metrics such as the consistency, diversity, and persistency of the LLM's outputs. Since it is a research-focused model, the output is limited to a resolution of 300x180 pixels.
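As a loose illustration of what a consistency-style check could look like, one might compare a generated rollout against the recorded human continuation frame by frame. This is our own simplification for intuition only, not the evaluation protocol from the paper.

```python
# Toy consistency check: mean per-pixel error between generated frames and a
# recorded ground-truth continuation. A simplification of our own; the paper's
# actual metrics are defined differently.
import numpy as np

def frame_consistency(generated: np.ndarray, reference: np.ndarray) -> float:
    """Lower is better. Both arrays shaped (T, H, W, C) with values in [0, 1]."""
    assert generated.shape == reference.shape
    return float(np.mean(np.abs(generated - reference)))

# Example with random stand-in data at the model's 300x180 output resolution.
rng = np.random.default_rng(0)
gen = rng.random((10, 180, 300, 3))
ref = rng.random((10, 180, 300, 3))
print(f"Mean absolute frame error: {frame_consistency(gen, ref):.3f}")
```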