NVIDIA researchers on Monday unveiled a new artificial intelligence (AI) model that can reposition objects in images. The tool, called DiffUHaul, uses a spatial understanding of an image's context to move an object from one position to another without affecting the background or the shape of the image. What sets the technique apart is that it is training-free: it does not require any additional training or fine-tuning of the underlying model. The company demonstrated the technology at SIGGRAPH Asia 2024, the conference of the Special Interest Group on Computer Graphics and Interactive Techniques.
In a research paper, the NVIDIA researchers detailed the new AI tool, which was developed in collaboration with the Hebrew University of Jerusalem, Tel Aviv University and Reichman University. With it, the researchers aim to solve a prominent problem for AI image generation models: relocating objects with spatial awareness.
The paper emphasizes that this editing task remains a bottleneck for AI researchers because AI models lack spatial reasoning. Existing vision models can understand the context of an image, but because they do not understand how an object moves through space within a 2D scene, they are unable to relocate it.
With DiffUHaul, NVIDIA claims to have fixed this problem. Built on an image diffusion architecture, the tool applies attention masking during the denoising steps to preserve the object's high-level appearance. It builds on BlobGEN, a recent technique that brings spatial understanding to image generation models. In addition, the researchers used new techniques to reconstruct real images with local models at the designated locations.
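The general idea of attention masking can be illustrated with a toy example. The sketch below is not DiffUHaul's implementation, which the article does not describe in detail; it is a generic, pure-Python scaled dot-product attention in which a boolean `mask` (a hypothetical argument, chosen for illustration) stops "object" tokens from attending to "background" tokens, so the object's output features stay the same no matter what the background contains.

```python
import math


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention with a boolean mask.

    mask[i][j] is True if query i may attend to key j. Disallowed pairs
    receive zero weight. Assumes every query can see at least one key.
    """
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Scores for allowed keys; -inf marks keys hidden by the mask.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, k in enumerate(keys)
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [0.0 if s == -math.inf else math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


def object_only_mask(n_tokens, object_ids):
    """Hypothetical mask: object tokens see only object tokens; background sees all."""
    return [
        [(j in object_ids) if i in object_ids else True for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    vals = [[1.0], [1.0], [5.0], [9.0]]  # tokens 0-1: object, tokens 2-3: background
    mask = object_only_mask(4, {0, 1})
    print(masked_attention(feats, feats, vals, mask))
```

Because the object tokens never attend to the background, editing the background values leaves their outputs untouched, which is one simple way masking can help keep an object's appearance stable while the rest of the scene changes.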
On the front end, users will be able to type a text prompt identifying the object they want to move, and the AI can reposition the object spatially while adjusting the background accordingly. From the demo shown by the company, however, it is not possible to tell whether the tool understands the shape changes caused by spatial motion. For example, if a balloon floating in the air is moved to the ground, its shape should also change. Because the method is training-free, the AI may not be able to capture such changes.
