
Google introduced the successor to its PaliGemma artificial intelligence (AI) vision model on Thursday. The new family of AI models, called PaliGemma 2, improves on the capabilities of the older generation. The Mountain View-based tech giant says the vision-language model can see, understand, and interact with visual inputs such as images and other visual assets. It is built on the Gemma 2 small language model (SLM), which was released in August. Interestingly, the tech giant claims that the model can analyze emotions in uploaded images.
Google PaliGemma 2 AI Model
In a blog post, the tech giant detailed the new PaliGemma 2 AI model. Although Google has multiple vision-language models, PaliGemma was the first such model in the Gemma family. Vision models differ from typical large language models (LLMs) in that they have additional encoders that can analyze visual content and convert it into a familiar form of data. In this way, a vision model can technically “see” and understand the outside world.
One benefit of a smaller vision model is that it can be used in a large number of applications, since smaller models are optimized for speed and accuracy. With PaliGemma 2 being open source, developers can use its capabilities to build applications.
PaliGemma 2 comes in three parameter sizes: 3 billion, 10 billion, and 28 billion. It is also available in three input resolutions: 224px, 448px, and 896px. As a result, the tech giant claims, it is easy to optimize the model's performance for a wide range of tasks. Google says the model generates detailed, contextually relevant captions for images. It not only identifies objects but also describes the actions, emotions, and overall narrative of the scene.
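To make the size and resolution options concrete, the short Python sketch below lists the nine combinations implied by three parameter sizes and three resolutions. The image token counts rest on an assumption that is not stated in the article: that the vision encoder cuts each square image into non-overlapping 14x14-pixel patches and emits one token per patch. The names `Variant`, `all_variants`, and `ASSUMED_PATCH_PX` are illustrative only.

```python
from dataclasses import dataclass
from itertools import product

# Sizes and resolutions as reported in the article.
PARAMETER_SIZES_B = (3, 10, 28)    # billions of parameters
RESOLUTIONS_PX = (224, 448, 896)   # side length of the square input, in pixels

# Assumption (not from the article): one image token per 14x14-pixel patch.
ASSUMED_PATCH_PX = 14


@dataclass(frozen=True)
class Variant:
    """One hypothetical size/resolution combination."""
    params_b: int
    resolution_px: int

    @property
    def image_tokens(self) -> int:
        # Patches per side, squared, gives the patch (token) count per image.
        per_side = self.resolution_px // ASSUMED_PATCH_PX
        return per_side * per_side


def all_variants() -> list[Variant]:
    """Return every size/resolution pairing, smallest model first."""
    return [Variant(p, r) for p, r in product(PARAMETER_SIZES_B, RESOLUTIONS_PX)]


if __name__ == "__main__":
    for v in all_variants():
        print(f"{v.params_b}B @ {v.resolution_px}px: {v.image_tokens} image tokens")
```

Under that assumption, each doubling of resolution quadruples the number of image tokens the language model has to process, which is one reason a developer might pick a lower resolution for simple captioning and a higher one for fine detail such as text or musical notation.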
Google highlighted that the model can be used for chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation. The company has also published a paper on the online preprint repository arXiv.
Developers and AI enthusiasts can download the PaliGemma 2 model and its code from Hugging Face and Kaggle. The AI models support frameworks such as Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
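For readers who want to try the model through Hugging Face Transformers, the following is a minimal sketch rather than an official recipe. It assumes that the checkpoints are published on the Hugging Face Hub under IDs of the form `google/paligemma2-<size>b-pt-<resolution>`, that access has already been granted for the gated weights, and that a prompt such as `<image>caption en` triggers captioning; none of these details come from the article. The helper names `checkpoint_id` and `caption_image` are hypothetical.

```python
def checkpoint_id(params_b: int, resolution_px: int, stage: str = "pt") -> str:
    """Build a Hub repo ID using the naming pattern assumed in the lead-in."""
    if params_b not in (3, 10, 28):
        raise ValueError(f"unsupported parameter size: {params_b}B")
    if resolution_px not in (224, 448, 896):
        raise ValueError(f"unsupported resolution: {resolution_px}px")
    return f"google/paligemma2-{params_b}b-{stage}-{resolution_px}"


def caption_image(image_path: str, model_id: str,
                  prompt: str = "<image>caption en") -> str:
    """Caption one image. Downloads the weights on first use."""
    # Heavy imports stay local so checkpoint_id() works without them installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=50, do_sample=False)

    # Drop the prompt tokens so that only the generated caption is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)


if __name__ == "__main__":
    # "photo.jpg" is a placeholder path; replace it with a real image file.
    print(caption_image("photo.jpg", checkpoint_id(3, 224)))
```

The smallest combination (3 billion parameters at 224px) is the cheapest starting point; the larger sizes need considerably more memory and are better suited to a machine with a GPU.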