
Google introduced the successor to the Gemini 1.5 family of AI models on Wednesday: Gemini 2.0. The company says the new models bring improved capabilities, including native support for image generation and audio generation. Currently, the Gemini 2.0 model is available in beta to select developers and testers, while the Gemini 2.0 Flash model has been added to the chatbot’s web and mobile apps for all users. Google says the larger models will be rolled out to its products soon.
Google Gemini 2.0 AI Model
Nine months after releasing the Gemini 1.5 series, Google has introduced an upgraded version of its large language model (LLM). The company announced in a blog post that it is releasing the first model in the Gemini 2.0 family, an experimental version of Gemini 2.0 Flash. Flash models typically contain fewer parameters and are not suited to complex tasks, but they compensate with lower latency and higher efficiency than larger models.
The Mountain View-based tech giant emphasizes that Gemini 2.0 Flash now supports multimodal output, such as images combined with text and multilingual text-to-speech (TTS) audio. The model is also equipped with agentic capabilities: Gemini 2.0 Flash can natively call tools such as Google Search and code execution, as well as third-party functions once the user defines them through the API.
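As a rough illustration of the third-party function calling mentioned above, here is a minimal sketch using the google-generativeai Python SDK. The get_weather function is a made-up stub, and the "gemini-2.0-flash-exp" model identifier is the name used around launch; both are assumptions, not confirmed details from the announcement.

```python
# Sketch of user-defined function calling with the Gemini API.
# get_weather is an illustrative stub; the model name is assumed.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Return a short weather summary for a city (stubbed for this sketch)."""
    return f"It is sunny in {city}."

# Declaring the function in `tools` lets the model decide when to call it.
model = genai.GenerativeModel("gemini-2.0-flash-exp", tools=[get_weather])

# Automatic function calling runs get_weather and feeds the result back
# to the model before it produces its final answer.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("What's the weather in Mountain View?")
print(response.text)
```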
On performance, Google shared benchmark scores for Gemini 2.0 Flash based on internal testing. On the Massive Multitask Language Understanding (MMLU), Natural2Code, MATH, and Graduate-Level Google-Proof Q&A (GPQA) benchmarks, it even outperforms the Gemini 1.5 Pro model.
Gemini users can select the experimental model from the model selector, located in the top-left corner of the web interface and at the top of the mobile app. The model can also be accessed through the Gemini Application Programming Interface (API) in Google AI Studio and Vertex AI, where developers can use it with multimodal input and text output. Image generation and text-to-speech are currently available only to Google’s early-access partners.
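For developers trying the API route described above, a minimal sketch with the google-generativeai Python SDK might look like the following. The "gemini-2.0-flash-exp" identifier is an assumption based on the launch naming; check Google AI Studio for the current model name.

```python
# Minimal sketch of calling the experimental model through the Gemini API.
# The model identifier is assumed; verify it in Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-exp")

# Text-in, text-out request; image and TTS output remain limited
# to early-access partners per the announcement.
response = model.generate_content("Summarize what multimodal output means.")
print(response.text)
```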