Google on Wednesday introduced the successor to the Gemini 1.5 family of AI models, called Gemini 2.0. The company highlighted that the new models come with improved capabilities, including native support for image and audio generation. Currently, Gemini 2.0 is available in beta to select developers and testers, while the Gemini 2.0 Flash model has been added to the Gemini chatbot's web and mobile apps for all users. Google said larger models will also be added to its products soon.
Google Gemini 2.0 AI Model
Nine months after the release of the Gemini 1.5 series of AI models, Google has now introduced an improved generation of its large language models (LLMs). In a blog post, the company announced that it is releasing the first model in the Gemini 2.0 family: an experimental version of Gemini 2.0 Flash. Flash models generally have fewer parameters and are not suited to complex tasks, but they compensate with lower latency and higher efficiency than larger models.
The Mountain View-based tech giant highlighted that Gemini 2.0 Flash now supports multimodal output such as image generation alongside text and steerable, multilingual text-to-speech (TTS) audio. The model is also equipped with agentic functions: 2.0 Flash can natively call tools such as Google Search and code execution, as well as third-party functions that users define through the API.
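To illustrate, here is a minimal sketch of that user-defined tool-calling flow using the google-generativeai Python SDK. The get_city_weather function is a hypothetical stand-in for a third-party tool, and the experimental model identifier is an assumption for demonstration rather than something confirmed in Google's announcement.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key generated in Google AI Studio

# Hypothetical third-party function; any user-defined function can be exposed as a tool.
def get_city_weather(city: str) -> str:
    """Stub that would normally call a real weather service."""
    return f"Sunny, 24 degrees Celsius in {city}"

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",  # assumed experimental Gemini 2.0 Flash model id
    tools=[get_city_weather],           # the SDK wraps the function as a callable tool
)

# With automatic function calling, the model invokes the tool and folds
# its return value into the final answer.
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("What is the weather like in Bengaluru right now?")
print(response.text)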
On performance, Google shared benchmark scores for Gemini 2.0 Flash based on internal testing. On the Massive Multitask Language Understanding (MMLU), Natural2Code, MATH, and graduate-level Google-Proof Q&A (GPQA) benchmarks, it outperforms even the larger Gemini 1.5 Pro model.
Gemini users can select experimental models from the model selector located at the top left of the web interface and at the top of the mobile app. The model is also available in Google AI Studio and Vertex AI through the Gemini Application Programming Interface (API). Developers get multimodal input and text output for now; image generation and text-to-speech capabilities are currently limited to Google's early-access partners.
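For developers, access through the Gemini API looks like an ordinary text-generation call. The snippet below is a minimal sketch using the same google-generativeai Python SDK; the model identifier is again assumed for illustration.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # API key from Google AI Studio

# Multimodal input (plain text here; images can also be passed), text output.
model = genai.GenerativeModel("gemini-2.0-flash-exp")  # assumed experimental model id
response = model.generate_content("Summarise what is new in Gemini 2.0 Flash in two sentences.")
print(response.text)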