
Gemini 2.0 to Release Real-Time Audio, Video-Streaming Input Tools in New Update


Google has launched an experimental version of Gemini 2.0 Flash, its workhorse model with low latency and enhanced performance, as the global AI wars rage on.

Gemini 2.0 Flash

Gemini 2.0 Flash builds on the success of 1.5 Flash and outperforms 1.5 Pro on key benchmarks at twice the speed.

Apart from supporting multimodal inputs like images, video and audio, 2.0 Flash now supports multimodal output, such as natively generated images mixed with text and steerable text-to-speech (TTS) multilingual audio. It can also natively call tools like Google Search and code execution, as well as third-party user-defined functions.
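For developers, this native tool calling is exposed through the Gemini API. The snippet below is a minimal sketch, assuming the google-genai Python SDK and the experimental "gemini-2.0-flash-exp" model ID; the prompt and API-key placeholder are illustrative.

```python
# Minimal sketch: native Google Search tool use with Gemini 2.0 Flash,
# assuming the google-genai Python SDK. Model ID and key are placeholders.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio (assumed)

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarise today's top AI news in two sentences.",
    config=types.GenerateContentConfig(
        # Let the model call Google Search natively to ground its answer.
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(response.text)
```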

Gemini 2.0 Flash is available now as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI with multimodal input and text output available to all developers, and text-to-speech and native image generation available to early-access partners. General availability will follow in January, along with more model sizes. 

Google is also releasing a new Multimodal Live API with real-time audio and video-streaming input and the ability to use multiple, combined tools.
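In practice, the Multimodal Live API is used over a streaming session rather than one-shot requests. Below is a minimal text-only sketch, assuming the google-genai Python SDK's async live client; the model ID, config fields and send/receive calls reflect the experimental API and may change.

```python
# Minimal sketch of a Multimodal Live API session (experimental API surface,
# assumed google-genai Python SDK); real apps would stream audio or video frames.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    config = {"response_modalities": ["TEXT"]}  # audio output is also supported
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        # Send one turn and stream the model's reply back as it arrives.
        await session.send(input="Hello from the Live API!", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```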

Gemini 2.0 available in the Gemini app, Google's AI assistant

Gemini users globally can access a chat-optimized version of 2.0 Flash Experimental by selecting it in the model drop-down on desktop and mobile web, and it will be available in the Gemini mobile app soon. With this new model, users can experience an even more helpful Gemini assistant.

Unlocking agentic experiences with Gemini 2.0 

Gemini 2.0 Flash's native user-interface action capabilities, along with other improvements like multimodal reasoning, long-context understanding, complex instruction following and planning, compositional function-calling, native tool use and improved latency, all work in concert to enable a new class of agentic experiences.
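As a rough illustration of the user-defined function calling mentioned above, the sketch below assumes the google-genai Python SDK's automatic function calling, with get_order_status as a hypothetical helper the model can decide to invoke.

```python
# Minimal sketch of user-defined function calling with Gemini 2.0 Flash,
# assuming the google-genai Python SDK; get_order_status is a hypothetical stub.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def get_order_status(order_id: str) -> str:
    """Look up an order's shipping status (stubbed for illustration)."""
    return f"Order {order_id} is out for delivery."

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Where is my order 8421?",
    config=types.GenerateContentConfig(
        # Passing a plain Python function lets the SDK declare it to the model
        # and execute it automatically when the model chooses to call it.
        tools=[get_order_status],
    ),
)
print(response.text)
```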

The practical application of AI agents is a research area full of exciting possibilities. Google is exploring it through Project Astra, its research prototype exploring future capabilities of a universal AI assistant; the new Project Mariner, which explores the future of human-agent interaction, starting with your browser; and Jules, an AI-powered code agent that can help developers.

Project Astra: agents using multimodal understanding in the real world

Project Astra's latest version can converse in multiple and mixed languages, with a better understanding of accents and uncommon words. It can use Google Search, Lens and Maps, and it can remember things while keeping you in control, with up to 10 minutes of in-session memory. It also has new streaming capabilities and native audio understanding, so the agent can understand language at about the latency of human conversation.

Project Mariner: agents that can help you accomplish complex tasks 

Project Mariner is an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with your browser. As a research prototype, it's able to understand and reason across information on your browser screen, including pixels and web elements like text, code, images and forms, and then uses that information via an experimental Chrome extension to complete tasks for you.

Project Mariner shows that it's becoming technically feasible for agents to navigate within a browser, even though today it's not always accurate and can be slow to complete tasks, something Google expects to improve rapidly over time.

Project Mariner can only type, scroll or click in the active tab on your browser and it asks users for final confirmation before taking certain sensitive actions, like purchasing something.     

Trusted testers are starting to test Project Mariner using an experimental Chrome extension now.

Jules: agents for developers

Next, Google is exploring how AI agents can assist developers with Jules — an experimental AI-powered code agent that integrates directly into a GitHub workflow. It can tackle an issue, develop a plan and execute it, all under a developer’s direction and supervision.

Agents in games and other domains

Google DeepMind has a long history of using games to help AI models become better at following rules, planning and logic. Just last week, Google introduced Genie 2, its AI model that can create an endless variety of playable 3D worlds, all from a single image. It has also built agents using Gemini 2.0 that can help users navigate the virtual worlds of video games. These agents can reason about the game based solely on the action on the screen and offer suggestions for what to do next in real-time conversation.



