December 19, 2024 10:04 AM
Image credit: VentureBeat, made with ChatGPT
In its latest push to redefine the AI landscape, Google has announced Gemini 2.0 Flash Thinking, a multimodal reasoning model capable of tackling complex problems with both speed and transparency.
In a post on the social network X, Google CEO Sundar Pichai called it “our most thoughtful model yet :)”
And in its developer documentation, Google explains that “Thinking Mode is capable of stronger reasoning capabilities in its responses than the base Gemini 2.0 Flash model,” which was previously Google’s latest and greatest, released only eight days ago.
The new model supports just 32,000 tokens of input (roughly 50-60 pages’ worth of text) and can produce up to 8,000 tokens per response. In a side panel in Google AI Studio, the company says it is best for “multimodal understanding, reasoning” and “coding.”
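For developers, those limits translate directly into request configuration. Here is a minimal sketch, assuming the google-generativeai Python SDK and an experimental model ID of "gemini-2.0-flash-thinking-exp" (both assumptions on my part, not details confirmed in Google's announcement):

```python
# Minimal sketch, assuming the google-generativeai SDK; the model ID
# "gemini-2.0-flash-thinking-exp" is an assumption, not confirmed here.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-thinking-exp",
    generation_config=genai.GenerationConfig(
        max_output_tokens=8000,  # the model's stated per-response ceiling
    ),
)

# The prompt (plus any attached context) must fit within the
# 32,000-token input window.
response = model.generate_content("Summarize this report in five bullet points.")
print(response.text)
```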
Full details of the model’s training process, architecture, licensing, and costs have yet to be released. For now, Google AI Studio lists the cost per token as zero.
Accessible and more transparent reasoning
Unlike OpenAI’s competing reasoning models o1 and o1-mini, Gemini 2.0 Flash Thinking lets users view its step-by-step reasoning through a dropdown menu, offering clearer, more transparent insight into how the model arrives at its conclusions.
By allowing users to see how decisions are made, Gemini 2.0 Flash Thinking addresses longstanding concerns about AI functioning as a “black box,” and brings the model (its licensing terms are still unclear) closer to parity with the open-source reasoning models fielded by competitors.
My early, simple tests of the model showed that it correctly and quickly (within one to three seconds) answered some questions that have been notoriously tricky for other AI models, such as counting the number of Rs in the word “strawberry.” (See screenshot above.)
In another test, when comparing two decimal numbers (9.9 and 9.11), the model systematically broke the problem into smaller steps, from analyzing whole numbers to comparing decimal places.
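To make that chain of steps concrete, here is an illustrative plain-Python walk-through of the same comparison. This is my own sketch of the reasoning, not the model’s actual output:

```python
# Compare 9.9 and 9.11 the way the model described: whole parts first,
# then fractional parts aligned to the same number of decimal places.
a, b = "9.9", "9.11"

whole_a, frac_a = a.split(".")
whole_b, frac_b = b.split(".")

if int(whole_a) != int(whole_b):
    # Step 1: whole-number parts differ, so they decide the answer.
    larger = a if int(whole_a) > int(whole_b) else b
else:
    # Step 2: whole parts are equal (9 vs. 9), so pad the fractional
    # parts to equal width and compare them (90 vs. 11).
    width = max(len(frac_a), len(frac_b))
    larger = a if int(frac_a.ljust(width, "0")) > int(frac_b.ljust(width, "0")) else b

print(f"{larger} is the larger number")  # -> 9.9 is the larger number
```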
These results are backed up by independent third-party analysis from LM Arena, which named Gemini 2.0 Flash Thinking the number one performing model across all LLM categories.
Native support for image uploads and analysis
In a further improvement over the rival OpenAI o1 family, Gemini 2.0 Flash Thinking is designed to process images from the jump.
o1 launched as a text-only model but has since expanded to include image and file upload analysis. For now, both models return only text.
Gemini 2.0 Flash Thinking also does not currently support grounding with Google Search, or integration with other Google apps and external third-party tools, according to the developer documentation.
Gemini 2.0 Flash Thinking’s multimodal capability expands its potential use cases, enabling it to tackle scenarios that combine different types of data.
For example, in one test, the model solved a puzzle that required analyzing textual and visual elements, demonstrating its versatility in integrating and reasoning across formats.
Developers can leverage these features via Google AI Studio and Vertex AI, where the model is available for experimentation.
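As a rough sketch of what such a multimodal call could look like through the google-generativeai Python SDK (the model ID "gemini-2.0-flash-thinking-exp" and the file name are my assumptions, not details from Google):

```python
# Multimodal sketch: pass an image alongside a text prompt in one request.
# Model ID and file name are assumptions for illustration only.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")

# The model accepts image input but, per the article, returns text only.
puzzle = Image.open("puzzle.png")
response = model.generate_content(
    [puzzle, "Explain how the visual clues and the caption fit together."]
)
print(response.text)
```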
As the AI landscape grows increasingly competitive, Gemini 2.0 Flash Thinking could mark the beginning of a new era for problem-solving models. Its ability to handle diverse data types, offer visible reasoning, and perform at scale positions it as a serious contender in the reasoning AI market, rivaling OpenAI’s o1 family and beyond.