
MiniMax unveils its own open source LLM with industry-leading 4M token context

January 14, 2025 3:46 PM

MiniMax is perhaps best known in the U.S. today as the Shanghai-based company behind Hailuo, a realistic, high-resolution generative AI video model that competes with Runway, OpenAI’s Sora and Luma AI’s Dream Machine.

But the company has far more tricks up its sleeve: Today, for instance, it announced the release and open-sourcing of the MiniMax-01 series, a new family of models built to handle ultra-long contexts and enhance AI agent development.

The series includes MiniMax-Text-01, a foundation large language model (LLM), and MiniMax-VL-01, a visual multi-modal model.

A massive context window

MiniMax-Text-01 is of particular note for supporting a context window of up to 4 million tokens, equivalent to a small library’s worth of books. The context window is how much information the LLM can handle in a single input/output exchange, with words and concepts represented as numerical “tokens,” the model’s internal mathematical abstraction of the data it was trained on.
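To get a feel for that scale, here is a rough sketch using OpenAI’s open source tiktoken tokenizer as a stand-in; MiniMax uses its own tokenizer, so actual counts will differ somewhat:

```python
# Rough illustration of what "4 million tokens" means, using tiktoken
# (pip install tiktoken) as a stand-in tokenizer. MiniMax's own tokenizer
# will produce somewhat different counts.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
sample = "MiniMax-Text-01 supports a context window of up to 4 million tokens."
tokens = enc.encode(sample)
print(len(tokens))  # a short sentence is on the order of 15-20 tokens
```

At roughly three-quarters of an English word per token, 4 million tokens works out to around 3 million words, or dozens of full-length novels in a single prompt.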

And while Google previously led the pack with the 2-million-token context window of its Gemini 1.5 Pro model, MiniMax has remarkably doubled that.

As MiniMax posted on its official X account today: “MiniMax-01 efficiently processes up to 4M tokens — 20 to 32 times the capacity of other leading models. We believe MiniMax-01 is poised to support the anticipated surge in agent-related applications in the coming year, as agents increasingly require extended context handling capabilities and sustained memory.”

The models are available now for download on Hugging Face and GitHub under a custom MiniMax license, can be tried directly on Hailuo AI Chat (a ChatGPT/Gemini/Claude competitor), and are accessible through MiniMax’s application programming interface (API), where third-party developers can connect their own apps.

MiniMax is offering APIs for text and multi-modal processing at competitive rates:

$0.20 per 1 million input tokens

$1.10 per 1 million output tokens

For comparison, OpenAI’s GPT-4o costs $2.50 per 1 million input tokens through its API, a staggering 12.5 times MiniMax’s input price.
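A quick back-of-the-envelope sketch makes the gap concrete (illustrative only; real bills depend on the providers’ current published rates, and GPT-4o’s own context window is far smaller than 4M tokens):

```python
# Cost comparison for a single long-context request, using the published
# per-million-token rates cited above. Illustrative only.
MINIMAX_INPUT_PER_M = 0.20   # USD per 1M input tokens
MINIMAX_OUTPUT_PER_M = 1.10  # USD per 1M output tokens
GPT4O_INPUT_PER_M = 2.50     # USD per 1M input tokens, for comparison

def request_cost(input_tokens, output_tokens, in_rate, out_rate=0.0):
    """Cost in USD for one request at per-million-token rates."""
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# A request that fills the full 4M-token window and returns a 2k-token answer:
print(request_cost(4_000_000, 2_000, MINIMAX_INPUT_PER_M, MINIMAX_OUTPUT_PER_M))
# ~0.80 USD of MiniMax input+output, versus ~10 USD of input alone at
# GPT-4o's rate (if it could accept 4M tokens, which it cannot):
print(request_cost(4_000_000, 0, GPT4O_INPUT_PER_M))
```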

MiniMax has also integrated a mixture of experts (MoE) framework with 32 experts to optimize scalability. This design balances computational and memory efficiency while maintaining competitive performance on key benchmarks.
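MiniMax has not published its routing details in this announcement, so the following is a generic top-k MoE routing sketch in PyTorch that conveys the idea; the 32-expert count comes from the article, while top_k=2 and the layer sizes are assumptions for illustration:

```python
# A minimal sketch of top-k mixture-of-experts routing, not MiniMax's
# actual implementation. num_experts=32 matches the article; top_k=2
# and the dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=32, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # route each token to k experts
        weights = F.softmax(weights, dim=-1)            # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):   # only routed experts run
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```

The point of the design is that only the routed experts run for each token, which is what lets total parameter count grow far beyond the per-token compute cost.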

Striking new ground with Lightning Attention architecture

At the heart of MiniMax-01 is a Lightning Attention mechanism, an innovative alternative to the standard softmax attention used in transformer architectures.

This design significantly reduces computational complexity. The models comprise 456 billion parameters, with 45.9 billion activated per token at inference time.

Unlike earlier architectures, Lightning Attention employs a mix of linear and traditional SoftMax layers, achieving near-linear complexity for long inputs. SoftMax, for those new to the concept, is a function that transforms a vector of raw scores into probabilities summing to 1, so that the model can weigh which interpretation of the input is likeliest.
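To make the complexity claim concrete, here is a simplified sketch (not MiniMax’s implementation) contrasting standard softmax attention, which materializes an n-by-n score matrix, with the kernelized linear-attention trick that near-linear approaches build on:

```python
# Standard softmax attention is quadratic in sequence length n because it
# builds the full (n, n) score matrix; kernelized "linear attention"
# reassociates the matrix product to avoid it. Simplified illustration.
import numpy as np

def softmax_attention(Q, K, V):
    # Materializes the full (n, n) score matrix: O(n^2) time and memory.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    # Replaces exp(q.k) with phi(q).phi(k), so the product reassociates:
    # (phi(Q) phi(K)^T) V == phi(Q) (phi(K)^T V), a small (d, d) summary.
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                   # (d, d): independent of sequence length
    z = Qp @ Kp.sum(axis=0)         # per-query normalizer
    return (Qp @ kv) / z[:, None]

n, d = 1024, 32
Q, K, V = (np.random.randn(n, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

The linear variant never builds the n-by-n matrix, so cost grows linearly with sequence length; as described above, Lightning Attention interleaves such linear layers with traditional SoftMax layers.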

MiniMax has rebuilt its training and inference frameworks to support the Lightning Attention architecture. Key improvements include:

MoE all-to-all communication optimization: Reduces inter-GPU communication overhead.

Varlen ring attention: Minimizes computational waste for long-sequence processing.

Efficient kernel implementations: Tailored CUDA kernels improve Lightning Attention performance.

These advancements make the MiniMax-01 models practical for real-world applications while keeping them affordable.

Performance and Benchmarks

On mainstream text and multi-modal benchmarks, MiniMax-01 rivals top-tier models like GPT-4 and Claude-3.5, with especially strong results on long-context evaluations. Notably, MiniMax-Text-01 achieved 100% accuracy on the Needle-In-A-Haystack task with a 4-million-token context.
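For readers who want to reproduce this style of evaluation, here is a minimal sketch of the needle-in-a-haystack pattern; query_model is a hypothetical stand-in for whatever chat or completion API you use:

```python
# A minimal needle-in-a-haystack style check: bury one fact in a long
# span of filler text and ask the model to retrieve it. `query_model`
# is a hypothetical placeholder, not a real MiniMax API call.
import random

def build_haystack(needle: str, filler: str, total_chars: int, depth: float) -> str:
    """Bury `needle` at relative `depth` (0.0 = start, 1.0 = end) in filler text."""
    hay = (filler * (total_chars // len(filler) + 1))[:total_chars]
    pos = int(depth * len(hay))
    return hay[:pos] + "\n" + needle + "\n" + hay[pos:]

needle = "The magic number for project Falcon is 7421."
filler = "The quick brown fox jumps over the lazy dog. "
prompt = build_haystack(needle, filler, total_chars=200_000, depth=random.random())
question = "What is the magic number for project Falcon? Answer with the number only."

# answer = query_model(prompt + "\n\n" + question)  # hypothetical API call
# passed = "7421" in answer
```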

The models also demonstrate minimal performance degradation as input length increases.

MiniMax plans regular updates to expand the models’ capabilities, including code and multi-modal enhancements.

The company views open-sourcing as a step toward building foundational AI capabilities for the evolving AI agent landscape.

With 2025 predicted to be a transformative year for AI agents, the need for sustained memory and efficient inter-agent communication is increasing. MiniMax’s innovations are designed to meet these challenges.

Open to collaboration

MiniMax invites developers and researchers to explore the capabilities of MiniMax-01. Beyond open-sourcing, its team welcomes technical suggestions and collaboration inquiries at [email protected].

With its commitment to cost-effective and scalable AI, MiniMax positions itself as a key player in shaping the AI agent era. The MiniMax-01 series offers an exciting opportunity for developers to push the boundaries of what long-context AI can achieve.
