Any hope Intel may have had of challenging rivals Nvidia and AMD for a slice of the AI accelerator market dissolved on Thursday as yet another GPU architecture was scrapped.
Falcon Shores, which was due out this year and was expected to combine the best of Intel’s Xe Graphics capabilities with Gaudi’s AI grunt, will never leave the x86 giant’s labs, interim co-CEO Michelle Johnston Holthaus revealed on the corporation’s Q4 earnings call with analysts Thursday. “We plan to leverage Falcon Shores as an internal test chip only, without bringing it to market.”
The decision means Intel is likely to be another year if not two out from launching its next GPU architecture, codenamed Jaguar Shores, and that assumes it doesn’t suffer the same fate as Ponte Vecchio, Rialto Bridge, and now Falcon Shores.
That’s right, this isn’t the first or even second time that development of a GPU capable of taking on Nvidia, let alone AMD, has been cut short by Intel. Nearly two years ago, Intel axed Rialto Bridge, the successor to its datacenter-class GPU Max chips slated to power America’s Aurora supercomputer. At least those earlier Max chips saw limited deployments by the likes of Argonne National Laboratory in the US, the UK’s Dawn super, and Germany’s SuperMUC-NG Phase 2 system.
We say limited because Intel ended up pulling the plug on GPU Max in mid-2024, presumably to focus on its Gaudi family of accelerators — more on those later — and prepare for the Falcon Shores debut.
Given this context, the demise of Falcon Shores, in some sense, felt inevitable. Intel’s roadmap had it set for a 2024 release, but that was pushed back a year around the time Rialto Bridge was binned. Back then, the Falcon Shores project included an XPU variant that combined CPU and GPU dies on a single package. In mid-2023, those plans were pared back, leaving a more traditional GPU approach. Now Falcon Shores is basically dead entirely.
So what about Gaudi?
Despite going one for three on high-end GPUs so far, Intel isn’t entirely out of the AI game just yet. The x86 player still has its Gaudi3 accelerators.
On paper, the accelerators didn’t look half bad when they were unveiled in April 2024. The dedicated AI accelerator boasted 1,835 teraFLOPS of dense floating-point performance at either 8-bit or 16-bit precision. For compute-bound workloads commonly run at BF16, Gaudi3 boasted nearly twice the performance of Nvidia’s H100 or H200.
For memory-bound workloads, such as inference, Gaudi3 packs 128GB of HBM2e memory good for 3.7 TBps of bandwidth, enabling it to contend with larger models than Nvidia’s H100 while theoretically providing higher throughput.
Unfortunately for Intel, Gaudi3 is no longer competing with H100s. While it made its debut in early 2024, the part only began trickling out to system manufacturers late last year with general availability slated for this quarter.
That means potential buyers are now cross-shopping the part against Nvidia’s Blackwell and AMD’s MI325X systems. For training, Blackwell offers greater floating-point precision; more, faster memory; and a substantially larger scale-up domain. Meanwhile, AMD’s MI325X boasts twice the memory capacity and 62 percent more memory bandwidth, giving it the edge in inferencing, where capacity and bandwidth are king.
This might explain why despite then-CEO Pat Gelsinger’s insistence that Gaudi3 would drive more than $500 million in accelerator revenue in the second half of 2024, Intel fell short of that target. And that’s despite an extremely competitive price point compared to Nvidia.
There could be all kinds of reasons for this, ranging from system performance to the maturity of competing software ecosystems. However, Intel’s bigger problem is that Gaudi3 is a dead-end platform.
Its successor was supposed to be a variant of Falcon Shores that, from what we understand, would have meshed Gaudi’s enormous systolic arrays with Intel’s Xe graphics architecture.
Perhaps we’ll see Gaudi3 win some ground in 2025, but given the complete lack of an upgrade path and uncertainty around Jaguar Shores, it seems unlikely many are going to take the risk when alternative platforms from chip designers with proven roadmaps and track records are available.
Intel’s shrinking place in the AI datacenter
Regardless of which GPUs or AI accelerators datacenter operators end up buying, they still need a host CPU, so Intel won’t be cut out of the AI datacenter entirely.
“We have a leading position as the host CPU for AI servers, and we continue to see a significant opportunity for CPU based inference on-prem and at the edge as AI infused applications proliferate,” Holthaus told Wall Street this week.
Intel’s Granite Rapids Xeons, launched last year, have proven to be its most compelling in years, boasting up to 128 cores and 256 threads, support for speedy 8,800 MT/s MRDIMMs, and up to 96 lanes of PCIe 5.0 per socket.
However, this segment is getting a lot more competitive. It’s hard to ignore the gains AMD continues to make in the datacenter with its Epyc processor family. The Ryzen slinger now commands about 24.2 percent of the server CPU market, according to Mercury Research.
Meanwhile, Nvidia, which is a long-time Intel partner having used its CPUs in several generations of DGX reference designs, is increasingly relying on its Arm-based Grace processors for its top-specced accelerators. Nv still supports the HGX form-factor with eight GPUs per system that we’ve grown accustomed to, and so Intel still can win share in this arena — for now.
But with AMD making a point of how well optimized its Turin generation of CPUs is for GPU servers, we anticipate vendors will gravitate to some degree toward all-AMD configurations with Epyc and Instinct for their builds, further inhibiting Intel’s ability to compete in this space.
Opportunities at the edge
While Intel’s opportunities to capitalize on the AI boom may be shrinking in the datacenter, Chipzilla still has a shot at the network edge and on the PC.
Like most personal computer hardware makers, Intel has been banging the AI PC drum since even before Microsoft spilled the beans on its 40 TOPS Copilot+ performance requirements.
And while this led to a somewhat awkward moment in which Qualcomm was, for a few months, the sole supplier of Copilot+ compatible processors, both AMD and Intel were able to catch up with the launch of Strix Point and Lunar Lake in July and September, respectively.
As we explored at Computex, Lunar Lake boasts a 48 TOPS NPU alongside a GPU and CPU, and Intel claims the system-on-chips can deliver 120 total system TOPS between the three.
However, more importantly for Intel, it still controls the lion’s share of the CPU market for PCs.
Just how important these AI features will ultimately be for PC customers is still up for debate, and Intel faces stiff competition from AMD, Qualcomm, and Nvidia at the higher end of the PC spectrum, but it’s squarely in the race.
Along with the emerging AI PC market, Intel’s CPU strategy could help it secure wins at the network edge where it can flex the Advanced Matrix Extensions (AMX) compute blocks that have been baked into its CPUs going back to Sapphire Rapids to run machine-learning and generative-AI workloads without the need for a GPU.
Intel has previously demonstrated 4-bit quantized 70-billion-parameter LLMs running at a reasonable 12 tokens a second on its Granite Rapids Xeons, thanks to its MRDIMM memory support.
Extrapolating this performance out, we’d expect to see generation rates of around 100 tokens a second for an 8-billion-parameter model, at least for a batch size of one. As we’ve previously explored in detail, the economics of CPU-only AI still aren’t great with batch size being one of the limiting factors.
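That extrapolation is just memory-bandwidth arithmetic: at batch size one, decode throughput is roughly bandwidth divided by model weight footprint, so tokens per second scale inversely with parameter count at a fixed quantization. A minimal sketch of that back-of-the-envelope math (our own illustration with assumed figures, not Intel's published numbers):

```python
# Rough scaling sketch: assumes single-batch LLM decode is memory-bandwidth
# bound, so tokens/sec scale inversely with model weight size.
# All figures are illustrative assumptions, not vendor benchmarks.

def weight_bytes_gb(params_billion: float, bits: int) -> float:
    """Approximate model weight footprint in GB, ignoring KV cache and overheads."""
    return params_billion * 1e9 * bits / 8 / 1e9

def extrapolate_tps(known_params_b: float, known_tps: float,
                    target_params_b: float, bits: int = 4) -> float:
    """Scale a measured tokens/sec figure from one model size to another,
    assuming throughput is inversely proportional to weight footprint."""
    ratio = weight_bytes_gb(known_params_b, bits) / weight_bytes_gb(target_params_b, bits)
    return known_tps * ratio

# A 70B model at 4-bit (~35GB of weights) measured at ~12 tokens/sec
# implies roughly 100 tokens/sec for an 8B model at the same precision.
print(f"~{extrapolate_tps(70, 12, 8):.0f} tokens/sec")  # ~105
```

Note this only holds for batch size one; at larger batches the workload shifts toward being compute-bound, which is where CPU-only inference falls behind.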
But, for a network edge appliance which might only need to run models periodically, this not only wouldn’t be a problem, but it’d potentially help to eliminate complexity and points of failure compared to GPU-based solutions.
Don’t count a comeback out just yet
If the rebirth of AMD in the post-Bulldozer era teaches us anything, it’s not to count an Intel comeback out.
When Ryzen and Epyc made their debut in the late 2010s, the parts weren’t the most performant, but they were differentiated, offering customers something they couldn’t get from Intel: loads of cheap, good-enough cores.
In the GPU space, AMD employed a similar strategy, first focusing on delivering better performance in high-performance computing (HPC) applications than Nvidia. This helped AMD secure several high profile wins for its Instinct accelerators with America’s Frontier and more recently El Capitan supercomputers.
With its MI300-series accelerators and the pivot to AI, AMD differentiated again, targeting higher memory capacities than Nvidia could offer. This helped it secure wins from major hyperscalers and cloud providers, such as Microsoft and Meta, who were trying to reduce the cost of memory-bound workloads including inference.
We bring this up because the decision to scrap Falcon Shores presents Intel an opportunity to start afresh and build something unencumbered by architectural decisions no longer representative of what the market actually wants.
The decision to refocus Jaguar Shores toward a rack-scale design is a promising sign of what’s to come. If Intel can find a way to differentiate its next GPU and provide something customers want but simply can’t get from its competitors, it at least stands a chance of reestablishing a foothold in the datacenter. ®