OpenAI and Broadcom unveil "Jalapeño," a custom chip built for LLM inference

The Decoder · Maximilian Schreiner · 6/24/2026 · ~2 min read

Quick Take

OpenAI, in collaboration with Broadcom, has introduced the 'Jalapeño' chip, specifically designed for large language model (LLM) inference. This custom hardware aims to enhance performance and scalability, with plans for deployment by late 2026.

Key Points

The 'Jalapeño' chip is optimized for large language model inference.
Developed in partnership with Broadcom, it aims for high scalability.
Deployment is expected by late 2026, enhancing OpenAI's tech stack.

OpenAI is adding custom hardware to its tech stack. The "Jalapeño" chip, developed with Broadcom, is tailored for large language model inference and is set to run at scale by late 2026.

According to a joint announcement, OpenAI and Broadcom have unveiled "Jalapeño," OpenAI's first so-called "Intelligence Processor." It's a custom accelerator built specifically for large language model inference, and the first chip in a multi-generation platform the two companies are building together.

Broadcom CEO Hock Tan and President Charlie Kawwas handed the first wafer to OpenAI CEO Sam Altman and President Greg Brockman. For OpenAI, this marks its first step into custom hardware after years of focusing on models and products.

OpenAI says Jalapeño isn't a modified general-purpose chip. It was designed from scratch for modern LLM inference. OpenAI handles the chip design; Broadcom contributes silicon manufacturing and networking technology, including its Tomahawk networking chips; and Celestica takes care of boards, racks, and system integration.

Performance claims lack independent verification

Early tests showed performance per watt that's "substantially better" than current state-of-the-art hardware, according to OpenAI. These are self-reported numbers that haven't been finalized. Take them with a grain of salt. A technical report is supposed to follow. Right now, it's unclear which chips Jalapeño was tested against, on what tasks, and under what conditions.

The architecture reportedly cuts data movement and pushes hardware utilization closer to the theoretical maximum. Engineering samples are already running ML workloads in the lab, including the GPT-5.3-Codex-Spark model. That model currently runs on Cerebras hardware, which also specializes in inference.

OpenAI says the process from design to tape-out took just nine months, which the company calls the fastest ASIC development cycle for high-performance semiconductors it's aware of. OpenAI's own models helped speed up parts of the design process. Rumors about OpenAI's chip plans, though, have been circulating since 2023.

The announcement reflects OpenAI's argument that controlling the full stack from chip to product lets it run models faster, more reliably, and at lower cost. Broadcom CEO Tan says the first deployment is planned for late 2026 at gigawatt scale, together with Microsoft and other partners. Broadcom has reportedly demanded that Microsoft guarantee it will buy 40 percent of the chips to secure the first phase.

— Originally published at the-decoder.com
