NVIDIA AI Releases Nemotron 3 Ultra | AI Deep Signal

NVIDIA AI Releases Nemotron 3 Ultra: An Open 550B Mixture-of-Experts Hybrid Mamba-Transformer for Long-Running Agents

6/4/2026


Quick Answer

NVIDIA has launched Nemotron 3 Ultra, a 550B-parameter Mixture-of-Experts hybrid Mamba-Transformer that offers a 1M-token context and up to 6x higher inference throughput than comparable open LLMs at comparable accuracy.

Quick Take

It comes with open weights and training data under OpenMDW-1.1, targeting long-running agents.

Key Points

Nemotron 3 Ultra features 550B total parameters with 55B active.
Delivers up to 6x higher inference throughput than comparable models.
Supports a context length of 1 million tokens.
Includes open weights and training data for broader accessibility.
Designed specifically for long-running AI agents.

Source Excerpt

NVIDIA has released Nemotron 3 Ultra, a 550B total (55B active) open Mixture-of-Experts hybrid Mamba-Transformer for long-running agents. It pairs a 1M-token context with up to ~6x higher inference throughput than comparable open models at on-par accuracy, and ships with open weights, training data, and recipes under OpenMDW-1.1.

Read the full article on marktechpost.com

