NVIDIA AI Releases Nemotron 3 Ultra: An Open 550B Mixture-of-Experts Hybrid Mamba-Transformer for Long-Running Agents
Quick Take
NVIDIA has launched Nemotron 3 Ultra, a 550B-parameter (55B active) Mixture-of-Experts hybrid Mamba-Transformer with a 1M-token context window. NVIDIA reports up to 6x higher inference throughput than comparable open LLMs at on-par accuracy. The model ships with open weights, training data, and recipes under the OpenMDW-1.1 license and targets long-running agents.
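Part of why a hybrid Mamba-Transformer suits a 1M-token context is memory scaling: attention layers keep a key/value cache that grows with every token, while Mamba (state-space) layers carry a fixed-size recurrent state. The sketch below compares the two under a made-up configuration. Every layer count and size in it (48 layers, 6 attention layers in the hybrid, 8 KV heads, head size 128, the per-layer state size) is a hypothetical illustration, not Nemotron 3 Ultra's published architecture.

```python
# Per-sequence decode-time memory: pure-attention stack vs. a hybrid stack where
# most layers are Mamba (state-space) layers.
# ALL layer counts and sizes are HYPOTHETICAL, chosen only to show the scaling;
# they are not Nemotron 3 Ultra's actual configuration.

BYTES = 2  # assume 16-bit values


def kv_cache_bytes(seq_len: int, attn_layers: int, kv_heads: int, head_dim: int) -> int:
    """Attention layers cache one key and one value vector per KV head for every token seen."""
    return 2 * attn_layers * kv_heads * head_dim * seq_len * BYTES


def ssm_state_bytes(mamba_layers: int, state_values_per_layer: int) -> int:
    """Mamba layers carry a fixed-size recurrent state, independent of sequence length."""
    return mamba_layers * state_values_per_layer * BYTES


def decode_memory_bytes(seq_len: int, attn_layers: int, mamba_layers: int,
                        kv_heads: int = 8, head_dim: int = 128,
                        state_values_per_layer: int = 1_048_576) -> int:
    """Total cache/state memory for one sequence (weights excluded)."""
    return (kv_cache_bytes(seq_len, attn_layers, kv_heads, head_dim)
            + ssm_state_bytes(mamba_layers, state_values_per_layer))


if __name__ == "__main__":
    for seq_len in (8_192, 131_072, 1_000_000):
        dense = decode_memory_bytes(seq_len, attn_layers=48, mamba_layers=0)
        hybrid = decode_memory_bytes(seq_len, attn_layers=6, mamba_layers=42)
        print(f"{seq_len:>9,} tokens: all-attention {dense / 1e9:7.2f} GB, "
              f"hybrid {hybrid / 1e9:7.2f} GB")
```

Only the attention share grows with context length, so replacing most attention layers with Mamba layers shrinks the part of memory that scales with the sequence; the exact savings depend entirely on the real layer mix.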
Key Points
- Nemotron 3 Ultra has 550B total parameters, of which 55B are active per token.
- Delivers up to ~6x higher inference throughput than comparable open LLMs at on-par accuracy.
- Supports a context length of 1 million tokens.
- Ships with open weights, training data, and recipes under the OpenMDW-1.1 license.
- Designed specifically for long-running AI agents.
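The gap between 550B total and 55B active parameters has practical consequences that a little arithmetic makes concrete. The sketch below assumes that weight memory is simply parameters times bytes per parameter (ignoring activations, KV cache, and runtime overhead) and uses the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass; neither figure is an NVIDIA number.

```python
# Back-of-envelope arithmetic for the headline figures (550B total, 55B active).
# Assumptions: weight memory = parameters x bytes per parameter (weights only);
# forward-pass compute ~ 2 FLOPs per active parameter per token (rule of thumb).

TOTAL_PARAMS = 550e9   # total parameters, from the release summary
ACTIVE_PARAMS = 55e9   # parameters used per token, from the release summary

# Bytes per parameter for a few common numeric formats.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of parameters that participate in one token's forward pass."""
    return active / total


def weight_memory_gb(params: float, fmt: str) -> float:
    """Approximate storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


def forward_flops_per_token(active: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs (a multiply and an add) per active parameter."""
    return 2 * active


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt:>5} weights: ~{weight_memory_gb(TOTAL_PARAMS, fmt):,.0f} GB")
    print(f"forward pass: ~{forward_flops_per_token(ACTIVE_PARAMS) / 1e9:,.0f} GFLOPs/token")
```

Per-token compute tracks the 55B active parameters, but all 550B must still be stored and served, which is why an MoE model can be cheaper per token than a dense model of the same total size while needing just as much weight memory.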
Article Excerpt
From the source RSS summary: NVIDIA has released Nemotron 3 Ultra, a 550B total (55B active) open Mixture-of-Experts hybrid Mamba-Transformer for long-running agents. It pairs a 1M-token context with up to ~6x higher inference throughput than comparable open LLMs at on-par accuracy, and ships with open weights, training data, and recipes under OpenMDW-1.1. The post NVIDIA AI Releases Nemotron 3 Ultra: An Open 550B Mixture-of-Experts Hybrid Mamba-Transformer for Long-Running Agents appeared first on MarkTechPost.
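"Active" parameters come from how an MoE layer routes tokens: a small router scores every expert, and each token only passes through its top-scoring few. The toy layer below shows top-k routing with softmax gates renormalized over the chosen experts, a common MoE design but not necessarily the one Nemotron 3 Ultra uses. The settings NUM_EXPERTS=10 and TOP_K=1 are hypothetical, picked so the active share (1/10) mirrors 55B/550B; each "expert" is reduced to one small matrix for readability.

```python
# Toy Mixture-of-Experts layer with top-k routing, in pure Python.
# HYPOTHETICAL settings: 10 experts, top-1 routing, hidden size 4. These are not
# Nemotron 3 Ultra's real expert count or routing scheme.
import math
import random

NUM_EXPERTS = 10
TOP_K = 1
DIM = 4

random.seed(0)
# Each expert is a single DIM x DIM weight matrix here (real experts are MLPs).
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
# Router: one score vector per expert; score_i = dot(router[i], token).
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs):
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token):
    """Route one token to its TOP_K highest-scoring experts; return (output, experts used)."""
    scores = matvec(router, token)
    chosen = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Gate weights are renormalized over the chosen experts only.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * DIM
    for gate, idx in zip(gates, chosen):
        y = matvec(experts[idx], token)  # only the chosen experts' weights are used
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    token = [1.0, -0.5, 0.25, 2.0]
    out, used = moe_forward(token)
    params_per_expert = DIM * DIM
    print("experts used:", used)
    # Counts expert weights only (the small router is ignored).
    print("active fraction:", len(used) * params_per_expert / (NUM_EXPERTS * params_per_expert))
```

Raising TOP_K or the expert count changes the active share; the point is that every expert must exist in memory even though each token touches only a fraction of them.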