NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule

5/24/2026

·~1 min·5/24/2026·en·1

Quick Answer

NVIDIA's Gated DeltaNet-2 introduces a novel linear attention mechanism that separates erase and write functions using channel-wise gates, achieving superior performance with 1.3B parameters on 100B FineWeb-Edu tokens, surpassing models like Mamba-2 and KDA in language modeling and commonsense reasoning.

Quick Take

Key Points

Gated DeltaNet-2 uses separate erase and write gates for improved memory editing.
Achieved state-of-the-art results on RULER S-NIAH and multi-key needle retrieval.
Outperformed previous models like Mamba-2, KDA, and Mamba-3 in various benchmarks.
Trained on a massive dataset of 100 billion tokens for enhanced performance.
Utilizes 1.3 billion parameters for efficient linear attention processing.

Article Excerpt

From source RSS / original summary

Linear attention squeezes the unbounded KV cache into a fixed-size recurrent state, but editing that memory without scrambling existing associations is hard. Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to control both erasing old content and writing new content. NVIDIA's Gated DeltaNet-2 decouples these into a channel-wise erase gate b_t on the key axis and a channel-wise write gate w_t on the value axis. At 1.

3B parameters trained on 100B FineWeb-Edu tokens, it outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 across language modeling, commonsense reasoning, and long-context retrieval — with the largest gains on RULER S-NIAH and multi-key needle retrieval. The post NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule appeared first on MarkTechPost.

Read on marktechpost.com

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from MarkTechPost

See more →

MarkTechPost·Asif Razzaq

3w ago

FeaturedOriginal

Meet Flash-KMeans: An IO-Aware, Exact K-Means That Runs Over 200× Faster Than FAISS on GPUs

AI Summary

Flash-KMeans is an open-source, IO-aware k-means implementation that operates over 200× faster than FAISS on NVIDIA H200 GPUs. It achieves 17.9× end-to-end and 33× speedup over cuML by optimizing distance calculations and updating mechanisms without approximating results. This advancement significantly enhances performance for data scientists and machine learning practitioners.

#AI Coding #GPU #Open Source

NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule

Quick Answer

Quick Take

Key Points

Article Excerpt

Want this in your inbox every morning?

More from MarkTechPost

Meet Flash-KMeans: An IO-Aware, Exact K-Means That Runs Over 200× Faster Than FAISS on GPUs

Z.ai Launches GLM-5.2 With a Usable 1M-Token Context, Two Thinking-Effort Levels, and No Benchmarks at Launch

xAI Ships Grok Build Plugin Marketplace With MongoDB, Vercel, Sentry, Chrome DevTools, Cloudflare, and Superpowers Plugins at Launch

Related in this space

Deploy a Production-Ready NVIDIA AI-Q Blueprint on Oracle Cloud Infrastructure

Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure

Deploy Self-Evolving Agents for Faster, More Secure Research with a Hermes Agent and NVIDIA NemoClaw