NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule
Quick Take
NVIDIA's Gated DeltaNet-2 introduces decoupled erase and write gates for improved linear attention performance.
Key Points
- Decouples erase and write operations for better memory editing.
- Achieves state-of-the-art results in language modeling and retrieval.
- Trained on 100B FineWeb-Edu tokens with 1.3B parameters.
Article Excerpt
From source RSS / original summaryLinear attention squeezes the unbounded KV cache into a fixed-size recurrent state, but editing that memory without scrambling existing associations is hard. Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to control both erasing old content and writing new content. NVIDIA's Gated DeltaNet-2 decouples these into a channel-wise erase gate b_t on the key axis and a channel-wise write gate w_t on the value axis. At 1.
3B parameters trained on 100B FineWeb-Edu tokens, it outperforms Mamba-2, Gated DeltaNet, KDA, and Mamba-3 across language modeling, commonsense reasoning, and long-context retrieval — with the largest gains on RULER S-NIAH and multi-key needle retrieval. The post NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule appeared first on MarkTechPost.
Reader Mode unavailable (could not extract clean content).
Want this in your inbox every morning?
Daily brief at your local 8am — bilingual EN/中文, free.
More from MarkTechPost
See more →
Build a Complete Langfuse Observability and Evaluation Pipeline for Tracing, Prompt Management, Scoring, and Experiments
This tutorial guides building a Langfuse pipeline for observability and evaluation without paid model access.

