
Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch
Quick Take
Parallax replaces the per-query solver in Local Linear Attention (LLA) with a learned projector, doubling arithmetic intensity and improving perplexity at the 0.6B and 1.7B parameter scales. The design keeps softmax attention and adds a learned covariance correction branch.
Key Points
- Parallax replaces LLA's per-query solver with a learned projector.
- Doubles arithmetic intensity, according to the source summary.
- Improves perplexity at the 0.6B and 1.7B parameter scales.
- Keeps softmax attention and adds a learned covariance correction branch.
- Targets both efficiency (arithmetic intensity) and model quality (perplexity).
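For readers unfamiliar with the two metrics named above, here is a minimal sketch of their standard definitions. It is not Parallax code: the function names and all numbers are hypothetical, chosen only to show what "doubling arithmetic intensity" and "improving perplexity" mean arithmetically.

```python
import math


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Floating-point operations performed per byte of memory traffic."""
    return flops / bytes_moved


def perplexity(token_nlls: list[float]) -> float:
    """Exponential of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Hypothetical kernel costs, NOT measurements from the article.
baseline = arithmetic_intensity(flops=4.0e9, bytes_moved=2.0e8)
improved = arithmetic_intensity(flops=4.0e9, bytes_moved=1.0e8)
ratio = improved / baseline  # same work with half the memory traffic

# Hypothetical per-token losses: lower mean loss gives lower perplexity.
ppl_before = perplexity([2.2, 2.0, 2.1])
ppl_after = perplexity([2.0, 1.9, 2.0])
```

Higher arithmetic intensity means a kernel does more computation per byte it reads or writes, which generally helps on memory-bound hardware. Lower perplexity means the model assigns higher probability to held-out text, so "improving" perplexity means reducing it.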
Article Excerpt
From source RSS / original summary: Parallax replaces LLA's per-query solver with a learned projector, doubling arithmetic intensity and improving perplexity at 0.6B and 1.7B. The post Parallax: A Parameterized Local Linear Attention That Keeps Softmax and Adds a Learned Covariance Correction Branch appeared first on MarkTechPost.

