SaluNet: Enabling Total Plasticity in Normalization-Free Deep Networks
Quick Take
SaluNet introduces SALU, a learnable activation mechanism that replaces normalization layers in deep networks. A ResNet-18 variant reaches 97.35% on CIFAR-10 without normalization and retains 93.44% at batch size 1, where normalized architectures fail.
Key Points
- SaluNet replaces normalization layers with SALU, enhancing total plasticity in networks.
- The ResNet-18 variant, SaluNet-C-18, achieves 97.35% on CIFAR-10 and 83.25% on CIFAR-100 without normalization.
- For transformers, SaluNet-T improves on LayerNorm-GELU from 90.92% to 91.01% on CIFAR-10.
- SaluNet-C-50 reaches 78.67% Top-1 accuracy on ImageNet-1K at 224x224 resolution.
- The findings suggest normalization layers hinder the adaptability of deep networks.
Article Content
arXiv:2606.02927v1
Abstract: Normalization layers such as BatchNorm and LayerNorm have long been considered essential for stable training in deep networks. This work demonstrates that they can be fully replaced by a single learnable activation mechanism. We identify a plasticity suppression effect induced by standard normalization: learnable activation parameters rapidly lose adaptability when paired with normalization layers.
Motivated by this observation, we introduce SALU (Saturated Adaptive Linear Unit),
\[ \operatorname{SALU}(x;a,b) = \frac{a x}{\sqrt{1 + a b x^2}},\quad a>0,\; b>0 \]
a bounded, learnable activation that provides intrinsic signal stabilization without relying on batch statistics or external affine parameters. Building on SALU, we propose SaluNet, a paradigm grounded in total plasticity: SALU replaces normalization layers, while SWALU and GALU replace standard activations. With ResNet-18, SaluNet-C-18 achieves 97.35\% on CIFAR-10 and 83.25\% on CIFAR-100 without normalization, maintaining 93.44\% and 76.23\% at batch size 1 where normalized architectures fail. For transformers, SaluNet-T improves over LayerNorm-GELU from 90.92\% to 91.01\% on CIFAR-10 and from 66.54\% to 68.10\% on CIFAR-100. SaluNet-C-50 reaches 78.67\% Top-1 on ImageNet-1K at $224\times224$, and $79.23\%$ at $288\times288$.
These results suggest that normalization layers suppress total plasticity, a property that biological neurons inherently possess and that enables deep networks to learn effectively.
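To make the SALU definition in the abstract concrete, here is a minimal scalar sketch in plain Python. It evaluates the formula exactly as written and checks two properties that follow from it: the output is bounded by sqrt(a/b), and the slope at the origin is a. The function names (`salu`, `salu_bound`, `salu_slope`) and the sample values of a and b are illustrative assumptions. The abstract does not say how the paper parameterizes or constrains a and b during training, so this sketch simply rejects non-positive values.

```python
import math


def salu(x: float, a: float = 1.0, b: float = 1.0) -> float:
    """SALU(x; a, b) = a*x / sqrt(1 + a*b*x^2), defined for a > 0 and b > 0."""
    if a <= 0 or b <= 0:
        raise ValueError("SALU requires a > 0 and b > 0")
    return a * x / math.sqrt(1.0 + a * b * x * x)


def salu_bound(a: float, b: float) -> float:
    """Limit of |SALU| as |x| grows: a*|x| / (sqrt(a*b)*|x|) = sqrt(a / b)."""
    return math.sqrt(a / b)


def salu_slope(x: float, a: float, b: float) -> float:
    """Analytic derivative of SALU: a / (1 + a*b*x^2)^(3/2), always positive."""
    return a / (1.0 + a * b * x * x) ** 1.5


if __name__ == "__main__":
    a, b = 2.0, 0.5  # illustrative values, not taken from the paper
    for x in (-100.0, -1.0, 0.0, 1.0, 100.0):
        print(f"x={x:8.1f}  SALU={salu(x, a, b):+.6f}  slope={salu_slope(x, a, b):.6f}")
    print("saturation bound sqrt(a/b):", salu_bound(a, b))
```

Because each output depends only on its own input and on (a, b), the function involves no batch statistics, which is consistent with the batch-size-1 results quoted in the abstract. In a real network a and b would be trainable tensors in a framework of choice. That part is deliberately left out here.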