SaluNet: Enabling Total Plasticity in Normalization-Free Deep Networks

arXiv cs.CV·Mourad Zaied (University of Gabes, Tunisia)

6/3/2026


Quick Answer

SaluNet introduces SALU, a learnable activation mechanism that replaces normalization layers in deep networks; with ResNet-18 it reaches 97.35% on CIFAR-10.

Quick Take

SaluNet replaces normalization layers with SALU, a bounded, learnable activation, and reaches 97.35% on CIFAR-10 with ResNet-18. At batch size 1, where normalized architectures fail, it still holds 93.44% on CIFAR-10 and 76.23% on CIFAR-100.

Key Points

SaluNet replaces normalization layers with SALU, enhancing total plasticity in networks.
With ResNet-18, SaluNet-C-18 reaches 97.35% on CIFAR-10 and 83.25% on CIFAR-100 without normalization.
SaluNet-T improves on LayerNorm-GELU on CIFAR-10, from 90.92% to 91.01%.
SaluNet-C-50 reaches 78.67% Top-1 accuracy on ImageNet-1K at 224x224 resolution.
The findings suggest normalization layers hinder the adaptability of deep networks.

Paper Resources

Read Paper: arxiv.org · View PDF: arxiv.org

Article Content

From source RSS / original summary

arXiv:2606.02927v1. Abstract: Normalization layers such as BatchNorm and LayerNorm have long been considered essential for stable training in deep networks. This work demonstrates that they can be fully replaced by a single learnable activation mechanism. We identify a plasticity suppression effect induced by standard normalization: learnable activation parameters rapidly lose adaptability when paired with normalization layers.

Motivated by this observation, we introduce SALU (Saturated Adaptive Linear Unit),

\[ \operatorname{SALU}(x;a,b) = \frac{a x}{\sqrt{1 + a b x^2}},\quad a>0,\; b>0 \]

a bounded, learnable activation that provides intrinsic signal stabilization without relying on batch statistics or external affine parameters. Building on SALU, we propose SaluNet, a paradigm grounded in total plasticity: SALU replaces normalization layers, while SWALU and GALU replace standard activations.

With ResNet-18, SaluNet-C-18 achieves 97.35% on CIFAR-10 and 83.25% on CIFAR-100 without normalization, maintaining 93.44% and 76.23% at batch size 1, where normalized architectures fail. For transformers, SaluNet-T improves over LayerNorm-GELU from 90.92% to 91.01% on CIFAR-10 and from 66.54% to 68.10% on CIFAR-100. SaluNet-C-50 reaches 78.67% Top-1 on ImageNet-1K at $224\times224$, and 79.23% at $288\times288$.
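The SALU formula in the abstract can be checked numerically. The sketch below is a scalar, pure-Python illustration of that formula only; it is not the authors' implementation. The function names `salu` and `salu_bound` are hypothetical, and a and b are passed as plain numbers rather than learned. It follows from the formula that the slope at x = 0 is a and that the output saturates at ±sqrt(a/b) for large |x|, which is the sense in which the activation is bounded.

```python
import math


def salu(x: float, a: float, b: float) -> float:
    """Evaluate SALU(x; a, b) = a*x / sqrt(1 + a*b*x^2) for a > 0, b > 0."""
    if a <= 0 or b <= 0:
        # The abstract states the constraint a > 0, b > 0; how the paper
        # enforces it during training is not given there, so we just check.
        raise ValueError("SALU requires a > 0 and b > 0")
    return a * x / math.sqrt(1.0 + a * b * x * x)


def salu_bound(a: float, b: float) -> float:
    """Saturation level: |SALU(x; a, b)| approaches sqrt(a / b) as |x| grows."""
    return math.sqrt(a / b)


if __name__ == "__main__":
    a, b = 2.0, 0.5  # illustrative values, not taken from the paper
    for x in (-1e6, -1.0, 0.0, 1.0, 1e6):
        print(f"x={x:>10}: SALU={salu(x, a, b):+.6f}")
    print("saturation level:", salu_bound(a, b))
```

Because the saturation level depends only on a and b, the output range is fixed by the activation's own parameters rather than by batch statistics, which is consistent with the abstract's claim that SALU works at batch size 1.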

These results suggest that normalization layers suppress total plasticity, a property that biological neurons inherently possess and that enables deep networks to learn effectively.

Read on arxiv.org


