Residual Modeling for High-Fidelity Learned Compression of Scientific Data
Quick Take
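The proposed residual-centric approach introduces two residual coders, LBRC and NGLR. LBRC improves compression ratio over GAE by 30-60% and is broadly competitive with SZ; NGLR adds a further 10-40% over LBRC and outperforms SZ in the evaluated high-fidelity regime. LBRC is a deterministic, training-free pipeline that adaptively quantizes the learned residual to meet the target NRMSE and then encodes the integer residual losslessly. NGLR adds a causal neural predictor that further reduces the entropy of the residual code while keeping decoding deterministic. Together they aim to preserve the rate advantage of learned compressors in the high-fidelity regime, where per-block residual correction would otherwise dominate the bitstream.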
Key Points
- LBRC improves compression ratios by 30-60% over Guaranteed Autoencoder (GAE).
- NGLR adds an additional 10-40% compression improvement over LBRC.
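- Both methods are evaluated at block-level NRMSE targets from 10^-6 to 10^-4, the high-fidelity regime where GAE's correction stream dominates the rate.
- Residual representations tailored to learned-compressor residuals preserve the advantage of learned compression when residual correction would otherwise dominate the rate.
- Results cover three datasets: E3SM, JHTDB, and ERA5.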
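The abstract does not spell out how block-level NRMSE is normalized. A common convention, assumed here, divides the block RMSE by the block's value range R. Under that assumption, uniformly quantizing the residual with step Δ bounds every pointwise error by Δ/2, which gives a step that always meets a target τ:

```latex
\mathrm{NRMSE}(x,\hat{x}) = \frac{\sqrt{\tfrac{1}{N}\sum_{i=1}^{N}(x_i-\hat{x}_i)^2}}{R},
\qquad R = \max_i x_i - \min_i x_i

r_i = x_i - \tilde{x}_i, \qquad q_i = \operatorname{round}(r_i/\Delta), \qquad \hat{x}_i = \tilde{x}_i + \Delta q_i
\;\Rightarrow\; |x_i - \hat{x}_i| \le \tfrac{\Delta}{2}
\;\Rightarrow\; \mathrm{NRMSE} \le \frac{\Delta}{2R}

\Delta = 2\tau R \;\Rightarrow\; \mathrm{NRMSE} \le \tau,
\qquad \text{e.g. } \tau = 10^{-6},\ R = 1 \;\Rightarrow\; \Delta = 2\times 10^{-6}
```

Here x̃ is the learned compressor's reconstruction. The bound is loose, since rounding errors typically have RMS near Δ/√12 rather than Δ/2. An adaptive quantizer can therefore often use a larger step than 2τR and still meet the target.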
Article Content
arXiv:2606.05389v1 (Announce Type: new)
Abstract: Lossy compression is essential for massive spatiotemporal data from scientific simulations. Learned compressors can achieve high compression ratios at moderate accuracy targets, but their aggregate reconstruction losses do not guarantee accuracy for each block. Existing Guaranteed Autoencoder (GAE) methods add a per-block residual correction by retaining SVD/PCA-style coefficients until the target is met.
This works at moderate tolerances, but in the high-fidelity regime with block-level NRMSE from 10^-6 to 10^-4, the number of retained coefficients grows quickly and the correction stream dominates the total rate. We propose a residual-centric view: the learned residual is structurally different from the original scientific field and should be coded with a representation designed for that residual. We introduce two residual coders.
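The coefficient growth described in the abstract can be seen in a simplified, hypothetical stand-in for GAE's correction. The sketch takes the SVD of one 2D residual block and keeps rank-1 components until the block NRMSE meets the target. Several details are assumptions made for illustration: the value-range normalization of NRMSE, the synthetic block, the noise model, and the function names. Real GAE implementations also quantize and entropy-code the retained coefficients, which this sketch omits.

```python
import numpy as np

def nrmse(original, recon):
    """RMSE normalized by the block's value range (assumed convention)."""
    value_range = float(original.max() - original.min())
    rmse = float(np.sqrt(np.mean((original - recon) ** 2)))
    return rmse / value_range if value_range > 0 else rmse

def svd_correction(block, approx, target):
    """Hypothetical GAE-like step: add rank-1 SVD terms of the residual,
    largest singular value first, until the block NRMSE meets the target.
    Returns the corrected block and the number of retained components."""
    u, s, vt = np.linalg.svd(block - approx, full_matrices=False)
    corrected = approx.copy()
    if nrmse(block, corrected) <= target:
        return corrected, 0
    for k in range(len(s)):
        corrected = corrected + s[k] * np.outer(u[:, k], vt[k])
        if nrmse(block, corrected) <= target:
            return corrected, k + 1
    return corrected, len(s)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 64)
block = np.outer(np.sin(2 * np.pi * x), np.cos(2 * np.pi * x))
# Stand-in for a learned compressor's error: small, unstructured noise.
approx = block + 1e-3 * rng.standard_normal(block.shape)

for target in (1e-4, 1e-5, 1e-6):
    _, k = svd_correction(block, approx, target)
    print(f"target {target:.0e}: {k} of 64 components retained")
```

When the residual has little low-rank structure, as with this noise-like stand-in, each tighter target forces the correction to keep most of the components. That is the rate blow-up the residual-centric coders are meant to avoid.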
LBRC is a deterministic, training-free pipeline that adaptively quantizes the learned residual to the target NRMSE and losslessly encodes the resulting integer residual using 3D Lorenzo differencing, zigzag mapping, bit-plane coding, and entropy coding. NGLR adds a causal neural predictor that outputs a normalized bias for an integer-rounded Lorenzo prediction in the same deterministic integer pipeline, reducing the entropy of the remaining residual code while preserving deterministic decoding.
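The LBRC stages can be sketched in Python on one 3D block, assuming NumPy. Several parts are hypothetical and not taken from the paper:
- the adaptive step search, which starts from a step that provably meets the target and grows it by 10% while the measured NRMSE still passes;
- the value-range normalization of NRMSE;
- all function names;
- the choice of zlib's DEFLATE as a stand-in for the paper's entropy coder, which the abstract does not specify.

```python
import zlib
import numpy as np

def nrmse(original, recon):
    value_range = float(original.max() - original.min())
    rmse = float(np.sqrt(np.mean((original - recon) ** 2)))
    return rmse / value_range if value_range > 0 else rmse

def choose_step(block, approx, target, growth=1.1, max_iter=100):
    """Hypothetical adaptive quantizer: step = 2*target*range guarantees
    |error| <= target*range, then grow the step while NRMSE still passes."""
    residual = block - approx
    step = 2.0 * target * float(block.max() - block.min())  # range assumed > 0
    for _ in range(max_iter):
        trial = step * growth
        if nrmse(block, approx + np.rint(residual / trial) * trial) > target:
            break
        step = trial
    return step

def lorenzo_3d(q):
    """3D Lorenzo prediction error with zero boundary = mixed first difference."""
    d = np.diff(q, axis=0, prepend=0)
    d = np.diff(d, axis=1, prepend=0)
    return np.diff(d, axis=2, prepend=0)

def inverse_lorenzo_3d(d):
    return d.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)

def zigzag(d):    # signed -> non-negative: 0,-1,1,-2,... -> 0,1,2,3,...
    return np.where(d >= 0, 2 * d, -2 * d - 1)

def unzigzag(z):
    return np.where(z % 2 == 0, z // 2, -(z + 1) // 2)

def encode_block(block, approx, target):
    step = choose_step(block, approx, target)
    q = np.rint((block - approx) / step).astype(np.int64)
    z = zigzag(lorenzo_3d(q))
    nbits = int(z.max()).bit_length()
    # Bit-plane coding: most significant plane first, each plane bit-packed.
    planes = [np.packbits(((z >> b) & 1).astype(np.uint8).ravel())
              for b in range(nbits - 1, -1, -1)]
    payload = zlib.compress(b"".join(p.tobytes() for p in planes), 9)
    return {"shape": block.shape, "step": step, "nbits": nbits, "payload": payload}

def decode_block(approx, enc):
    n = int(np.prod(enc["shape"]))
    raw = np.frombuffer(zlib.decompress(enc["payload"]), dtype=np.uint8)
    plane_bytes = (n + 7) // 8
    z = np.zeros(n, dtype=np.int64)
    for i, b in enumerate(range(enc["nbits"] - 1, -1, -1)):
        bits = np.unpackbits(raw[i * plane_bytes:(i + 1) * plane_bytes], count=n)
        z |= bits.astype(np.int64) << b
    q = inverse_lorenzo_3d(unzigzag(z).reshape(enc["shape"]))
    return approx + q * enc["step"]

rng = np.random.default_rng(0)
g = np.linspace(0.0, 1.0, 16)
block = np.sin(4 * g)[:, None, None] * np.cos(3 * g)[None, :, None] + g[None, None, :]
# Smooth stand-in for the learned compressor's error, plus a little noise.
err = 1e-3 * np.sin(7 * g)[:, None, None] * np.cos(5 * g)[None, :, None] * np.ones((1, 1, 16))
approx = block + err + 1e-6 * rng.standard_normal(block.shape)

enc = encode_block(block, approx, target=1e-5)
recon = decode_block(approx, enc)
print(nrmse(block, recon), len(enc["payload"]), "payload bytes for", block.size, "values")
```

With a zero boundary, the 3D Lorenzo residual equals the mixed first difference along all three axes. np.diff with prepend=0 therefore computes it, and three cumulative sums invert it exactly on integers, so every stage after quantization is lossless.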
The predictor weights are serialized and counted in the bitstream. Across E3SM, JHTDB, and ERA5 at block-level NRMSE targets from 10^-6 to 10^-4, LBRC improves compression ratio over GAE by 30-60% and is broadly competitive with SZ. NGLR adds a further 10-40% over LBRC and outperforms SZ in the evaluated high-fidelity regime.
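The sketch below shows, under stated assumptions, how a learned bias can sit inside a deterministic integer pipeline in the spirit of NGLR. It is not the paper's method, and the assumptions are:
- A least-squares linear model over three causal neighbour residuals stands in for the neural predictor. The abstract gives neither the network architecture nor the bias normalization.
- The model's float32 weights are serialized and counted in the coded size.
- The decoder replays the same floating-point expression in raster order, so the rounded integer bias is identical on both sides.
- All function names, the zigzag-plus-zlib size estimate, and the synthetic data are hypothetical.

```python
import zlib
import numpy as np

def lorenzo_3d(q):
    d = np.diff(q, axis=0, prepend=0)
    d = np.diff(d, axis=1, prepend=0)
    return np.diff(d, axis=2, prepend=0)

def causal_features(d):
    """Previously decoded Lorenzo residuals along each axis (zero at the boundary)."""
    f0 = np.zeros_like(d); f0[1:, :, :] = d[:-1, :, :]
    f1 = np.zeros_like(d); f1[:, 1:, :] = d[:, :-1, :]
    f2 = np.zeros_like(d); f2[:, :, 1:] = d[:, :, :-1]
    return f0, f1, f2

def bias(w, f0, f1, f2):
    # Same expression and evaluation order on encoder and decoder,
    # so the rounded integer bias is bit-identical on both sides.
    return np.rint(w[0] * f0 + w[1] * f1 + w[2] * f2)

def nglr_encode(q):
    d = lorenzo_3d(q)
    f0, f1, f2 = causal_features(d)
    F = np.stack([f0.ravel(), f1.ravel(), f2.ravel()], axis=1).astype(np.float64)
    w, *_ = np.linalg.lstsq(F, d.ravel().astype(np.float64), rcond=None)
    w32 = w.astype(np.float32)          # what gets serialized into the bitstream
    wd = w32.astype(np.float64)         # what both encoder and decoder use
    e = d - bias(wd, f0, f1, f2).astype(np.int64)
    return w32.tobytes(), e

def nglr_decode(weight_bytes, e):
    wd = np.frombuffer(weight_bytes, dtype=np.float32).astype(np.float64)
    d = np.zeros_like(e)
    nx, ny, nz = e.shape
    for i in range(nx):                 # causal raster scan
        for j in range(ny):
            for k in range(nz):
                f0 = d[i - 1, j, k] if i > 0 else 0
                f1 = d[i, j - 1, k] if j > 0 else 0
                f2 = d[i, j, k - 1] if k > 0 else 0
                d[i, j, k] = e[i, j, k] + int(bias(wd, f0, f1, f2))
    return d.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)   # inverse Lorenzo

def coded_size(ints):
    # Zigzag to non-negative, then zlib as a stand-in entropy coder.
    z = np.where(ints >= 0, 2 * ints, -2 * ints - 1).astype(np.uint32)
    return len(zlib.compress(z.tobytes(), 9))

rng = np.random.default_rng(1)
g = np.linspace(0.0, 6.0, 16)
smooth = np.sin(g)[:, None, None] * np.cos(g)[None, :, None] * np.sin(0.5 * g)[None, None, :]
q = np.rint(2000 * smooth + rng.normal(0, 3, (16, 16, 16))).astype(np.int64)

weights, e = nglr_encode(q)
lorenzo_only = coded_size(lorenzo_3d(q))
nglr_total = coded_size(e) + len(weights)   # weights count toward the rate
print(lorenzo_only, nglr_total)
```

Whether the learned bias pays for its own weights depends on the data. The sketch demonstrates only the mechanics of exact, deterministic decoding and does not reproduce the 10-40% gains reported in the paper.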
These results show that residual representations tailored to learned-compressor residuals can preserve the advantage of learned compression when global residual correction becomes rate-dominant.