AnySimLite: A Lightweight Few-Shot Similarity Encoder for On-Device Speech-Adjacent Classification
Quick Answer
AnySimLite is a lightweight similarity encoder for on-device speech-adjacent classification. In few-shot settings it achieves state-of-the-art or SOTA-competitive performance while using less than 1/250th the model size of the qLLaMA_LoRA-7B baseline.
Quick Take
AnySimLite recasts multiple speech-adjacent classification tasks as a text similarity problem, so a single lightweight model can stand in for several specialized ones and keep the memory footprint on edge devices low. It combines word-level and character-level channels and, even in the worst case, stays within 7% of the qLLaMA_LoRA-7B baseline at less than 1/250th of its model size.
Key Points
- Combines word-level and character-level channels in a single lightweight similarity encoder.
- Achieves state-of-the-art or SOTA-competitive performance in few-shot settings across multiple speech-adjacent classification tasks.
- Maintains a low memory footprint, using less than 1/250th the model size of qLLaMA_LoRA-7B.
- Keeps the performance drop below 7% even in the worst case.
- Runs on-device to reduce privacy concerns and inference latency on edge devices such as smartphones.
Article Excerpt
From source RSS / original summary
arXiv:2606.26452v1 Announce Type: new
Abstract: To minimize privacy concerns and inference latency on edge devices like smartphones, lightweight on-device models remain important for end-user applications. Many of these applications involve natural language classification, but deploying multiple specialized models creates a memory footprint challenge.
We investigate: Can a single lightweight architecture solve multiple Speech-Adjacent (SA) classification tasks through reduction to a nuanced text similarity formulation? We propose AnySimLite, a lightweight similarity encoder that combines word-level and character-level channels.
Together with a dataset transformation strategy, we evaluate AnySimLite across multiple SA classification tasks and show that it consistently achieves state-of-the-art (SOTA) or SOTA-competitive performance in few-shot settings while maintaining a low memory footprint. Even in the worst case, the performance drop remains below 7% while using $<\frac{1}{250}^{\mathrm{th}}$ of the model size of the SOTA qLLaMA_LoRA-7B baseline.
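To make the idea of reducing classification to text similarity more concrete, here is a minimal sketch in Python. It is not the paper's architecture: the abstract does not describe the encoder's internals, so this example swaps in handcrafted bag-of-words and character trigram counts for the learned word-level and character-level channels, and uses nearest-neighbor matching against a few labeled examples. All function names, the `word_weight` parameter, and the example utterances are assumptions made for illustration.

```python
import math
from collections import Counter


def word_features(text):
    """Word-level channel (illustrative): bag of lowercase tokens."""
    return Counter(text.lower().split())


def char_features(text, n=3):
    """Character-level channel (illustrative): bag of character n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(x, y, word_weight=0.5):
    """Blend the two channels; word_weight is a hypothetical mixing knob."""
    word_sim = cosine(word_features(x), word_features(y))
    char_sim = cosine(char_features(x), char_features(y))
    return word_weight * word_sim + (1 - word_weight) * char_sim


def classify(query, support):
    """Few-shot classification as similarity search.

    `support` is a list of (text, label) pairs; the query receives the
    label of its most similar support example.
    """
    best_text, best_label = max(support, key=lambda pair: similarity(query, pair[0]))
    return best_label


# A handful of labeled examples per class (made-up data).
support = [
    ("turn on the kitchen lights", "smart_home"),
    ("switch off the bedroom lamp", "smart_home"),
    ("play some jazz music", "media"),
    ("pause the current song", "media"),
]

print(classify("turn off the lights", support))
```

Because every task is expressed as "which labeled example is this input most similar to", the same scoring function can serve several classification tasks by swapping the support set, which is the memory-saving intuition behind a single shared encoder. A trained model would replace the count-based features with learned representations.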