ChronoSC: Task-Oriented Semantic Communication via Temporal-to-Color Encoding
Quick Take
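ChronoSC targets efficient Video Question Answering by encoding temporal video dynamics into a single color image before transmission, achieving up to 192x bandwidth reduction compared to raw video transmission on the CLEVRER dataset.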
Key Points
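- Introduces Chrono-Color Stacking, which projects temporal video dynamics into a single static image for temporal compression.
- Transmits the resulting chrono-image with a lightweight DeepJSCC transceiver and reconstructs it explicitly at the receiver, so a pre-trained BLIP model can answer questions from it.
- Achieves up to 192x bandwidth reduction compared to raw video transmission while maintaining high VideoQA accuracy on CLEVRER.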
Abstract: Semantic communication (SC) aims to reduce transmission overhead by conveying task-relevant information rather than raw data. However, existing SC approaches for video largely focus on pixel-level reconstruction or rely on complex spatiotemporal pipelines, leading to excessive bandwidth usage and latency that are unsuitable for low-resource deployments. In this paper, we propose ChronoSC, a task-oriented semantic communication framework for Video Question Answering (VideoQA). ChronoSC introduces Chrono-Color Stacking, a lightweight and lossless projection scheme that encodes temporal video dynamics into a single static image, enabling extreme temporal compression before transmission. This compact semantic representation is transmitted using a lightweight Deep Joint Source-Channel Coding (DeepJSCC) transceiver and explicitly reconstructed at the receiver. Unlike latent-space methods, explicit visual reconstruction enables the direct reuse of pre-trained vision-language models; specifically, a pre-trained BLIP model is employed to infer answers from noisy, reconstructed chrono-images. Experiments on the CLEVRER dataset show that ChronoSC achieves up to 192 times bandwidth reduction compared to raw video transmission while maintaining high VideoQA accuracy.
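The abstract does not spell out how Chrono-Color Stacking maps time to color. As one plausible reading, offered only as an illustrative sketch and not as the authors' method, the snippet below converts three frames sampled at different times to grayscale and places them in the R, G, and B channels of a single image. The function names `to_gray`, `chrono_color_stack`, and `stacking_ratio`, the choice of three evenly spaced frames, and the luminance weights are all assumptions introduced here.

```python
import numpy as np


def to_gray(frame):
    """Convert an (H, W, 3) uint8 RGB frame to (H, W) float luminance.

    Uses the ITU-R BT.601 luma weights; this choice is an assumption.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return frame.astype(np.float64) @ weights


def chrono_color_stack(video, indices=None):
    """Hypothetical temporal-to-color projection (not the paper's exact scheme).

    video:   (T, H, W, 3) uint8 array of RGB frames.
    indices: three frame indices; defaults to three evenly spaced frames.
    Returns an (H, W, 3) uint8 image whose channel c is the grayscale
    version of frame indices[c].
    """
    num_frames = video.shape[0]
    if indices is None:
        indices = np.linspace(0, num_frames - 1, 3).round().astype(int)
    channels = [to_gray(video[i]) for i in indices]
    stacked = np.stack(channels, axis=-1)
    return np.clip(np.rint(stacked), 0, 255).astype(np.uint8)


def stacking_ratio(video, image):
    """Ratio of raw video values to stacked-image values (source side only)."""
    return video.size / image.size


# Toy example: 8 black frames, with the first frame set to white.
video = np.zeros((8, 4, 4, 3), dtype=np.uint8)
video[0] = 255
chrono = chrono_color_stack(video)
ratio = stacking_ratio(video, chrono)
```

In this toy setup the stacking step alone shrinks the payload by the number of frames in the clip. The sketch does not model the DeepJSCC stage and does not attempt to reproduce the reported 192x figure.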
| Comments: | 6 pages, IEEE ICCE 2026 |
| Subjects: | Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.16388 [cs.CV] |
| | (or arXiv:2605.16388v1 [cs.CV] for this version) |
| | https://doi.org/10.48550/arXiv.2605.16388 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Van-Dinh Nguyen Dr
[v1] Mon, 11 May 2026 17:29:05 UTC (704 KB)
— Originally published at arxiv.org