Active Adversarial Perturbation-driven Associative Memory Retrieval for RGB-Event Visual Object Tracking
Quick Answer
APRTrack introduces a hierarchical perturbation and retrieval framework for RGB-Event visual object tracking, enhancing robustness against partial target loss and modal degradation.
Quick Take
APRTrack uses adversarial perturbations to simulate real-world signal corruption and employs Footprint-guided Channel-calibrated Hopfield Retrieval (FCHR) to compensate degraded features with historical information. Experiments on the FE108, COESOT, VisEvent, and FELT datasets demonstrate its effectiveness in challenging tracking scenarios.
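FCHR builds on Hopfield matching. As background only, not as the paper's own formulation, the retrieval step of a modern (continuous) Hopfield network is shown below. The columns of X are stored patterns, ξ is the query state, and β is an inverse temperature. With separate key and value memories and a row-wise softmax, the same step takes the familiar attention form, and β = 1/√d recovers scaled dot-product attention. This summary does not specify how FCHR's footprint confidence and channel calibration modify this step.

```latex
\xi^{\text{new}} = X\,\operatorname{softmax}\!\left(\beta\, X^{\top} \xi\right),
\qquad
Z = \operatorname{softmax}\!\left(\beta\, Q K^{\top}\right) V
```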
Key Points
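- APRTrack targets two failure modes of RGB-Event tracking, partial target loss and modality degradation, with a hierarchical perturbation-and-retrieval framework.
- It simulates signal corruption with adversarial perturbations at the modality level (full-modal failure) and the spatial level (localized target-region absence). A hierarchical routing mechanism keeps the two training pipelines separate to avoid feature collapse.
- Footprint-guided Channel-calibrated Hopfield Retrieval (FCHR) gates historical feature compensation by retrieval confidence and bounds it to target regions.
- Extensive experiments on the FE108, COESOT, VisEvent, and FELT datasets validate the approach.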
- Source code and pre-trained models will be available on GitHub.
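The sketch below illustrates the two perturbation levels and the routing that keeps them apart. It is a minimal NumPy illustration, and every detail is an assumption:

- All function names are hypothetical.
- Features are toy H×W×C arrays.
- The "adversarial" step is only a worst-case search over a few candidate degradations under a caller-supplied loss. APRTrack learns its perturbation branches, and this sketch does not.
- The router is a fair coin.

The point it does show is the routing constraint: each sample receives either a modality-level or a spatial-level degradation, never both stacked.

```python
import numpy as np

# Hypothetical structured-degradation sketch. Array layout, box format, and the
# worst-case search are illustrative assumptions, not APRTrack's actual design.

def drop_modality(rgb, event, which):
    """Modality-level degradation: zero out one modality entirely (full-modal failure)."""
    if which == "rgb":
        return np.zeros_like(rgb), event
    return rgb, np.zeros_like(event)

def mask_region(feat, box):
    """Spatial-level degradation: zero a box (y0, x0, y1, x1) of an HxWxC feature map."""
    y0, x0, y1, x1 = box
    out = feat.copy()
    out[y0:y1, x0:x1, :] = 0.0
    return out

def worst_case(candidates, loss_fn):
    """Discrete stand-in for an adversarial choice: return the highest-loss candidate."""
    losses = [loss_fn(c) for c in candidates]
    return candidates[int(np.argmax(losses))]

def route_and_perturb(rgb, event, target_box, loss_fn, rng):
    """Route a sample to exactly ONE perturbation branch so degradations never stack."""
    branch = "modality" if rng.random() < 0.5 else "spatial"
    if branch == "modality":
        cands = [drop_modality(rgb, event, m) for m in ("rgb", "event")]
    else:
        y0, x0, y1, x1 = target_box
        hy, hx = (y0 + y1) // 2, (x0 + x1) // 2
        # Candidates: each quadrant of the target box, masked in both modalities.
        quads = [(y0, x0, hy, hx), (y0, hx, hy, x1), (hy, x0, y1, hx), (hy, hx, y1, x1)]
        cands = [(mask_region(rgb, q), mask_region(event, q)) for q in quads]
    return branch, worst_case(cands, loss_fn)
```

In a real tracker, `loss_fn` would presumably be the current model's tracking loss on the perturbed pair. The coin-flip router could likewise be replaced by a learned or scheduled policy.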
Abstract: RGB-Event tracking improves localization robustness by fusing RGB appearance textures with dense temporal motion cues from event sensors. While this multi-modal scheme broadens tracking applicability, real-world scenes suffer from diverse structured signal degradations that hinder traditional multi-modal fusion. In harsh environments, either modality can drastically lose reliability, and targets frequently appear incomplete due to occlusion, edge truncation, and foreground […]. To tackle these challenges, we present APRTrack, a hierarchical perturbation and retrieval framework tailored for RGB-Event tracking that is robust to partially missing targets and modal degradation. To mimic real-world signal corruption, APRTrack constructs structured degradation via two adversarial perturbation branches at the modality and spatial levels, which separately simulate full-modal failure and localized absence of target regions. A hierarchical routing mechanism disentangles the training pipelines of the two perturbation types, eliminating the feature collapse induced by superimposed degradation constraints. Furthermore, we devise Footprint-guided Channel-calibrated Hopfield Retrieval (FCHR) for reliable historical information compensation. This module evaluates retrieval confidence based on association footprints between queries and memory banks, and calibrates the retrieval metric space prior to Hopfield matching, realizing controllable historical feature compensation bounded to target regions. Extensive experiments on the FE108, COESOT, VisEvent, and FELT datasets demonstrate the effectiveness of the proposed strategies for RGB-Event visual object tracking. The source code and pre-trained models will be released on this https URL.
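The abstract names three FCHR steps: confidence from association footprints, metric-space calibration before Hopfield matching, and compensation restricted to target regions. The sketch below illustrates them. The concrete choices are assumptions, not the paper's formulas:

- Calibration is per-channel standardization using memory statistics.
- The hypothetical footprint score is one minus the normalized entropy of each query's attention over memory slots.
- A binary target mask gates a convex blend between the query and the retrieved feature.

The memory needs at least two slots for the entropy normalization to be defined.

```python
import numpy as np

# Hypothetical FCHR-style sketch. The calibration rule, footprint score, and
# mask gating are illustrative assumptions; the abstract gives no exact formulas.

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def calibrate(q, mem, eps=1e-6):
    """Map queries and memory into a shared per-channel standardized metric space."""
    mu = mem.mean(axis=0, keepdims=True)
    sd = mem.std(axis=0, keepdims=True) + eps
    return (q - mu) / sd, (mem - mu) / sd

def footprint_confidence(attn):
    """Score each query in [0, 1] by how concentrated its attention over memory is."""
    n = attn.shape[-1]                         # requires n >= 2
    ent = -(attn * np.log(attn + 1e-12)).sum(axis=-1)
    return 1.0 - ent / np.log(n)

def fchr_sketch(q, mem_keys, mem_vals, target_mask, beta=1.0):
    """q: (N, C) queries; mem_keys, mem_vals: (M, C); target_mask: (N,) of 0/1."""
    qc, kc = calibrate(q, mem_keys)
    attn = softmax(beta * qc @ kc.T, axis=-1)  # Hopfield/attention retrieval weights
    retrieved = attn @ mem_vals                # (N, C) retrieved historical features
    conf = footprint_confidence(attn)          # (N,) retrieval confidence
    gate = (conf * target_mask)[:, None]       # compensate only inside the target region
    return q + gate * (retrieved - q), conf
```

Under these assumptions, a query whose attention spreads evenly over memory gets confidence near 0 and is left almost unchanged. A sharply matched query inside the target mask is pulled toward its retrieved historical feature, and tokens outside the mask are never modified.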
| Subjects: | Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG) |
| Cite as: | arXiv:2606.26455 [cs.CV] (or arXiv:2606.26455v1 [cs.CV] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2606.26455 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Xiao Wang
[v1] Wed, 24 Jun 2026 23:34:06 UTC (2,654 KB)
— Originally published at arxiv.org