Context-Aware Feature-Fusion for Co-occurring Object Detection in Autonomous Driving
Quick Answer
This paper shows that the Context-Centric Feature Fusion (CCFF) framework improves co-occurring object detection in autonomous driving by combining a Local Context Fusion Module (LCFM) with a Global Context Attention Module (GCAM), achieving a Category-level Consistency Strategy (CCS) of 0.973 on Cityscapes and 0.969 on BDD100K.
Quick Take
CCFF also reports substantial gains in small object detection (AP_S: 14.1%), recovers rare classes such as "Train", and processes images in real time with a 0.2 FPS overhead.
Key Points
- CCFF combines two attention-based modules: LCFM, which applies RoI-to-RoI self-attention to resolve spatial interactions, and GCAM, which pools top-K RoI features into a global context token.
- Achieves a CCS of 0.973 on Cityscapes and 0.969 on BDD100K.
- Reports substantial gains in small object detection (AP_S: 14.1%).
- Recovers rare classes such as "Train" that are typically lost in large distributions.
- Processes images in real time with a 0.2 FPS overhead.
Article Content
From source RSS / original summary
arXiv:2606.12628v1
Abstract: Object detection in autonomous driving requires precise localization and an inherent understanding of the relational context between co-occurring objects. In extremely complex, heterogeneous environments, rare classes, small-scale objects, and frequently appearing objects are difficult for standard object detection frameworks to handle.
In this paper, we propose a novel framework called Context-Centric Feature Fusion (CCFF), which uses two attention-based modules. The Local Context Fusion Module (LCFM) uses an RoI-to-RoI self-attention mechanism to resolve spatial interactions, mainly for small and partially occluded objects. The Global Context Attention Module (GCAM) encodes object co-occurrence priors by pooling top-K RoI features into a global context attention token, avoiding the computational overhead of pixel-level global pooling.
This fusion of local and object-centric global features yields contextualized embeddings that enhance classification results and the detection of co-occurring objects. Our method is evaluated on two datasets, Cityscapes and BDD100K, where it demonstrates significant improvement in relational consistency, achieving a Category-level Consistency Strategy (CCS) of 0.973 and 0.969, respectively. Furthermore, our approach produces substantial gains in small object detection (AP_S: 14.1%) and successfully recovers rare classes such as "Train" that are typically lost in large distributions. Our efficiency report shows that the framework processes images in real time with a 0.2 FPS overhead. The code is available at https://github.com/BinayKSingh/CCFF.
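The two modules named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the residual connection, the use of detection scores to pick the top-K RoIs, mean pooling for the global token, and fusion by concatenation are all assumptions made for illustration. The abstract only states that LCFM applies RoI-to-RoI self-attention and that GCAM pools top-K RoI features into a single global token.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_context_fusion(roi_feats, w_q, w_k, w_v):
    """Hypothetical LCFM: single-head RoI-to-RoI self-attention.

    roi_feats: (N, D) array, one feature vector per RoI.
    w_q, w_k, w_v: (D, D) projection matrices (assumed shapes).
    """
    q, k, v = roi_feats @ w_q, roi_feats @ w_k, roi_feats @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N) RoI-to-RoI weights
    return roi_feats + attn @ v  # residual connection (assumption)

def global_context_token(roi_feats, scores, top_k):
    """Hypothetical GCAM: pool the top-K RoIs into one global token.

    Selecting by score and mean pooling are assumptions; the cost depends
    on K RoIs rather than on every pixel of the feature map.
    """
    k = min(top_k, len(scores))
    idx = np.argsort(scores)[::-1][:k]  # indices of the K highest scores
    return roi_feats[idx].mean(axis=0)  # (D,)

def fuse(roi_feats, scores, w_q, w_k, w_v, top_k=8):
    """Concatenate local features with the shared global token (assumption)."""
    local = local_context_fusion(roi_feats, w_q, w_k, w_v)
    token = global_context_token(roi_feats, scores, top_k)
    return np.concatenate([local, np.broadcast_to(token, local.shape)], axis=1)
```

Calling `fuse` on `N` RoIs with `D`-dimensional features returns an `(N, 2D)` array in which every RoI carries its locally attended features alongside the same scene-level token, which a classification head could then consume.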