Context-Aware Feature-Fusion for Co-occurring Object Detection in Autonomous Driving

arXiv cs.CV·Binay Kumar Singh, Niels Da Vitoria Lobo

6/12/2026

Quick Answer

This paper proposes the Context-Centric Feature Fusion (CCFF) framework, which improves co-occurring object detection in autonomous driving by combining a Local Context Fusion module and a Global Context Attention module. It reports a Category-level Consistency Strategy (CCS) of 0.973 on Cityscapes and 0.969 on BDD100K.

Quick Take

CCFF also reports substantial gains in small-object detection (AP_S: 14.1%) and processes images in real time with a 0.2 FPS overhead.

Key Points

CCFF uses two attention-based modules (LCFM and GCAM) to model spatial and co-occurrence relations between objects.
Reports a CCS of 0.973 on Cityscapes and 0.969 on BDD100K.
Substantial gains in small-object detection (AP_S: 14.1%).
Recovers rare classes such as 'Train' that are typically lost in large distributions.
Processes images in real time with only a 0.2 FPS overhead.

Article Content

From source RSS / original summary

arXiv:2606.12628v1 Announce Type: new

Abstract: Object detection in autonomous driving requires precise localization and an inherent understanding of the relational context between co-occurring objects. In extremely complex, heterogeneous environments, rare classes, small-scale objects, and frequently appearing objects are difficult for standard object detection frameworks to handle.

In this paper, we propose a novel framework called Context-Centric Feature Fusion (CCFF), which uses two attention-based modules. The Local Context Fusion Module (LCFM) uses an RoI-to-RoI self-attention mechanism to resolve spatial interactions, focusing mainly on small and partially occluded objects, while the Global Context Attention Module (GCAM) converts object co-occurrence priors into a global context attention token by pooling the top-K RoI features, avoiding the computational overhead of pixel-level global pooling.
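The two modules can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the projection matrices `w_q`, `w_k`, `w_v`, the helper names, and the choice to weight the top-K RoIs by a softmax over their objectness scores are all illustrative assumptions. The sketch only shows the two ideas the abstract names: every RoI attends to every other RoI (LCFM), and only K RoI features, rather than every pixel, are pooled into a single global token (GCAM).

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def roi_self_attention(roi_feats, w_q, w_k, w_v):
    """Hypothetical LCFM-style RoI-to-RoI self-attention.

    roi_feats: (N, D) pooled features, one row per RoI in the image.
    Every RoI attends to every other RoI, so a small or partially
    occluded object can draw on evidence from the objects around it.
    """
    q, k, v = roi_feats @ w_q, roi_feats @ w_k, roi_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, N) pairwise affinities
    attn = softmax(scores, axis=-1)          # each row sums to 1
    return roi_feats + attn @ v              # residual connection, (N, D)


def global_context_token(roi_feats, objectness, k):
    """Hypothetical GCAM-style global token.

    Pools only the top-k RoIs (ranked by objectness) into one (D,) vector
    instead of pooling over every pixel of the feature map.
    """
    k = min(k, roi_feats.shape[0])
    top = np.argsort(objectness)[::-1][:k]   # indices of the k highest scores
    weights = softmax(objectness[top])       # assumption: score-weighted mean
    return weights @ roi_feats[top]          # (D,)


# Usage with random stand-in features (illustrative only).
rng = np.random.default_rng(0)
N, D = 6, 8
feats = rng.standard_normal((N, D))
w_q, w_k, w_v = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
scores = rng.random(N)
local = roi_self_attention(feats, w_q, w_k, w_v)
token = global_context_token(feats, scores, k=3)
print(local.shape, token.shape)  # (6, 8) (8,)
```

Pooling K RoI vectors of dimension D costs O(KD) per image, whereas pixel-level global pooling touches every location of the feature map; that gap is the overhead the abstract says GCAM avoids.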

This fusion of local and object-centric global features yields contextualized embeddings that improve classification and co-occurring object detection. We evaluate our method on two datasets, Cityscapes and BDD100K, and demonstrate significant improvements in relational consistency, achieving a Category-level Consistency Strategy (CCS) of 0.973 and 0.969, respectively.

Furthermore, our approach produces substantial gains in small object detection (AP_S: 14.1%) and successfully recovers rare classes such as "Train" that are typically lost in large distributions. Our efficiency report shows that the framework processes images in real time with a 0.2 FPS overhead. The code is available at https://github.com/BinayKSingh/CCFF.
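The abstract does not specify how the local and global features are fused into contextualized embeddings. The sketch below assumes one simple option: concatenate each RoI's locally attended feature with the shared global token, apply a linear projection with ReLU, then compute per-RoI class logits. The function names, weight shapes, and the fusion operator itself are hypothetical; the paper may use addition, gating, or cross-attention instead.

```python
import numpy as np


def fuse_context(local_feats, global_token, w_fuse, b_fuse):
    """Hypothetical fusion of local and global context.

    local_feats: (N, D) per-RoI features after RoI-to-RoI attention.
    global_token: (D,) image-level context token shared by all RoIs.
    w_fuse: (2D, D), b_fuse: (D,). Returns (N, D) contextualized embeddings.
    """
    n, d = local_feats.shape
    tiled = np.broadcast_to(global_token, (n, d))         # same token for every RoI
    joint = np.concatenate([local_feats, tiled], axis=1)  # (N, 2D)
    return np.maximum(joint @ w_fuse + b_fuse, 0.0)       # linear + ReLU


def classify(embeddings, w_cls, b_cls):
    # Per-RoI class logits computed from the contextualized embeddings.
    return embeddings @ w_cls + b_cls


rng = np.random.default_rng(1)
N, D, C = 5, 8, 4  # RoIs, feature dim, classes (illustrative sizes)
local = rng.standard_normal((N, D))
token = rng.standard_normal(D)
emb = fuse_context(local, token, rng.standard_normal((2 * D, D)) * 0.1, np.zeros(D))
logits = classify(emb, rng.standard_normal((D, C)) * 0.1, np.zeros(C))
print(emb.shape, logits.shape)  # (5, 8) (5, 4)
```

Because every RoI receives the same global token, the classifier can condition each box's label on which other objects appear in the scene, which is the kind of co-occurrence signal the abstract describes.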
