Camera and LiDAR BEV Fusion for Cooperative 3D Object Detection on TUMTraf V2X

arXiv cs.CV·Muhammad Shahbaz, Shaurya Agarwal

6/12/2026

Quick Answer

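The camera and LiDAR fusion detector for TUMTraf V2X reaches a 3D mAP of 0.85 on the public Codabench test split. The 0.99 figure comes from a post-processing study that replaces predictions on the 44 test frames overlapping the train and validation splits with the released ground truth; it was not published on the leaderboard.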

Quick Take

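The detector fuses three roadside cameras with a fused infrastructure-plus-vehicle point cloud in a shared bird's-eye-view space. It predicts boxes through a CenterPoint-style head with a generalized IoU regression loss and an IoU quality re-ranking head. Trained on the provided train and validation splits, it reaches 0.85 3D mAP. Because 44 of the 50 test frames also appear, with labels, in those splits, the authors ran two further studies to quantify the effect of the overlap.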
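The generalized IoU regression loss that the paper reports using can be illustrated with a minimal sketch. This version assumes axis-aligned 2D boxes in the bird's-eye-view plane, given as `(x1, y1, x2, y2)`; the source does not state the exact formulation, and a detector for rotated 3D boxes would need a rotated-box variant. The function names are hypothetical.

```python
def giou_axis_aligned(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2).

    Simplifying assumption: no rotation and no height dimension.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))

    if union <= 0.0 or hull <= 0.0:
        return 0.0  # degenerate boxes; the value chosen here is an assumption

    iou = inter / union
    return iou - (hull - union) / hull


def giou_loss(pred_box, target_box):
    """Loss in [0, 2]: 0 for a perfect match, larger as the boxes drift apart."""
    return 1.0 - giou_axis_aligned(pred_box, target_box)


same = giou_axis_aligned((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0))
shifted = giou_axis_aligned((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))
disjoint = giou_axis_aligned((0.0, 0.0, 1.0, 1.0), (2.0, 0.0, 3.0, 1.0))
```

Unlike plain IoU, the value stays informative for non-overlapping boxes: `disjoint` is negative rather than zero, so the loss still grows as boxes move further apart.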

Key Points

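Fuses three roadside cameras with a fused infrastructure-plus-vehicle point cloud in a shared bird's-eye-view space.
Reaches a 3D mAP of 0.85 on the public Codabench test split when trained on the provided train and validation splits.
44 of the 50 test frames also appear, with labels, in the released train (40) and validation (4) splits.
A finetuning run that oversamples the 44 overlapping frames reaches 0.89 mAP.
A post-processing run that replaces predictions on those frames with the released ground truth reaches 0.99 mAP (not published on the leaderboard).
All three configurations and their per-class results are reported.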
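As a rough illustration of the oversampling idea behind the finetuning run, the sketch below gives overlapping frames a larger sampling weight. The paper does not describe its sampler; the oversampling factor, the frame IDs, and the function names here are all hypothetical.

```python
import random
from collections import Counter


def oversampling_weights(frame_ids, overlapping_ids, factor=4.0):
    """One weight per frame: `factor` for overlapping frames, 1.0 otherwise.

    `factor` is an illustrative value, not one taken from the paper.
    """
    overlap = set(overlapping_ids)
    return [factor if fid in overlap else 1.0 for fid in frame_ids]


def sample_epoch(frame_ids, weights, num_samples, seed=0):
    """Draw frames with replacement, proportionally to their weights."""
    rng = random.Random(seed)
    return rng.choices(frame_ids, weights=weights, k=num_samples)


# Toy data: 10 training frames, 2 of which also occur in the test split.
train_ids = [f"frame_{i:03d}" for i in range(10)]
overlap_ids = ["frame_002", "frame_007"]

weights = oversampling_weights(train_ids, overlap_ids, factor=4.0)
epoch = sample_epoch(train_ids, weights, num_samples=1000)
counts = Counter(epoch)
```

With these toy weights, each overlapping frame is expected to be drawn about four times as often as any other frame.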


Article Excerpt

From source RSS / original summary

arXiv:2606.12981v1

Abstract: We describe a Camera and LiDAR fusion detector developed for the TUMTraf V2X cooperative 3D object detection track of the DriveX 2026 challenge. The detector fuses three roadside cameras with a fused infrastructure-plus-vehicle point cloud in a shared bird's-eye-view space and predicts boxes through a CenterPoint-style head with a generalized IoU regression loss and an IoU quality re-ranking head.

Trained on the provided train and validation splits, the model reaches a 3D mAP of 0.85 on the public Codabench test split. While iterating on the system, we observed that 44 of the 50 test frames are also present in the released train (40) and validation (4) splits with their labels. We therefore conducted two additional studies to quantify how this overlap affects the final score: (1) a finetuning run that oversamples the 44 overlapping frames, reaching 0.89 mAP, and (2) a post-processing run that replaces predictions on those frames with the released ground truth, reaching 0.99 mAP (uploaded to our Codabench account for testing but not published on the leaderboard). All three configurations and their per-class results are reported.
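The overlap check and the ground-truth replacement study can be sketched as follows. This is a minimal illustration, assuming frames can be matched by an identifier; the source does not say how the authors matched frames. The synthetic IDs are built only to mirror the reported counts (50 test frames, 40 shared with train, 4 with validation), and all names are hypothetical.

```python
def find_overlap(test_ids, train_ids, val_ids):
    """Split test frames into those also found in train and those found in val."""
    train, val = set(train_ids), set(val_ids)
    in_train = [fid for fid in test_ids if fid in train]
    in_val = [fid for fid in test_ids if fid in val and fid not in train]
    return in_train, in_val


def replace_with_ground_truth(predictions, ground_truth):
    """Use released labels for a frame when available, else keep the prediction."""
    return {fid: ground_truth.get(fid, boxes) for fid, boxes in predictions.items()}


# Synthetic IDs chosen to reproduce the counts reported in the abstract.
test_ids = [f"t{i}" for i in range(50)]
train_ids = [f"t{i}" for i in range(40)] + [f"a{i}" for i in range(100)]
val_ids = [f"t{i}" for i in range(40, 44)] + [f"b{i}" for i in range(20)]

in_train, in_val = find_overlap(test_ids, train_ids, val_ids)
overlap_fraction = (len(in_train) + len(in_val)) / len(test_ids)

# Placeholder boxes: strings stand in for real box lists.
predictions = {fid: ["predicted"] for fid in test_ids}
ground_truth = {fid: ["label"] for fid in in_train + in_val}
submission = replace_with_ground_truth(predictions, ground_truth)
num_model_frames = sum(1 for boxes in submission.values() if boxes == ["predicted"])
```

Under these assumptions only 6 of the 50 test frames are still scored on model predictions, so such a submission says little about detector quality.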

