Improved Vision-to-Chart Buoy Association with Learned World-to-Image Projection

arXiv cs.CV·Borja Carrillo-Perez (Arquimea Research Center)

5/25/2026 · ~1 min read · en

Quick Take

This report modifies the DETR-based fusion transformer baseline for the MaCVi 2026 Vision-to-Chart data association challenge by adding a dedicated MLP, QueryMLP, that predicts each buoy's waterline contact point in the image. On the held-out test set, the approach achieves an Overall score of 0.7386, with F1 = 0.8055 and mIoU = 0.6718, placing second on the challenge leaderboard.

Key Points

Introduced QueryMLP to predict buoy waterline contact points from chart and IMU data.
Reduced geometric reasoning burden on the transformer decoder with explicit spatial priors.
Achieved an Overall score of 0.7386 on the MaCVi 2026 challenge leaderboard.
F1 score reached 0.8055 and mIoU was 0.6718 on the held-out test set.
Secured second place among all submissions in the competition.

Article Excerpt

From source RSS / original summary

arXiv:2605.22942v1. Abstract: This report presents a lightweight modification to the DETR-based fusion transformer baseline for the MaCVi 2026 Vision-to-Chart data association challenge. The challenge baseline decoder receives per-buoy queries encoding world-space distance and bearing, forcing the transformer to implicitly learn the complex geometric projection from world coordinates to image pixels.
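To make the geometry concrete, the sketch below projects a buoy's waterline point from distance and bearing to pixel coordinates with a simple pinhole camera model. It only illustrates the kind of world-to-image mapping the baseline decoder has to learn implicitly; it is not the challenge's or the authors' code. The function name `project_buoy`, the flat-sea assumption, the camera height, the intrinsics (`fx`, `fy`, `cx`, `cy`), and the pitch and roll sign conventions are all assumptions made for this example.

```python
import math


def project_buoy(distance_m, bearing_rad, pitch_rad=0.0, roll_rad=0.0,
                 cam_height_m=5.0, fx=1000.0, fy=1000.0, cx=960.0, cy=540.0):
    """Project a buoy's waterline point to pixel coordinates (u, v).

    Hypothetical conventions (not from the source):
    - bearing is measured from the camera heading, positive to the right;
    - the sea is flat and the camera sits cam_height_m above it;
    - positive pitch tilts the optical axis upward;
    - positive roll rotates the camera clockwise about the optical axis.
    Returns None when the point is behind the camera.
    """
    # Buoy position in a level, heading-aligned frame: x right, y down, z forward.
    x = distance_m * math.sin(bearing_rad)
    y = cam_height_m
    z = distance_m * math.cos(bearing_rad)

    # Express the point in a frame pitched about the x axis.
    y_p = y * math.cos(pitch_rad) + z * math.sin(pitch_rad)
    z_p = -y * math.sin(pitch_rad) + z * math.cos(pitch_rad)

    # Then in a frame rolled about the optical (z) axis.
    x_c = x * math.cos(roll_rad) + y_p * math.sin(roll_rad)
    y_c = -x * math.sin(roll_rad) + y_p * math.cos(roll_rad)

    if z_p <= 0.0:
        return None  # behind the image plane, not visible

    # Pinhole projection with the assumed intrinsics.
    u = cx + fx * x_c / z_p
    v = cy + fy * y_c / z_p
    return (u, v)


ahead = project_buoy(100.0, 0.0)
to_the_right = project_buoy(100.0, math.radians(10.0))
behind = project_buoy(100.0, math.pi)
```

With zero pitch and roll, a buoy dead ahead at 100 m lands on the image's vertical centerline, 50 pixels below the principal point under these assumed intrinsics. Even this simplified mapping combines trigonometry, two rotations, and a perspective divide, which gives a sense of what the transformer must approximate when it only receives distance and bearing.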

Instead, this work trains an additional dedicated MLP, QueryMLP, to explicitly predict the buoy's waterline contact point in the image from chart measurements and IMU orientation data. The predicted pixel coordinates are appended to the baseline decoder query vector, providing a direct spatial prior per buoy and reducing the geometric reasoning burden on the transformer decoder. On the challenge leaderboard, the presented approach achieves an Overall score of 0.7386, with F1 = 0.8055 and mIoU = 0.6718, on the held-out test set, placing second among all submissions.
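The query augmentation described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration with untrained random weights, not the authors' implementation: the layer sizes, the four input features (distance, bearing, pitch, roll), the sigmoid output that keeps coordinates in a normalized range, the 8-dimensional base query, and the helper names `init_linear`, `linear`, `query_mlp`, and `augment_query` are all assumptions. A real system would presumably use a deep learning framework and train the MLP on annotated contact points.

```python
import math
import random


def init_linear(n_in, n_out, rng):
    """Create random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Apply one fully connected layer to the feature vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def query_mlp(layers, features):
    """Map per-buoy chart and IMU features to a normalized (u, v) location."""
    h = features
    for layer in layers[:-1]:
        h = [max(0.0, a) for a in linear(layer, h)]  # ReLU on hidden layers
    out = linear(layers[-1], h)
    # Sigmoid keeps both coordinates in (0, 1); this choice is an assumption.
    return [1.0 / (1.0 + math.exp(-o)) for o in out]


def augment_query(base_query, layers, features):
    """Append the predicted image location to the baseline decoder query."""
    return base_query + query_mlp(layers, features)


rng = random.Random(0)
layers = [init_linear(4, 16, rng),
          init_linear(16, 16, rng),
          init_linear(16, 2, rng)]

# Hypothetical, already normalized inputs: distance, bearing, pitch, roll.
features = [0.35, -0.12, 0.02, -0.01]
# Hypothetical 8-dimensional baseline query for one buoy.
base_query = [0.35, -0.12, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

augmented = augment_query(base_query, layers, features)
```

The augmented query is two entries longer than the baseline query, so the decoder's input projection would need to accept the wider vector. Keeping the projection in a separate small network means the decoder receives an explicit image-space location per buoy rather than having to infer it from distance and bearing alone.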
