Intra-Modal Neighbors Never Lie: Rectifying Inter-Modal Noisy Correspondence via Graph-Based Intra-Modal Reasoning

arXiv cs.CV·Yang Liu, Wentao Feng, Shu-Dong Huang, Yalan Ye, Jiancheng Lv


6/4/2026 · ~1 min read · English

Quick Take

The Intra-modal Neighbor-aware Noise Rectification (IN2R) framework addresses noisy correspondence in cross-modal retrieval by synthesizing a reliable supervision target instead of selecting a single discrete substitute label. It exploits the geometric stability of intra-modal data: a Graph Refiner reasons over neighbors retrieved from a dynamic Cross-Model Memory to produce a soft prototype. The authors report that IN2R outperforms state-of-the-art methods on Flickr30K, MS-COCO, and CC152K.

Key Points

IN2R synthesizes soft prototypes reflecting local semantic consensus.
A Graph Refiner performs relational reasoning over neighbors retrieved from a dynamic Cross-Model Memory.
Experiments on Flickr30K, MS-COCO, and CC152K show that IN2R significantly outperforms state-of-the-art methods.
Code and pre-trained models are publicly available on GitHub.

Article Content

From source RSS / original summary

arXiv:2606.04061v1 Announce Type: new

Abstract: Large-scale web-harvested datasets have fueled the progress of cross-modal retrieval but inevitably suffer from noisy correspondence, which severely degrades model generalization. Existing methods primarily address this by filtering out noise or seeking a substitute label, yet they predominantly remain bound by a "Discrete Selection" paradigm. We argue that relying on a single discrete proxy induces Single-Point Fragility and Discretization Error.

To overcome these limitations, we propose a novel framework, Intra-modal Neighbor-aware Noise Rectification (IN2R), which shifts the paradigm from searching for a substitute to synthesizing a reliable supervision target. Leveraging the intrinsic geometric stability of intra-modal data, IN2R employs a Graph Refiner to perform relational reasoning over neighbors retrieved from a dynamic Cross-Model Memory.
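The abstract does not spell out how neighbors are stored, retrieved, or reasoned over. As a rough illustration only, the sketch below assumes a fixed-capacity FIFO memory of (intra-modal key, paired embedding) entries, top-k cosine retrieval in the intra-modal space, and one non-learned round of similarity-weighted message passing among the retrieved neighbors. `MemoryBank`, `retrieve`, and `propagate` are hypothetical names, and the FIFO update rule is an assumption; IN2R's Graph Refiner is a learned module whose architecture is not described here.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class MemoryBank:
    """Hypothetical fixed-capacity memory of (intra-modal key, paired embedding) entries.

    FIFO replacement via deque(maxlen=...) is an assumption standing in for the
    paper's unspecified 'dynamic' update rule.
    """

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)  # the oldest entry is dropped when full

    def push(self, key, value):
        self.entries.append((key, value))

    def retrieve(self, query, k):
        """Return the k entries whose intra-modal keys are most cosine-similar to query."""
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return ranked[:k]


def propagate(neighbors):
    """One non-learned message-passing round over the retrieved neighbors (hypothetical).

    Edge weights are pairwise key cosine similarities clipped at zero, including a
    self-loop since a key is maximally similar to itself, normalized per node. Each
    node's paired embedding becomes the weighted mean of all nodes' paired embeddings.
    """
    keys = [k for k, _ in neighbors]
    values = [v for _, v in neighbors]
    dim = len(values[0])
    refined = []
    for i, ki in enumerate(keys):
        weights = [max(cosine(ki, kj), 0.0) for kj in keys]
        total = sum(weights)
        if total == 0.0:  # all-zero key: no usable edges, keep the value unchanged
            refined.append(list(values[i]))
            continue
        refined.append([sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)])
    return refined


bank = MemoryBank(capacity=3)
bank.push([1.0, 0.0], [0.9, 0.1])
bank.push([0.0, 1.0], [0.1, 0.9])
bank.push([0.7, 0.7], [0.5, 0.5])
bank.push([1.0, 0.1], [0.8, 0.2])  # evicts the first entry

top = bank.retrieve([1.0, 0.0], k=2)  # keys [1.0, 0.1] then [0.7, 0.7]
refined = propagate(top)
```

Because eviction is FIFO here, the fourth push displaces the first entry, so only the three most recent entries are candidates for retrieval. Retrieval and graph construction both happen in the intra-modal key space, which is the part the authors describe as geometrically stable.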

Instead of propagating discrete labels, our method synthesizes a continuous, soft prototype that reflects the consensus of the local semantic neighborhood, effectively rectifying inter-modal misalignment. Extensive experiments on Flickr30K, MS-COCO, and CC152K demonstrate that IN2R significantly outperforms state-of-the-art methods. Our code and pre-trained models are publicly available at https://github.com/liuyyy111/IN2R.
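To make the contrast between a single discrete proxy and a continuous target concrete, here is a toy comparison, assuming neighbor similarity scores and their paired cross-modal embeddings are already available. `discrete_target` copies the partner of the single most similar neighbor (the "Discrete Selection" the authors criticize), while `soft_prototype` takes a temperature-scaled softmax-weighted average of all partners. The function names, the temperature of 0.1, and the toy vectors are illustrative assumptions; IN2R synthesizes its prototype with a learned Graph Refiner rather than this fixed formula.

```python
import math


def softmax(scores, temperature):
    """Numerically stable softmax of similarity scores at the given temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def discrete_target(scores, partners):
    """Discrete selection: copy the partner embedding of the single most similar neighbor."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return list(partners[best])


def soft_prototype(scores, partners, temperature=0.1):
    """Continuous target: softmax-weighted average of every neighbor's partner embedding."""
    weights = softmax(scores, temperature)
    dim = len(partners[0])
    return [sum(w * p[d] for w, p in zip(weights, partners)) for d in range(dim)]


# Toy neighborhood: the top-scoring neighbor carries a mismatched (noisy) partner.
scores = [0.95, 0.93, 0.92]
partners = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]

discrete = discrete_target(scores, partners)  # [0.0, 1.0], the noisy partner
proto = soft_prototype(scores, partners)      # leans toward the [1.0, 0.0] direction
```

In this toy data the most similar neighbor happens to carry a mismatched partner, so the discrete target flips entirely to the wrong direction, while the soft prototype still leans toward the direction shared by the other two neighbors. That is the intuition behind Single-Point Fragility.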

