Intra-Modal Neighbors Never Lie: Rectifying Inter-Modal Noisy Correspondence via Graph-Based Intra-Modal Reasoning
Quick Take
The Intra-modal Neighbor-aware Noise Rectification (IN2R) framework addresses noisy correspondence in cross-modal retrieval by synthesizing reliable supervision targets instead of relying on discrete labels. It leverages intra-modal data stability and a Graph Refiner to enhance model generalization, outperforming state-of-the-art methods on benchmarks like Flickr30K, MS-COCO, and CC152K.
Key Points
- IN2R synthesizes soft prototypes reflecting local semantic consensus.
- The framework utilizes a dynamic Cross-Model Memory for relational reasoning.
- Experiments on Flickr30K, MS-COCO, and CC152K show IN2R outperforming state-of-the-art methods.
- Publicly available code and pre-trained models can be accessed on GitHub.
Article Content
From source RSS / original summary: arXiv:2606.04061v1. Announce Type: new. Abstract: Large-scale web-harvested datasets have fueled the progress of cross-modal retrieval but inevitably suffer from noisy correspondence, which severely degrades model generalization. Existing methods primarily address this by filtering out noise or seeking a substitute label, yet they predominantly remain bound by a "Discrete Selection" paradigm. We argue that relying on a single discrete proxy induces Single-Point Fragility and Discretization Error.
To overcome these limitations, we propose a novel framework, Intra-modal Neighbor-aware Noise Rectification (IN2R), which shifts the paradigm from searching for a substitute to synthesizing a reliable supervision target. Leveraging the intrinsic geometric stability of intra-modal data, IN2R employs a Graph Refiner to perform relational reasoning over neighbors retrieved from a dynamic Cross-Model Memory.
Instead of propagating discrete labels, our method synthesizes a continuous, soft prototype that reflects the consensus of the local semantic neighborhood, effectively rectifying inter-modal misalignment. Extensive experiments on Flickr30K, MS-COCO, and CC152K demonstrate that IN2R significantly outperforms state-of-the-art methods. Our code and pre-trained models are publicly available at https://github.com/liuyyy111/IN2R.
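The idea of replacing a single substitute label with a neighborhood consensus can be illustrated with a small sketch. The abstract does not specify the Graph Refiner or the memory layout, so the code below is not the authors' method: it swaps in a plain softmax-weighted average over the top-k intra-modal neighbors. The helper names (`cosine`, `soft_prototype`), the parameters `k` and `temperature`, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_prototype(query_img, memory_imgs, memory_txts, k=2, temperature=0.1):
    """Hypothetical helper (not from the paper).

    Finds the k memory images closest to the query image (intra-modal
    neighbors), then returns a softmax-weighted average of the text
    embeddings paired with those neighbors as a continuous target.
    """
    # Intra-modal similarity: image query against image memory entries.
    sims = [cosine(query_img, m) for m in memory_imgs]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]

    # Numerically stable softmax over the top-k similarities.
    logits = [sims[i] / temperature for i in top]
    max_logit = max(logits)
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Convex combination of the neighbors' paired text embeddings.
    dim = len(memory_txts[0])
    proto = [
        sum(w * memory_txts[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
    return proto, top, weights


# Toy memory: three image embeddings with their paired text embeddings.
memory_imgs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
memory_txts = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]

# A query image whose own caption may be mismatched (noisy).
query = [1.0, 0.05]
proto, top, weights = soft_prototype(query, memory_imgs, memory_txts, k=2)
print(top, weights, proto)
```

Because the target is a convex combination of several neighbors rather than one selected caption, a single bad neighbor shifts the prototype only in proportion to its weight, which is the intuition behind avoiding a single discrete proxy.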