Vector Linking via Cross-Model Local Isometric Consistency
Quick Take
The study introduces Vector Linking: the task of recovering cross-model object correspondences between two embedding clouds using only the vectors themselves. It builds on the observation that independently trained contrastive encoders are locally geometrically consistent. The proposed iterative, reference-based geometric embedding hashing links vectors accurately across multiple benchmarks and embedding model pairs, even with a tiny seed set and out-of-domain anchors, with applications to vector database integration and cross-model clustering.
Key Points
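- Vector Linking recovers object correspondences between embeddings from different black-box encoders over partially overlapping datasets, using only the vectors.
- Local geometric consistency: short-range distances are approximately preserved across models up to a scale factor, while long-range distances are not.
- Iterative geometric embedding hashing represents each vector by its distances to sampled paired anchors and proposes candidate links by matching in hash space.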
- Experiments show robust linking under varying overlap and seed budgets.
- Code is available at https://github.com/DBgroup-Edinburgh/VecLinking.
Article Content
From the source RSS / original summary (arXiv:2605.31100v1). Abstract: We study Vector Linking: given two embedding clouds produced by different black-box encoders over partially overlapping datasets, recover cross-model object correspondences using only vectors. Empirically and theoretically, we show that independently trained contrastive encoders exhibit local geometric consistency: short-range distances are approximately preserved up to a scale factor, while long-range distances are not, due to model-specific distortion.
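One way to probe the local-consistency claim on your own data is to compare distance ratios for near and far pairs. The sketch below is an illustration, not the authors' code: the function name `distance_ratio_stats`, the choice of `k`, and the use of a median and relative spread are all assumptions. It also assumes the two embedding matrices are row-aligned (the same object sits in the same row of each) and that no two rows of `X` are identical.

```python
import numpy as np

def distance_ratio_stats(X, Y, k=5):
    """Compare Y-distance / X-distance ratios for near vs. far pairs.

    X, Y: arrays of shape (n, d1) and (n, d2) embedding the SAME n objects,
    row-aligned (an assumption of this sketch). Returns two tuples,
    (median_ratio, relative_spread), for each point's k nearest and
    k farthest neighbours measured in X.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Dense pairwise Euclidean distances in each embedding space.
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    order = np.argsort(dx, axis=1)
    near = order[:, 1:k + 1]  # column 0 is the point itself (distance 0)
    far = order[:, -k:]
    rows = np.arange(len(X))[:, None]

    def stats(idx):
        ratios = dy[rows, idx] / dx[rows, idx]
        med = np.median(ratios)
        return med, np.std(ratios) / med

    return stats(near), stats(far)
```

If the abstract's claim holds for a given model pair, one would expect the near-pair ratios to cluster tightly around a single scale factor (small relative spread), while the far-pair ratios scatter more widely.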
Building on this, we propose an iterative, reference-based geometric embedding hashing that recovers vector links from a tiny seed set of paired anchors. It represents each vector by distances to sampled paired anchors, proposes candidate links via hash-space matching, and aggregates evidence across views in a Beta-Bernoulli posterior to bootstrap high-confidence links as new anchors.
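The pipeline described above (anchor-distance signatures, candidate proposals per view, Beta-Bernoulli aggregation) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: all function names and parameters (`n_views`, `m`, `alpha`, `beta`, `threshold`) are hypothetical, exact mutual nearest-neighbour search in signature space stands in for the paper's hash-space matching, anchors are sampled globally rather than locally, and the unknown scale factor is absorbed by normalising with the mean inter-anchor distance.

```python
import numpy as np

def mean_pairwise(P):
    """Mean Euclidean distance between distinct rows of P."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    return d[np.triu_indices(len(P), k=1)].mean()

def anchor_signature(V, anchors):
    """Represent each row of V by scale-normalised distances to the anchors."""
    d = np.linalg.norm(V[:, None, :] - anchors[None, :, :], axis=-1)
    return d / mean_pairwise(anchors)

def count_mutual_matches(A, B, anchor_a, anchor_b, rng, n_views=20, m=4):
    """For each view (a random subset of m paired anchors), record the
    mutual nearest neighbours between the two signature spaces."""
    wins = np.zeros((len(A), len(B)))
    for _ in range(n_views):
        idx = rng.choice(len(anchor_a), size=m, replace=False)
        sa = anchor_signature(A, anchor_a[idx])
        sb = anchor_signature(B, anchor_b[idx])
        cost = np.linalg.norm(sa[:, None, :] - sb[None, :, :], axis=-1)
        best_b = cost.argmin(axis=1)  # best B candidate for each A vector
        best_a = cost.argmin(axis=0)  # best A candidate for each B vector
        for i, j in enumerate(best_b):
            if best_a[j] == i:        # keep only mutual agreements
                wins[i, j] += 1
    return wins

def posterior_mean(wins, n_views, alpha=1.0, beta=1.0):
    """Mean of the Beta(alpha + wins, beta + n_views - wins) posterior."""
    return (wins + alpha) / (n_views + alpha + beta)

def confident_links(post, threshold=0.9):
    """Pairs whose posterior mean clears the threshold; in an iterative
    scheme these would be promoted to anchors for the next round."""
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(post >= threshold))]
```

In an iterative version, the pairs returned by `confident_links` would be appended to the anchor sets and the procedure repeated; the sketch shows a single round only.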
Experiments across multiple benchmarks and embedding model pairs demonstrate accurate and robust linking under varying overlap, seed budgets, and out-of-domain anchors, with applications to vector database integration and cross-model clustering. Code is available at https://github.com/DBgroup-Edinburgh/VecLinking.