Hyperdimensional computing for structured querying on tabular data embeddings
Quick Answer
The study investigates HyperDimensional Computing (HDC), specifically the Holographic Reduced Representations (HRR) model, for tabular row embeddings. Closed-form expected similarity values make retrieval scores interpretable and allow principled retrieval thresholds.
Quick Take
Evaluated on two real-world datasets against EmbDI, a graph-based baseline, HDC with HRR matches or outperforms EmbDI in row retrieval across all tested configurations. It also handles non-equality predicates more robustly, achieves perfect attribute projection accuracy at sufficient dimensionality, and enables reliable zero-match detection.
Key Points
- HDC provides interpretable similarity scores for structured querying in tabular data.
- Matches or outperforms EmbDI in row retrieval across all tested configurations.
- Achieves perfect attribute projection accuracy at sufficient dimensionality.
- Handles non-equality predicates more robustly than EmbDI.
- Enables reliable identification of zero-match predicates through principled thresholds.
Article Content
From source RSS / original summary
arXiv:2606.13871v1 Announce Type: new
Abstract: Tabular data embeddings have become a cornerstone of data profiling and data integration pipelines, enabling tasks such as entity annotation and resolution; schema matching; column type detection; and table search, among others. Existing approaches embed rows, columns, or entire tables into a vector space and rely on nearest-neighbor search to retrieve candidate matches.
A fundamental limitation of current embedding methods is the lack of interpretable similarity scores: the concrete similarity value between a query and its nearest neighbour carries no intrinsic meaning, making it impossible to determine whether that neighbour is a true match or simply the least-dissimilar item in a corpus that contains no valid answer. This inability to set principled thresholds for retrieval undermines practical deployment, particularly for zero-match detection.
We investigate the use of HyperDimensional Computing (HDC), specifically the Holographic Reduced Representations (HRR) model, as a framework for tabular row embeddings when the retrieval task corresponds to answering structured select-project queries in vector space.
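A minimal sketch of how an HRR row embedding can support projection, assuming the standard HRR operators: circular convolution to bind an attribute vector to a value vector, element-wise addition to bundle the bindings into one row vector, and circular correlation to unbind. The abstract does not spell out the paper's exact encoding, so the helper names (`encode_row`, `project`), the toy table, and the dimensionality `d = 512` are all illustrative assumptions.

```python
import math
import random

def random_vector(d, rng):
    """Draw a d-dimensional vector with i.i.d. N(0, 1/d) entries (expected squared norm 1)."""
    scale = 1.0 / math.sqrt(d)
    return [rng.gauss(0.0, scale) for _ in range(d)]

def bind(a, b):
    """Circular convolution, the HRR binding operator (direct O(d^2) form for clarity)."""
    d = len(a)
    return [sum(a[k] * b[(j - k) % d] for k in range(d)) for j in range(d)]

def unbind(c, a):
    """Circular correlation: a noisy estimate of b when c contains bind(a, b)."""
    d = len(a)
    return [sum(a[k] * c[(j + k) % d] for k in range(d)) for j in range(d)]

def bundle(vectors):
    """Element-wise sum, the HRR superposition operator."""
    return [sum(components) for components in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def encode_row(row, attr_vecs, value_vecs):
    """Assumed encoding: bundle one attribute-value binding per column."""
    return bundle([bind(attr_vecs[a], value_vecs[v]) for a, v in row.items()])

def project(row_vec, attr, attr_vecs, value_vecs):
    """Projection: unbind the attribute, then pick the closest codebook value."""
    noisy = unbind(row_vec, attr_vecs[attr])
    return max(value_vecs, key=lambda v: cosine(noisy, value_vecs[v]))

# Toy table (hypothetical data, not from the paper).
rows = [
    {"name": "alice", "city": "paris", "role": "engineer"},
    {"name": "bob", "city": "berlin", "role": "analyst"},
]

rng = random.Random(0)
d = 512  # assumed dimensionality; the unbinding noise shrinks as d grows
attr_vecs = {a: random_vector(d, rng) for a in rows[0]}
value_vecs = {v: random_vector(d, rng) for row in rows for v in row.values()}
row_vecs = [encode_row(row, attr_vecs, value_vecs) for row in rows]

# Project every attribute of every row back out of its embedding.
recovered = [
    {a: project(vec, a, attr_vecs, value_vecs) for a in attr_vecs}
    for vec in row_vecs
]
print(recovered == rows)
```

The direct convolution loops keep the sketch dependency-free; an FFT-based implementation would be the usual choice at larger dimensionality.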
Exploiting the algebraic properties of HDC operations, we derive closed-form expected similarity values for both equality and non-equality retrieval predicates, which converge to interpretable values as dimensionality increases, and use these to identify suitable retrieval thresholds. We evaluate HDC against EmbDI, a graph-based baseline, on two real-world datasets across varying table sizes and predicate lengths.
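To illustrate what a closed-form expected similarity can look like, the sketch below uses a textbook property of HRR rather than the paper's own derivation, which the abstract does not give. Assume a row bundles k attribute-value bindings, a conjunctive equality query bundles m bindings, and j of the query's bindings also occur in the row. With i.i.d. N(0, 1/d) vector entries, the cosine similarity then tends to j / sqrt(m * k) as d grows. The function `expected_cosine`, the midpoint threshold rule, and the parameters `d`, `k`, `m` are assumptions made for this example.

```python
import math
import random

def random_vector(d, rng):
    """Draw a d-dimensional vector with i.i.d. N(0, 1/d) entries."""
    scale = 1.0 / math.sqrt(d)
    return [rng.gauss(0.0, scale) for _ in range(d)]

def bind(a, b):
    """Circular convolution, the HRR binding operator (direct O(d^2) form)."""
    d = len(a)
    return [sum(a[k] * b[(j - k) % d] for k in range(d)) for j in range(d)]

def bundle(vectors):
    """Element-wise sum, the HRR superposition operator."""
    return [sum(components) for components in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def expected_cosine(j, m, k):
    """Assumed large-d limit of cos(query, row) when j of the m query
    bindings occur in a row that bundles k bindings."""
    return j / math.sqrt(m * k)

rng = random.Random(1)
d, k, m = 1024, 4, 2  # dimensionality, row width, predicate length (all assumed)
attrs = [random_vector(d, rng) for _ in range(k)]
values = [random_vector(d, rng) for _ in range(k)]
absent = [random_vector(d, rng) for _ in range(m)]  # values not present in the row

pairs = [bind(a, v) for a, v in zip(attrs, values)]
row = bundle(pairs)
misses = [bind(attrs[i], absent[i]) for i in range(m)]

# Queries keyed by how many of their m equality terms the row satisfies.
queries = {
    2: bundle([pairs[0], pairs[1]]),
    1: bundle([pairs[0], misses[1]]),
    0: bundle([misses[0], misses[1]]),
}
observed = {j: cosine(q, row) for j, q in queries.items()}

# Assumed threshold rule: midway between a full match and the best partial match.
threshold = (expected_cosine(m, m, k) + expected_cosine(m - 1, m, k)) / 2
is_match = {j: score >= threshold for j, score in observed.items()}

for j in queries:
    print(j, round(observed[j], 3), round(expected_cosine(j, m, k), 3), is_match[j])
```

Under these assumptions, zero-match detection amounts to checking whether the best score in the corpus clears the threshold: if even the nearest row falls below it, the predicate is reported as having no matching row.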
Our results show that HDC matches or outperforms EmbDI for row retrieval across all configurations, handles non-equality predicates more robustly, and achieves perfect attribute projection accuracy at sufficient dimensionality -- while uniquely enabling reliable identification of zero-match predicates through its principled thresholds.