RelGT-AC: A Relational Graph Transformer for Autocomplete Tasks in Relational Databases
Quick Take
RelGT-AC extends the RelGT Relational Graph Transformer to autocomplete tasks in relational databases with a column masking strategy, a unified task head, and a TF-IDF text encoder. Across 7 tasks on three RelBench v2 datasets, it outperforms the GraphSAGE baseline on all 3 regression autocomplete tasks and gains up to 10 AUROC points on text-heavy eligibility tasks.
Key Points
- Masks the target column during subgraph encoding so the model cannot fall back on a trivial solution.
- Supports binary classification, multiclass classification, and regression in one model.
- TF-IDF encoder recovers strong lexical signals from free-text columns.
- Outperforms GraphSAGE on all regression autocomplete tasks.
- Gains up to 10 AUROC points on text-heavy eligibility tasks via the TF-IDF encoder.
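The column masking point can be pictured with a minimal sketch. The row layout, the `MASK` sentinel, and the `mask_target_column` helper are hypothetical and not taken from the paper; the sketch only shows the target cell being hidden before a row is encoded, so the model cannot simply read off the answer.

```python
from typing import Any, Dict, Tuple

MASK = "<MASK>"  # hypothetical sentinel standing in for a masked cell


def mask_target_column(row: Dict[str, Any], target: str) -> Tuple[Dict[str, Any], Any]:
    """Return (features with the target hidden, the held-out label)."""
    if target not in row:
        raise KeyError(f"target column {target!r} not in row")
    features = dict(row)      # copy so the source row is left untouched
    label = features[target]  # keep the true value as the training label
    features[target] = MASK   # the encoder never sees the true value
    return features, label


# Hypothetical row from a clinical-trial style table.
row = {"study_id": 17, "phase": "Phase 2", "enrollment": 120}
features, label = mask_target_column(row, target="enrollment")
print(features)
print(label)
```

Without the mask, the target value would sit among the input features and a model could score perfectly by copying it.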
Article Content
From source RSS / original summary: arXiv:2606.03040v1. Announce Type: new. Abstract: Relational databases underpin modern enterprise, scientific, and healthcare systems, yet predictive machine learning on such data remains challenging due to their multi-table, heterogeneous, and temporal structure. Relational Deep Learning (RDL) addresses this by representing databases as heterogeneous graphs and applying graph neural networks (GNNs) directly.
RelBench v2 recently introduced autocomplete tasks -- a practically motivated task type where the goal is to predict an existing column value from relational context, analogous to an intelligent form-filling assistant.
We propose RelGT-AC (Relational Graph Transformer for Autocomplete), extending the RelGT architecture with three targeted contributions: (1) a column masking strategy that prevents trivial solutions by masking the target column during subgraph encoding; (2) a unified task head supporting binary classification, multiclass classification, and regression autocomplete tasks within a single model; and (3) a TF-IDF text encoder that automatically detects and encodes free-text columns, recovering strong lexical signal that categorical encoders discard.
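As a rough illustration of contribution (2), a single head can choose its output width and loss from the task type. This is a sketch under assumptions: the `TaskType` enum, the `UnifiedHead` class, and the choice of losses (binary cross-entropy, cross-entropy, absolute error) are hypothetical, since the abstract does not specify them.

```python
import math
import random
from enum import Enum
from typing import List


class TaskType(Enum):
    BINARY = "binary"
    MULTICLASS = "multiclass"
    REGRESSION = "regression"


class UnifiedHead:
    """One linear head whose output width and loss depend on the task type."""

    def __init__(self, dim: int, task: TaskType, num_classes: int = 0, seed: int = 0):
        rng = random.Random(seed)
        self.task = task
        out = num_classes if task is TaskType.MULTICLASS else 1
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(out)]
        self.b = [0.0] * out

    def forward(self, h: List[float]) -> List[float]:
        # Raw outputs: logits for classification, a scalar for regression.
        return [sum(wi * hi for wi, hi in zip(row, h)) + b
                for row, b in zip(self.w, self.b)]

    def loss(self, h: List[float], y: float) -> float:
        z = self.forward(h)
        if self.task is TaskType.BINARY:
            # Numerically stable binary cross-entropy on a single logit.
            return max(z[0], 0.0) - z[0] * y + math.log1p(math.exp(-abs(z[0])))
        if self.task is TaskType.MULTICLASS:
            # Cross-entropy via log-sum-exp; y is the class index.
            m = max(z)
            lse = m + math.log(sum(math.exp(v - m) for v in z))
            return lse - z[int(y)]
        return abs(z[0] - y)  # absolute error for regression


# The same embedding routed through three task configurations.
h = [0.5, -1.0, 2.0]
for task, y, k in [(TaskType.BINARY, 1, 0),
                   (TaskType.MULTICLASS, 2, 4),
                   (TaskType.REGRESSION, 3.5, 0)]:
    head = UnifiedHead(dim=3, task=task, num_classes=k)
    print(task.value, len(head.forward(h)), round(head.loss(h, y), 4))
```

The shared interface means the encoder upstream does not need to know which kind of column is being completed; only the head's width and loss change.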
Across 7 tasks spanning 3 RelBench v2 datasets (rel-trial, rel-f1, rel-stack), RelGT-AC outperforms the GraphSAGE baseline on all 3 regression autocomplete tasks and achieves up to +10 AUROC points on text-heavy eligibility tasks via the TF-IDF encoder.
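To make the TF-IDF idea concrete, here is a minimal pure-Python sketch. The abstract does not say how free-text columns are detected, so the `is_free_text` heuristic and its thresholds are assumptions, as are the sample strings; `tfidf` is a plain smoothed TF-IDF rather than the paper's encoder.

```python
import math
from collections import Counter
from typing import Dict, List


def is_free_text(values: List[str], min_avg_tokens: float = 4.0) -> bool:
    """Hypothetical heuristic: long, mostly distinct strings count as free text."""
    if not values:
        return False
    avg_tokens = sum(len(v.split()) for v in values) / len(values)
    distinct_ratio = len(set(values)) / len(values)
    return avg_tokens >= min_avg_tokens and distinct_ratio > 0.5


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Sparse TF-IDF vectors: tf * (ln((1 + n) / (1 + df)) + 1)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            tok: (c / len(toks)) * (math.log((1 + n) / (1 + df[tok])) + 1.0)
            for tok, c in counts.items()
        })
    return vectors


# Hypothetical eligibility-criteria text versus a short categorical column.
criteria = [
    "adults with confirmed type 2 diabetes",
    "adults with no prior insulin therapy",
    "children with confirmed asthma diagnosis",
]
phase = ["Phase 1", "Phase 2", "Phase 2"]

print(is_free_text(criteria), is_free_text(phase))
vecs = tfidf(criteria)
print(vecs[0]["diabetes"] > vecs[0]["with"])
```

A categorical encoder would treat each criteria string as one opaque category, while the TF-IDF vectors keep word-level overlap: rare, informative terms such as "diabetes" get more weight than terms that appear in every row.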