Synthetic Contrastive Reasoning for Multi-Table Q&A
Quick Answer
The paper introduces a synthetic contrastive reasoning-trace dataset for multi-table Q&A and uses it to fine-tune open-weight models such as Qwen3-14B and Mistral-8B with Contrastive Preference Optimization (CPO).
Quick Take
The authors build a synthetic contrastive reasoning-trace dataset for MMQA, pairing validated positive traces with plausible negative traces, and use the resulting preference pairs to fine-tune Qwen3-14B, Mistral-8B, and Llama-3.1-8B with Contrastive Preference Optimization (CPO). CPO achieves absolute average improvements of 9.7%-16.3% over Q&A supervised fine-tuning, with gains of up to 21 percentage points on MMQA. Ablations indicate that generating positive and negative traces with heterogeneous LLMs strengthens the contrastive signal.
Key Points
- The synthetic dataset adds reasoning supervision that existing multi-table Q&A resources lack.
- CPO fine-tuning yields absolute average improvements of 9.7%-16.3% over Q&A supervised fine-tuning.
- Gains reach up to 21 percentage points on MMQA.
- Heterogeneous positive and negative trace generators strengthen the contrastive signal.
- Automated and human evaluations indicate the generated pairs are largely faithful, coherent, and meaningfully contrastive.
Article Excerpt
arXiv:2606.05382v1. Abstract: Multi-table question answering requires models to retrieve relevant evidence, link schemas, and perform compositional reasoning across relational tables. Existing multi-table Q&A resources typically provide questions and final answers but lack reasoning supervision that explains how answers are derived.
To address this gap, we construct a synthetic contrastive reasoning-trace dataset for MMQA by generating validated positive traces and plausible negative traces with heterogeneous LLMs. We then use the resulting preference pairs to fine-tune open-weight LLMs with Contrastive Preference Optimization (CPO). Across Qwen3-14B, Mistral-8B, and Llama-3.1-8B, CPO achieves absolute average improvements over Q&A supervised fine-tuning ranging from 9.7%-16.3%, with gains up to 21 percentage points on MMQA.
Ablations show that heterogeneous positive and negative trace generators strengthen the contrastive signal, and automated as well as human evaluations indicate that the generated pairs are largely faithful, coherent, and meaningfully contrastive.
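The abstract does not spell out the training objective, so the following is only a minimal sketch of how a preference pair and a per-pair CPO loss might look. It assumes the commonly cited CPO formulation (a sigmoid preference term on the scaled log-probability gap plus a negative log-likelihood term on the preferred output). The `PreferencePair` field names, the `cpo_loss` helper, and the `beta` value are illustrative assumptions, not details taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One contrastive example; field names are hypothetical."""
    question: str
    chosen_trace: str    # validated positive reasoning trace
    rejected_trace: str  # plausible but flawed negative trace


def cpo_loss(logp_chosen: float, logp_rejected: float, beta: float = 0.1) -> float:
    """Per-pair CPO-style loss from sequence log-probabilities under the policy.

    preference term: -log(sigmoid(beta * (logp_chosen - logp_rejected)))
    NLL term:        -logp_chosen
    beta=0.1 is an assumed default, not a value reported in the abstract.
    """
    margin = beta * (logp_chosen - logp_rejected)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        prefer = math.log1p(math.exp(-margin))
    else:
        prefer = -margin + math.log1p(math.exp(margin))
    return prefer - logp_chosen


# Illustrative log-probabilities only; not results from the paper.
good_ranking = cpo_loss(logp_chosen=-5.0, logp_rejected=-20.0)
bad_ranking = cpo_loss(logp_chosen=-20.0, logp_rejected=-5.0)
print(good_ranking < bad_ranking)
```

Unlike DPO, this formulation needs no reference model: the loss depends only on the policy's own log-probabilities for the two traces, which is one reason CPO is comparatively cheap to run on preference pairs like these.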