LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval
Quick Answer
This paper shows that a phase-aware LLM agent can tune human-object interaction retrieval, outperforming Optuna TPE by 33.3% and VDTuner by 34.2% on the HICO-DET benchmark under the SIEVE metric.
Quick Take
A phase-aware LLM agent tunes the coupled parameters of a human-object interaction retrieval system, outperforming Optuna TPE by 33.3% and VDTuner by 34.2% on HICO-DET. It also delivers a 15.3x throughput gain over UniIR and ranks first on all three datasets when run on Milvus without modification.
Key Points
- The LLM agent conditions proposals on full optimization history for better parameter navigation.
- Achieved a 15.3x throughput gain over UniIR in human-object interaction retrieval.
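- The agent's advantage grows with parameter coupling: +33.3% on HICO-DET (high coupling), while methods converge within 1% on GLDv2 (moderate coupling) and within 3.6% on SIFT1M (near-independent control).
- Cross-system validation on Milvus ranks the optimizer first on all three datasets without modification.
- This result indicates transferability across vector database management system (VDBMS) platforms.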
Article Content
Original abstract (arXiv:2606.05489v1): Retrieval systems underpin modern AI applications, spanning visual search, recommendation engines, and multi-modal question answering.
Modern multi-stage retrieval systems require the joint optimization of highly coupled parameters, yet traditional hyperparameter optimization (HPO) methods -- including Tree-structured Parzen Estimators (TPE) and Gaussian Process Bayesian Optimization -- rely on an independence assumption that fundamentally prevents them from navigating these coupled configuration spaces.
We address this limitation with a phase-aware large language model (LLM) agent that conditions each proposal on its full optimization history, navigating the coupled parameter space across phase-partitioned exploration, exploitation, and fine-tuning stages. Evaluated on the HICO-DET human-object interaction retrieval benchmark using Intel VDMS (Visual Data Management System), our agent outperforms Optuna TPE by +33.3% and VDTuner by +34.2% under SIEVE (Safeguarded Index Evaluation of Vector-search Efficiency, a quality-constrained throughput metric), delivering a 15.3x throughput gain over UniIR.
Validation across three benchmarks confirms that the agent's advantage grows with the degree of parameter coupling: +33.3% on HICO-DET (high coupling), while methods converge within 1% on GLDv2 (moderate coupling) and within 3.6% on SIFT1M (near-independent control).
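The abstract describes SIEVE only as a quality-constrained throughput metric and does not give its formula. As a rough illustration of that general idea, and not the paper's definition, the sketch below scores a configuration by its throughput when recall meets a floor and by zero otherwise. The function name `constrained_throughput`, the `recall_floor` value, and the sample numbers are all hypothetical.

```python
from typing import Dict, Tuple


def constrained_throughput(qps: float, recall: float, recall_floor: float = 0.9) -> float:
    """Return throughput (queries per second) if recall meets the floor, else 0.0.

    This is an assumed, simplified stand-in for a quality-constrained
    throughput metric; it is not the SIEVE formula from the paper.
    """
    if qps < 0 or not 0.0 <= recall <= 1.0:
        raise ValueError("qps must be non-negative and recall must lie in [0, 1]")
    return qps if recall >= recall_floor else 0.0


# Hypothetical (qps, recall) measurements for three made-up configurations.
candidates: Dict[str, Tuple[float, float]] = {
    "fast_but_lossy": (5000.0, 0.72),
    "balanced": (1800.0, 0.93),
    "exhaustive": (300.0, 0.99),
}

scores = {name: constrained_throughput(qps, rec) for name, (qps, rec) in candidates.items()}
best = max(scores, key=scores.get)
```

Under this kind of scoring, the fastest configuration is rejected because it misses the quality floor, so an optimizer is pushed toward the fastest configuration that still retrieves well.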
Cross-system validation on Milvus confirms the optimizer ranks first on all three datasets without modification, demonstrating transferability across vector database management system (VDBMS) platforms.
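To make the phase-partitioned, history-conditioned loop described in the abstract more concrete, here is a minimal sketch under stated assumptions. The 40/40/20 phase split, the parameter names `nlist` and `nprobe`, the toy objective, and the random stub proposer are all hypothetical; in the paper the proposals come from an LLM agent, which this sketch replaces with a simple heuristic so that it runs without any external service.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
History = List[Tuple[Config, float]]


def phase_for(trial: int, budget: int) -> str:
    """Map a trial index to a phase (the 40/40/20 split is an assumption)."""
    if trial * 10 < budget * 4:
        return "exploration"
    if trial * 10 < budget * 8:
        return "exploitation"
    return "fine_tuning"


def optimize(propose: Callable[[str, History], Config],
             evaluate: Callable[[Config], float],
             budget: int) -> Tuple[Config, float]:
    """Run the loop; the proposer sees the phase and every past (config, score)."""
    history: History = []
    for trial in range(budget):
        config = propose(phase_for(trial, budget), history)
        history.append((config, evaluate(config)))
    return max(history, key=lambda item: item[1])


def make_stub_proposer(seed: int = 0) -> Callable[[str, History], Config]:
    """Stand-in for the LLM: random search first, then shrinking local moves."""
    rng = random.Random(seed)
    radius = {"exploitation": 8, "fine_tuning": 2}

    def propose(phase: str, history: History) -> Config:
        if phase == "exploration" or not history:
            return {"nlist": rng.randint(1, 64), "nprobe": rng.randint(1, 64)}
        best, _ = max(history, key=lambda item: item[1])
        r = radius[phase]
        # Perturb both parameters of the best-so-far config, clamped to [1, 64].
        return {k: min(64, max(1, v + rng.randint(-r, r))) for k, v in best.items()}

    return propose


def toy_score(config: Config) -> float:
    """Coupled toy objective: the best nprobe depends on the chosen nlist."""
    return -abs(config["nprobe"] - config["nlist"] // 4) - abs(config["nlist"] - 40) / 10


best_config, best_score = optimize(make_stub_proposer(), toy_score, budget=30)
```

The point of the design is that `propose` receives the whole history rather than a per-parameter summary, so a proposer that reasons over it can change coupled parameters together instead of treating each one independently.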