How Can AI Find My Model? A Model-Finding Experimental Study Considering Data Formats, Embeddings, and Retrieval Strategies

arXiv cs.AI · Jhon G. Botello, Jose J. Padilla, Erika Frydenlund, Krzysztof Rechowicz, Eric Weisel

7/1/2026

Quick Take

This study explores how data representation, transformer-based embeddings, and retrieval strategies affect the discovery of simulation models from natural language queries. The results indicate that open-source embedding models can achieve high performance and that reranking becomes more important as query complexity increases. Together, these findings provide a baseline for AI-driven model discovery.
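To make the retrieval step concrete, here is a minimal Python sketch of embedding-based top-k retrieval by cosine similarity. It is an illustration under stated assumptions, not the authors' pipeline: the vectors are hand-written toy values standing in for the output of a transformer embedding model, and the model ids (`epidemic_sir`, `traffic_flow`, `predator_prey`) are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query_vec: list[float], corpus: dict[str, list[float]], k: int = 5) -> list[str]:
    """Return the ids of the k corpus entries most similar to the query vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


# Toy 3-dimensional "embeddings" (hypothetical); real ones would come from
# whichever transformer embedding model is being evaluated.
corpus = {
    "epidemic_sir": [0.9, 0.1, 0.0],
    "traffic_flow": [0.0, 0.8, 0.6],
    "predator_prey": [0.7, 0.0, 0.7],
}
query = [1.0, 0.0, 0.2]
print(retrieve(query, corpus, k=2))  # ['epidemic_sir', 'predator_prey']
```

Swapping in a different embedding model changes only the vectors, not the ranking logic, which is why embedding models can be compared on equal footing in a setup like this.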

Key Points

Data representation affects model-discovery performance.
Open-source embedding models can achieve high retrieval performance.
Reranking methods are important, especially as query complexity increases.
The study evaluates retrieval with standard information retrieval metrics, including recall@5 and nDCG@5.
The findings provide a baseline for AI-driven model discovery and inform work toward AI-driven composability and interoperability.
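The two metrics named above can be computed as in the following hedged Python sketch, which uses common textbook definitions. recall@k is the share of relevant models that appear in the top k results. nDCG@k divides the discounted cumulative gain of the returned ranking (gain divided by log2(rank + 1)) by that of an ideal ranking. The summary does not state the paper's exact variant (for example, graded versus binary relevance, or an exponential gain), so these definitions are assumptions, and the model ids are hypothetical.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: list[str], gains: dict[str, float], k: int = 5) -> float:
    """nDCG@k with linear gains; gains maps item id -> relevance (absent = 0)."""
    def dcg(values: list[float]) -> float:
        # Rank r (1-based) is discounted by log2(r + 1); enumerate is 0-based.
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(doc_id, 0.0) for doc_id in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical query for which models m1 and m2 are the relevant ones.
ranked = ["m3", "m1", "m7", "m2", "m9"]
gains = {"m1": 1.0, "m2": 1.0}
print(recall_at_k(ranked, set(gains), k=5))     # 1.0
print(round(ndcg_at_k(ranked, gains, k=5), 4))  # 0.6509
```

Both relevant models are retrieved, so recall@5 is perfect, but nDCG@5 is penalized because they sit at ranks 2 and 4 rather than 1 and 2. Averaging per-query scores over a query set yields summary numbers of the kind such studies report.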

Paper Resources

Read Paper (arxiv.org) · View PDF (arxiv.org)

Article Excerpt

From source RSS / original summary

arXiv:2606.30846v1 · Announce Type: new

Abstract: Discovering simulation models for reuse remains a fundamental challenge in Modeling and Simulation (M&S). When many models coexist, identifying those that align with a given modeling intent remains difficult. Recent advances in Artificial Intelligence (AI), particularly retrieval-based approaches, offer a promising pathway to operate at this semantic layer.

In this paper, we present an experimental study investigating the impact of data representation, transformer-based embedding models, and retrieval strategies on the discovery of simulation models using natural language queries. We evaluated performance across multiple query types using standard information retrieval metrics, including recall@5 and nDCG@5.
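Because the study varies how each simulation model is represented before embedding, it may help to see what alternative representations of the same record could look like. The Python sketch below is hypothetical throughout: the `SimModelRecord` fields and the three format names are illustrative choices, not the representations evaluated in the paper, and the example record is invented.

```python
from dataclasses import dataclass, field


@dataclass
class SimModelRecord:
    """Hypothetical metadata record for one simulation model in a repository."""
    name: str
    description: str
    keywords: list[str] = field(default_factory=list)
    platform: str = ""


def represent(record: SimModelRecord, fmt: str) -> str:
    """Serialize a record into the text an embedding model would receive.

    The format names are illustrative assumptions, not those used in the study.
    """
    if fmt == "name_only":
        return record.name
    if fmt == "name_description":
        return f"{record.name}. {record.description}"
    if fmt == "key_value":
        # Labelled fields keep the record's structure visible in plain text.
        return "\n".join([
            f"name: {record.name}",
            f"description: {record.description}",
            f"keywords: {', '.join(record.keywords)}",
            f"platform: {record.platform}",
        ])
    raise ValueError(f"unknown representation: {fmt}")


# Hypothetical example record.
record = SimModelRecord(
    name="Predator-Prey Model",
    description="Agent-based model of wolves and sheep competing for grass.",
    keywords=["ecology", "predator-prey", "agent-based"],
    platform="NetLogo",
)

for fmt in ("name_only", "name_description", "key_value"):
    print(f"--- {fmt} ---")
    print(represent(record, fmt))
```

Each string would be embedded separately; comparing retrieval metrics across the formats is one way to quantify how much the representation matters.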

Results show that data representation matters, open-source embedding models can achieve high performance, and reranking methods are important, especially as query complexity increases. This work provides a baseline for AI-driven model discovery and discusses its role in advancing toward AI-driven composability and interoperability.
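Reranking usually refers to a two-stage pipeline. A fast first stage (such as embedding similarity) proposes candidates, and a more expensive scorer then reorders them by examining the query and each candidate together. The Python sketch below covers only the reordering stage and rests on stated assumptions: `overlap_score` is a toy term-overlap function standing in for a real reranker (for instance, a cross-encoder), and the documents and ids are hypothetical.

```python
from typing import Callable


def rerank(query: str, candidates: list[str], docs: dict[str, str],
           score_pair: Callable[[str, str], float], k: int = 5) -> list[str]:
    """Reorder first-stage candidates by a pairwise (query, document) score."""
    scored = [(doc_id, score_pair(query, docs[doc_id])) for doc_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


def overlap_score(query: str, text: str) -> float:
    """Toy stand-in for a learned reranker: share of query terms found in text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


# Hypothetical model descriptions and a first-stage candidate order.
docs = {
    "a": "SIR epidemic model with vaccination",
    "b": "traffic flow cellular automaton",
    "c": "agent-based epidemic model with vaccination and contact networks",
}
query = "agent-based epidemic vaccination contact networks"
first_stage = ["a", "b", "c"]
print(rerank(query, first_stage, docs, overlap_score, k=2))  # ['c', 'a']
```

Only the first-stage candidates reach the reranker, which keeps the expensive pairwise scoring off the full corpus. A longer, more specific query like this one is the kind of case in which pairwise scoring can change the order the most.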
