OralAgent: Integrating Reasoning, Tools, and Knowledge for Interactive Dental Image Analysis
Quick Take
OralAgent is presented as the first dental-specialized AI agent, unifying multimodal reasoning, tool-based decision-making, and knowledge-grounded retrieval. It integrates 22 visual analysis tools and 368 classical dental textbooks, and the authors report state-of-the-art results on the MMOral-Uni, MMOral-OPG, and OralQA-ZH benchmarks.
Key Points
- OralAgent integrates 22 visual analysis tools for comprehensive dental image analysis.
- It leverages 368 classical dental textbooks for knowledge-grounded retrieval.
- The authors report state-of-the-art performance on the MMOral-Uni, MMOral-OPG, and OralQA-ZH benchmarks.
- The paper introduces OralCorpus, a bilingual text resource with 134.8M tokens curated for dental RAG.
- The paper also constructs OralQA-ZH, a Chinese multiple-choice benchmark with 798 items across eleven oral subspecialties.
Article Content
From source RSS / original summary. arXiv:2605.27378v1. Announce type: new. Abstract: Dental image analysis plays a pivotal role in supporting accurate diagnosis and treatment planning in oral healthcare. Although recent advances have produced dental AI models for specific tasks and individual imaging modalities, their isolated designs limit practical use in real-world clinical workflows.
In this paper, we present OralAgent, the first dental-specialized AI agent that unifies multimodal reasoning, tool-based decision-making, and knowledge-grounded retrieval within an end-to-end automated framework. It integrates 22 visual analysis tools and 368 widely-used classical dental textbooks, enabling autonomous reasoning, planning, tool use, knowledge retrieval, and multi-step workflow execution. Furthermore, we introduce OralCorpus, a large-scale, high-quality bilingual textual resource containing 134.8M tokens curated for dental retrieval-augmented generation (RAG).
To evaluate models' multidisciplinary dental knowledge, we construct OralQA-ZH, a Chinese multiple-choice question benchmark consisting of 798 items across eleven oral subspecialties. Extensive experiments demonstrate that OralAgent achieves state-of-the-art performance on the MMOral-Uni, MMOral-OPG, and OralQA-ZH benchmarks, highlighting its effectiveness, interpretability, and adaptability in real-world clinical settings.
The code and models are publicly available at https://github.com/isjinghao/OralAgent.
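To make the abstract's combination of tool use and knowledge retrieval more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the class `ToyDentalAgent`, the keyword-based tool selection, the word-overlap retrieval, the tool name `caries`, and all strings are hypothetical placeholders chosen for illustration. A real agent would presumably use a language model for planning and a proper retriever over OralCorpus.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyDentalAgent:
    """Hypothetical stand-in for an agent that routes to tools and retrieves text."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    corpus: List[str] = field(default_factory=list)

    def register_tool(self, name: str, fn: Callable[[str], str]) -> None:
        # Tools are keyed by a lowercase name and take an image identifier.
        self.tools[name] = fn

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        # Assumption: rank passages by naive word overlap with the query.
        q_words = set(query.lower().split())
        ranked = sorted(
            self.corpus,
            key=lambda p: len(q_words & set(p.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def run(self, image_id: str, question: str) -> List[str]:
        trace: List[str] = []
        # "Planning" here is just: call every tool whose name occurs in the question.
        for name, fn in self.tools.items():
            if name in question.lower():
                trace.append(f"tool:{name} -> {fn(image_id)}")
        # Then attach the best-matching passage as supporting knowledge.
        for passage in self.retrieve(question):
            trace.append(f"retrieved: {passage}")
        return trace


agent = ToyDentalAgent()
agent.register_tool("caries", lambda img: f"stub output for {img}")
agent.corpus = [
    "placeholder passage about caries",
    "placeholder passage about periodontitis",
]
trace = agent.run("image_001", "Is there caries in this image?")
for step in trace:
    print(step)
```

The returned trace lists each tool call and retrieved passage in order, which is one simple way to expose the intermediate steps of a multi-step workflow for inspection.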