FactoryLLM: A Safe and Open-Source AI Playground for Evaluating LLMs in Smart Factories
Quick Answer
FactoryLLM is an open-source AI playground for evaluating retrieval-augmented generation (RAG) models in smart factories. In its case study, all three evaluated LLMs scored above 0.88 on groundedness across 30 maintenance queries derived from approximately 600 pages of documentation.
Quick Take
FactoryLLM is an open-source playground for evaluating LLM-based RAG models on smart-factory documentation. In the case study, each of three LLMs scored above 0.88 on groundedness over 30 maintenance queries derived from approximately 600 pages of cross-machine documentation. Because it can run local or open-source LLMs, sensitive industrial data does not have to be shared.
Key Points
- FactoryLLM evaluates LLMs using RAGAS and NVIDIA's LLM-as-a-Judge metrics.
- Users can configure the LLM and assess how it performs when reasoning over documents from multiple machines.
- The case study involves an Autonomous Intelligent Vehicle and its Mobile Planner software.
- All three evaluated models achieved groundedness scores above 0.88.
- Full code and documentation are publicly available for community use.
Article Content
From source RSS / original summary
arXiv:2606.14119v1 Announce Type: new
Abstract: Fault diagnostics and recovery in smart factories is challenging because critical information is dispersed across manuals of multiple machines which are interconnected through the manufacturing process. Large Language Models (LLMs) can provide a promising approach.
In this paper, we propose FactoryLLM, a safe and open-source AI playground designed for evaluating different LLM-based retrieval-augmented generation (RAG) models by analysing documents from multiple machines across the manufacturing process. FactoryLLM enables the user to configure the LLM and assess its performance when reasoning over multiple documents, through a dual evaluation setup using both RAGAS and NVIDIA's LLM-as-a-Judge metrics.
FactoryLLM is safe because it allows users to run local or open-source LLMs without sharing sensitive industrial data, providing a controlled environment for experimentation. We demonstrate the efficacy of FactoryLLM through a case study which involves an Autonomous Intelligent Vehicle and its Mobile Planner software, evaluating three LLMs across 30 maintenance queries derived from approximately 600 pages of cross-machine documentation.
The results suggest that FactoryLLM is effective in cross-machine document reasoning: every model achieved a groundedness score above 0.88. The full code and documentation, which let the community test FactoryLLM on their own manufacturing-specific scenarios, are publicly available.
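To make the evaluation setup in the abstract more concrete, here is a minimal sketch in Python of scoring several models on the same queries with two independent scorers, then checking which models clear a threshold. It is an illustration under stated assumptions, not FactoryLLM's code. The `Sample` layout, the function names, and both scorers are hypothetical, and the scorers are crude token-overlap proxies that stand in for, and do not reproduce, the RAGAS and NVIDIA LLM-as-a-Judge metrics.

```python
import re
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One answered maintenance query (hypothetical record layout)."""
    query: str
    contexts: List[str]  # retrieved passages, possibly from several manuals
    answer: str          # text produced by the model under test


def _tokens(text: str) -> List[str]:
    # Lowercased alphanumeric tokens; deliberately simplistic.
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_groundedness(sample: Sample) -> float:
    """Share of answer tokens that also occur in the retrieved contexts.

    A lexical proxy only, NOT the RAGAS or NVIDIA metric.
    """
    answer_tokens = _tokens(sample.answer)
    if not answer_tokens:
        return 0.0
    context_vocab = set(_tokens(" ".join(sample.contexts)))
    return sum(t in context_vocab for t in answer_tokens) / len(answer_tokens)


def toy_context_relevance(sample: Sample) -> float:
    """Share of query tokens that occur in the retrieved contexts (toy proxy)."""
    query_tokens = _tokens(sample.query)
    if not query_tokens:
        return 0.0
    context_vocab = set(_tokens(" ".join(sample.contexts)))
    return sum(t in context_vocab for t in query_tokens) / len(query_tokens)


Scorer = Callable[[Sample], float]


def evaluate(runs: Dict[str, List[Sample]],
             scorers: Dict[str, Scorer]) -> Dict[str, Dict[str, float]]:
    """Average every scorer over each model's samples (lists must be non-empty)."""
    return {
        model: {name: mean(score(s) for s in samples)
                for name, score in scorers.items()}
        for model, samples in runs.items()
    }


def models_above(report: Dict[str, Dict[str, float]],
                 metric: str, threshold: float) -> List[str]:
    """Names of models whose mean score on `metric` exceeds `threshold`."""
    return sorted(m for m, scores in report.items() if scores[metric] > threshold)
```

In use, `runs` would map each model name to its list of answered queries, and `models_above(report, "groundedness", 0.88)` would list the models that clear the threshold quoted in the abstract. Keeping the scorers in a dictionary means a second, independent metric family can be added without touching the evaluation loop, which is the point of a dual evaluation setup.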