LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

arXiv cs.AI·Tom Lucas, Alessio Buscemi, Alfredo Capozucca, German Castignani, Barbara Delacroix

6/1/2026

·~2 min·6/1/2026·en·3

Quick Answer

LLM-FACETS is an open-source framework designed to evaluate Large Language Models (LLMs) for transparency and accountability, accessible via a browser interface.

Quick Take

LLM-FACETS is an open-source framework designed to evaluate Large Language Models (LLMs) for transparency and accountability, accessible via a browser interface. It supports three user profiles—technical experts, domain experts, and compliance officers—while ensuring data privacy through self-hosted metrics and explicit API interactions. The framework enhances reproducibility and accountability in AI by allowing integration of new metrics without altering the evaluation pipeline.

Key Points

Framework designed for non-technical practitioners to audit LLM outputs effectively.
Utilizes deterministic metrics like BLEU and ROUGE without external data transmission.
Incorporates token-level visualizations for epistemic uncertainty and multi-judge consensus.
Open-source implementation allows integration of new metrics seamlessly.
Cross-validation of 18 metrics ensures reproducibility against reference libraries.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Content

From source RSS / original summary

arXiv:2605. 31167v1 Announce Type: new Abstract: Assessing whether Large Language Models outputs are factually grounded, epistemically calibrated, and methodologically reproducible is a prerequisite for responsible AI deployment.

Yet auditing LLMs remains inaccessible to non-technical practitioners: existing tools require programming expertise and non-trivial environment setup, and cloud-hosted platforms transmit evaluation data to external services, creating barriers for domain experts and compliance officers legally responsible for AI oversight.

We introduce LLM-FACETS (LLM FActuality Cross-EvaluaTion System): an open-source framework with a browser-accessible interface and a plugin architecture, structured around three practitioner profiles (technical experts, domain experts, compliance officers) that mirror the stakeholder categories identified in the EU AI Act and the NIST AI Risk Management Framework.

The architecture makes data flows explicit: deterministic metrics (BLEU, ROUGE, BERTScore) run entirely within the self-hosted server with no outbound transmission; LLM-judge metrics contact external APIs explicitly, with users retaining full credential control.

The framework operationalizes transparency through three mechanisms: token-level log-probability visualization for epistemic uncertainty, multi-judge consensus to mitigate judge bias, and Triad metrics (Faithfulness, Answer Relevance, Context Relevance) to detect and localize hallucinations. A plugin architecture allows any new metric or dataset to be integrated without modifying the evaluation pipeline.

The open-source implementation enables cross-checking across multiple metrics targeting the same property, ensuring reproducibility and decoupling AI accountability from the teams building the systems assessed. We verify the framework through cross-validation of 18 metric implementations against canonical reference libraries.

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.AI

See more →

arXiv cs.AI·David Krongauz, Arad Zulti, Eran Segal, Teddy Lazebnik

6h ago

FeaturedOriginal

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Large Language Model Powered Agentic System

AI Summary

The MEDA system utilizes large language models and symbolic regression to autonomously discover ordinary differential equations for biological systems, achieving strong structural recovery and biologically plausible models. It outperforms existing methods by integrating domain knowledge and mechanistic constraints, demonstrating effective retrieval and extrapolation capabilities.

#LLM #Agent #Inference #AI Startup