Position: Uncertainty Quantification in LLMs is Just Unsupervised Clustering

arXiv cs.CL·Tiejin Chen, Longchao Da, Xiaoou Liu, Hua Wei


Quick Take

The authors argue that mainstream UQ methods for LLMs are effectively unsupervised clustering algorithms: they measure the internal consistency of a model's generations, not their factual accuracy.

Key Points

UQ methods quantify internal consistency, not external correctness.
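Models that are stably confident in incorrect answers produce 'confident hallucinations', which current methods fail to detect.
The authors call for better evaluation metrics and settings, mechanism changes for native uncertainty, and verification anchored in objective truth.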


[Submitted on 19 May 2026]


Abstract: Uncertainty Quantification (UQ) is widely regarded as the primary safeguard for deploying Large Language Models (LLMs) in high-stakes domains. However, we argue that the field suffers from a category error: mainstream UQ methods for LLMs are just unsupervised clustering algorithms. We demonstrate that most current approaches inherently quantify the internal consistency of the model's generations rather than their external correctness. Consequently, current methods are fundamentally blind to factual reality and fail to detect "confident hallucinations," where models exhibit high confidence in stable but incorrect answers. Current UQ methods may therefore create a deceptive sense of safety when models are deployed with uncertainty estimates. Specifically, we identify three critical pathologies resulting from this dependence on internal state: a hyperparameter sensitivity crisis that renders deployment unsafe, an internal evaluation cycle that conflates stability with truth, and a fundamental lack of ground truth that forces reliance on unstable proxy metrics to evaluate uncertainty. To resolve this impasse, we advocate for a paradigm shift in UQ and outline a roadmap for the research community to adopt better evaluation metrics and settings, implement mechanism changes for native uncertainty, and anchor verification in objective truth, ensuring that model confidence serves as a reliable proxy for reality.
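
To make the consistency-versus-correctness point concrete, here is a minimal Python sketch of a sampling-based uncertainty score. It is an illustration only, not the authors' method: the function names, the example question, and the sampled answers are all hypothetical, and simple string normalization stands in for the semantic-equivalence clustering that real methods would use. The score is an entropy over clusters of sampled answers, so it never consults a ground-truth answer.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Toy stand-in for semantic clustering: lowercase and drop punctuation."""
    kept = (ch for ch in answer.lower() if ch.isalnum() or ch.isspace())
    return "".join(kept).strip()


def cluster_entropy(samples: list[str]) -> float:
    """Entropy (in nats) over clusters of sampled answers.

    Only agreement among the samples matters; no reference answer is used.
    """
    counts = Counter(normalize(s) for s in samples)
    total = sum(counts.values())
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Hypothetical samples for "What is the capital of Australia?" (correct: Canberra)
stable_but_wrong = ["Sydney", "sydney.", "Sydney", "Sydney", "Sydney"]
unstable_partly_right = ["Canberra", "Sydney", "Melbourne", "Canberra", "Perth"]

print("stable but wrong:     ", cluster_entropy(stable_but_wrong))
print("unstable, partly right:", cluster_entropy(unstable_partly_right))
```

Under these assumptions, the consistently wrong set collapses into a single cluster and receives zero entropy, while the set that contains the correct answer scores as more uncertain. This is the "confident hallucination" failure the abstract describes: a stable answer looks safe to the metric regardless of whether it is true.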

Comments:	Accepted by ICML 2026 Position Paper Track
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
MSC classes:	68T50, 68T37, 68Q32
ACM classes:	I.2.7; I.2.6; I.2.4
Cite as:	arXiv:2605.19220 [cs.CL]
	(or arXiv:2605.19220v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.19220 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Hua Wei
[v1] Tue, 19 May 2026 00:47:02 UTC (607 KB)

Originally published at arxiv.org

