Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation

arXiv cs.CL·Matteo Pelossi, Rita Sevastjanova, Thilo Spinner, Mennatallah El-Assady

6/19/2026

Quick Answer

The paper introduces TreeTracer, a visual analytics tool that reveals hidden biases in Large Language Models (LLMs) such as GPT-2 XL through stochastic path aggregation.

Quick Take

By comparing semantic contexts in a hierarchical structure, TreeTracer exposes representational harms and helps analysts detect systemic biases.

Key Points

TreeTracer aggregates hundreds of stochastic generations, so bias is evaluated across many outputs rather than a single sample.
A custom Sankey diagram visualizes the aggregated results for clear comparison.
Case studies show hidden harms such as pronoun suppression in GPT-2 XL.
User studies report reduced cognitive load for analysts using the tool.
Contrastive inference helps analysts avoid misjudging whether bias is present.
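
To make the aggregation idea concrete, here is a minimal sketch of how many sampled generations could be merged into a prefix tree whose edge counts serve as Sankey flow widths. It is an illustration under stated assumptions, not TreeTracer's implementation: the toy sampler, its vocabulary and weights, and the names sample_continuation and aggregate_paths are all hypothetical stand-ins for a real model and pipeline.

```python
import random
from collections import Counter

def sample_continuation(rng, max_len=3):
    """Toy stand-in for one stochastic LLM generation (hypothetical vocabulary and weights)."""
    vocab = {
        0: (["he", "she", "they"], [0.6, 0.3, 0.1]),
        1: (["works", "stays"], [0.7, 0.3]),
        2: (["late", "home"], [0.5, 0.5]),
    }
    return [rng.choices(*vocab[i])[0] for i in range(max_len)]

def aggregate_paths(paths):
    """Merge token sequences into prefix-tree edges; each count is a Sankey flow width."""
    links = Counter()
    for tokens in paths:
        prefix = ("<prompt>",)
        for tok in tokens:
            nxt = prefix + (tok,)
            links[(prefix, nxt)] += 1  # one more generation passed through this edge
            prefix = nxt
    return links

rng = random.Random(0)  # fixed seed so the sketch is repeatable
paths = [sample_continuation(rng) for _ in range(300)]
links = aggregate_paths(paths)

# Print the first-level branches, widest flow first.
root = ("<prompt>",)
first_level = [(dst[-1], n) for (src, dst), n in links.items() if src == root]
for token, n in sorted(first_level, key=lambda item: -item[1]):
    print(token, n)
```

Keying each edge by its full prefix keeps branches that share a token but differ in context separate, which is what lets a rare continuation stay visible as its own thin flow instead of being merged into a common one.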

Source Excerpt

Large Language Models (LLMs) exhibit representational and syntactic biases that are difficult to evaluate due to the stochastic nature of text generation. Standard auditing methods rely on a single output inspection or static automated metrics. These approaches obscure the underlying probability distributions and fail to capture biases hidden in lower-probability generation branches. This paper introduces TreeTracer, a visual analytics tool designed to evaluate LLM bias through aggregated compar
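
A short calculation shows why single-output inspection tends to miss lower-probability branches. If a branch appears with probability p in any one independent sample, the chance of seeing it at least once in n samples is 1 - (1 - p)^n. The 1% branch probability and the sample counts below are illustrative assumptions, not figures from the paper, and p_seen_at_least_once is a hypothetical helper name.

```python
def p_seen_at_least_once(p, n):
    """Chance that a branch with per-sample probability p appears in n independent samples."""
    return 1.0 - (1.0 - p) ** n

# Compare a single inspection with progressively larger sample sets.
for n in (1, 10, 300):
    print(n, round(p_seen_at_least_once(0.01, n), 3))
```

Under these assumptions a single generation surfaces the branch 1% of the time, while a few hundred samples surface it roughly 95% of the time, which is the case for aggregating many generations.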
