Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation
Quick Answer
The paper introduces TreeTracer, a visual analytics tool that reveals hidden biases in Large Language Models (LLMs) like GPT-2 XL through stochastic path aggregation.
Quick Take
By juxtaposing two semantic contexts in a hierarchical structure, TreeTracer exposes representational harms that single-output inspection misses and helps analysts detect systemic biases.
Key Points
- TreeTracer aggregates hundreds of stochastic generations for bias evaluation.
- The tool visualizes results using a custom Sankey diagram for clear comparisons.
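- Case studies comparing GPT-2 XL with the Apertus models expose hidden harms such as counterfactual pronoun suppression.
- A preliminary user study confirms that the aggregated comparative interface reduces cognitive load for analysts.
- Contrastive inference displays counterfactual token probabilities across contexts, reducing the risk of misreading whether bias is present.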
Article Content
arXiv:2606.19344v1
Abstract: Large Language Models (LLMs) exhibit representational and syntactic biases that are difficult to evaluate due to the stochastic nature of text generation. Standard auditing methods rely on a single output inspection or static automated metrics. These approaches obscure the underlying probability distributions and fail to capture biases hidden in lower-probability generation branches.
This paper introduces TreeTracer, a visual analytics tool designed to evaluate LLM bias through aggregated comparison. Using a systematic perturbation analysis pipeline, the tool replaces ontology-defined terms in each input prompt, aggregates hundreds of stochastic generations into a syntax-aligned hierarchical structure, and then performs classification-aware node merging with an auxiliary language model. The resulting structure is visualized through a custom Sankey diagram.
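The perturb-and-aggregate idea in the paragraph above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's pipeline: the function names, the prompt template, the term list, and the sample continuations are all invented here, whitespace splitting stands in for the syntax alignment, and the classification-aware node merging with an auxiliary language model is not reproduced.

```python
from collections import Counter


def perturb(template, terms):
    """Fill the {term} slot of a prompt template with each ontology term."""
    return {term: template.format(term=term) for term in terms}


def aggregate(generations):
    """Count how many generations pass through each token prefix.

    Each prefix tuple is one node of a prefix tree; its count is the
    number of sampled continuations that share that prefix.
    """
    counts = Counter()
    for text in generations:
        tokens = text.split()  # assumption: whitespace tokens, not syntax-aligned units
        for depth in range(1, len(tokens) + 1):
            counts[tuple(tokens[:depth])] += 1
    return counts


def sankey_links(counts):
    """Turn prefix counts into (parent, child, value) links for a flow diagram."""
    links = []
    for prefix, value in sorted(counts.items()):
        parent = " ".join(prefix[:-1]) or "<root>"
        links.append((parent, " ".join(prefix), value))
    return links


# Hypothetical template and terms, for illustration only.
prompts = perturb("The {term} said that", ["nurse", "engineer"])

# Stand-in for sampled model continuations of one prompt.
samples = ["she was tired", "she was late", "he was tired"]
counts = aggregate(samples)
links = sankey_links(counts)
```

Running the same aggregation once per perturbed prompt yields one tree per ontology term, and the link values are what a Sankey-style view would draw as band widths when two trees are placed side by side.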
By juxtaposing two ontology-driven trees, the workspace enables direct comparison between semantic contexts and supports systematic bias detection. Because any visualization reflects only a subset of the model's learned behavior, the system further applies contrastive inference to compute and directly display counterfactual token probabilities across contexts, reducing the risk of misinterpreting the presence of bias.
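One way to picture the contrastive step is to score the same candidate tokens under two contexts and compare the resulting probabilities. The Python sketch below does this with a softmax over hand-written next-token logits. The logit values, the token set, and the use of a log ratio as the comparison are assumptions made for illustration; the abstract does not specify how TreeTracer computes or displays these quantities.

```python
import math


def softmax(logits):
    """Convert a {token: logit} mapping into a {token: probability} mapping."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def contrast(logits_a, logits_b):
    """Compare token probabilities under context A and context B.

    Returns {token: (p_a, p_b, log(p_a / p_b))} for tokens scored in both.
    """
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    return {
        tok: (p_a[tok], p_b[tok], math.log(p_a[tok] / p_b[tok]))
        for tok in p_a
        if tok in p_b
    }


# Invented next-token logits for two perturbed prompts (not real model output).
logits_a = {"she": 2.0, "he": 0.0, "they": 0.0}
logits_b = {"she": 0.0, "he": 2.0, "they": 0.0}
result = contrast(logits_a, logits_b)
```

A positive log ratio marks a token that context A favors, a negative one marks a token it suppresses relative to context B, and a value near zero marks a token the two contexts treat alike, even if that token never appeared in the sampled generations.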
We validate the workspace through case studies comparing an unaligned baseline model GPT-2 XL against the constitutionally aligned Apertus models. The visual aggregation successfully exposes hidden representational harms, such as counterfactual pronoun suppression and conversational marginalization of individuals. A preliminary user study confirms that the aggregated comparative interface reduces cognitive load and effectively supports analysts in detecting systemic biases.