Identifying Interactions at Scale for LLMs

3/13/2026

Quick Take

Berkeley AI Research introduces SPEX and ProxySPEX, algorithms designed to identify influential interactions in Large Language Models (LLMs) at scale, leveraging ablation techniques and signal processing. These methods address the complexity of model behavior by focusing on a small subset of influential interactions, making interpretability more feasible.

Key Points

SPEX uses ablation techniques to measure the influence of model components effectively.
The framework identifies influential interactions, reducing the complexity of analysis.
ProxySPEX builds on SPEX to explore hierarchical structures in model interactions.
Efficient decoding algorithms help isolate specific interactions from combined signals.
The approach addresses the exponential growth of potential interactions in LLMs.

Paper Resources

Read Paper: bair.berkeley.edu

Article Content

From source RSS / original summary

Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process more transparent to model builders and impacted humans, a step toward safer and more trustworthy AI.

To gain a comprehensive understanding, we can analyze these systems through different lenses: feature attribution, which isolates the specific input features driving a prediction (Lundberg & Lee, 2017; Ribeiro et al., 2022); data attribution, which links model behaviors to influential training examples (Koh & Liang, 2017; Ilyas et al., 2022); and mechanistic interpretability, which dissects the functions of internal components (Conmy et al., 2023; Sharkey et al., 2025).

Across these perspectives, the same fundamental hurdle persists: complexity at scale. Model behavior is rarely the result of isolated components; rather, it emerges from complex dependencies and patterns. To achieve state-of-the-art performance, models synthesize complex feature relationships, find shared patterns from diverse training examples, and process information through highly interconnected internal components.

Therefore, grounded (reality-checked) interpretability methods must also be able to capture these influential interactions. As the number of features, training data points, and model components grows, the number of potential interactions grows exponentially, making exhaustive analysis computationally infeasible. In this blog post, we describe the fundamental ideas behind SPEX and ProxySPEX, algorithms capable of identifying these critical interactions at scale.
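To make the exponential growth concrete, here is a small counting sketch. It treats an "interaction" as any non-empty subset of components, which is an assumption made for illustration; the function name count_interactions is hypothetical and not part of SPEX.

```python
from math import comb

def count_interactions(n_components, max_degree=None):
    """Count non-empty subsets of n_components with at most max_degree members.

    With max_degree=None every subset size is allowed, giving 2**n - 1.
    """
    if max_degree is None:
        max_degree = n_components
    return sum(comb(n_components, k) for k in range(1, max_degree + 1))

# Compare all candidate interactions against those of degree <= 2.
for n in (10, 20, 40):
    print(n, count_interactions(n), count_interactions(n, max_degree=2))
```

Even at 40 components the unrestricted count exceeds a trillion, while the degree-2 count stays in the hundreds, which is why exhaustive enumeration stops being an option well before realistic input sizes.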

Attribution through Ablation

Central to our approach is the concept of ablation: measuring influence by observing what changes when a component is removed.

Feature Attribution: We mask or remove specific segments of the input prompt and measure the resulting shift in the predictions.
Data Attribution: We train models on different subsets of the training set, assessing how the model’s output on a test point shifts in the absence of specific training data.
Model Component Attribution (Mechanistic Interpretability): We intervene on the model’s forward pass by removing the influence of specific internal components, determining which internal structures are responsible for the model’s prediction.

In each case, the goal is the same: to isolate the drivers of a decision by systematically perturbing the system, in hopes of discovering influential interactions.
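A minimal sketch of ablation-based feature attribution, under stated assumptions: toy_score is a hypothetical stand-in for a model output (not an LLM and not the authors' code), and removal is done by dropping tokens. It shows both a leave-one-out shift and a second-order difference that isolates a pairwise interaction.

```python
def toy_score(tokens):
    """Hypothetical model output: 'good' adds 1.0, but 'not' + 'good' flips it."""
    present = set(tokens)
    score = 0.0
    if "good" in present:
        score += 1.0
    if "good" in present and "not" in present:
        score -= 2.0
    return score

def ablate(tokens, removed):
    """Return the tokens with the given index positions removed."""
    return [t for i, t in enumerate(tokens) if i not in removed]

def leave_one_out(model, tokens):
    """Shift in output caused by removing each token on its own."""
    full = model(tokens)
    return [full - model(ablate(tokens, {i})) for i in range(len(tokens))]

def pair_interaction(model, tokens, i, j):
    """Second-order difference: the part of the output that needs both i and j."""
    return (model(tokens)
            - model(ablate(tokens, {i}))
            - model(ablate(tokens, {j}))
            + model(ablate(tokens, {i, j})))

tokens = ["the", "movie", "was", "not", "good"]
shifts = leave_one_out(toy_score, tokens)
interaction = pair_interaction(toy_score, tokens, 3, 4)
print(shifts)
print(interaction)
```

Every call to the model here is one ablation. Leave-one-out needs one call per token, but checking every pair, triple, and larger group this way multiplies the number of calls quickly, which motivates keeping the ablation budget small.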

Since each ablation incurs a significant cost, whether through expensive inference calls or retrainings, we aim to compute attributions with the fewest possible ablations.

SPEX and ProxySPEX Framework

To discover influential interactions with a tractable number of ablations, we have developed SPEX (Spectral Explainer). This framework draws on signal processing and coding theory to advance interaction discovery to scales orders of magnitude greater than prior methods.

SPEX circumvents the exponential growth in candidate interactions by exploiting a key structural observation: while the total number of interactions is prohibitively large, the number of influential interactions is actually quite small. We formalize this through two observations: sparsity (relatively few interactions truly drive the output) and low-degreeness (influential interactions typically involve only a small subset of features). These properties allow us to reframe the difficult search problem as a solvable sparse recovery problem.
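The two properties can be seen on a toy example. The sketch below assumes interactions are expressed as coefficients in a Boolean Fourier (Walsh-Hadamard) basis, which is one common formalization and is not spelled out in the text above. The function toy_value is hypothetical, and the transform is computed by brute force over all masks, which is exactly what does not scale.

```python
from itertools import product

def dot2(a, b):
    """Inner product of two binary tuples, modulo 2."""
    return sum(x & y for x, y in zip(a, b)) % 2

def toy_value(mask):
    """Hypothetical model output for a keep/remove mask over 4 components."""
    return 2.0 * mask[0] - 3.0 * mask[1] * mask[2]

def fourier_coefficients(f, n):
    """Brute-force Boolean Fourier transform: needs all 2**n ablations."""
    masks = list(product((0, 1), repeat=n))
    return {
        k: sum(f(m) * (-1) ** dot2(k, m) for m in masks) / len(masks)
        for k in masks
    }

coeffs = fourier_coefficients(toy_value, 4)
# Keep only coefficients that are not numerically zero.
nonzero = {k: v for k, v in coeffs.items() if abs(v) > 1e-9}
max_degree = max(sum(k) for k in nonzero)
print(len(coeffs), len(nonzero), max_degree)
```

Of the 16 possible coefficients only 5 are nonzero (sparsity), and none involves more than 2 components (low degree). A sparse recovery method aims to find those few coefficients without evaluating every mask.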

Drawing on powerful tools from signal processing and coding theory, SPEX uses strategically selected ablations to combine many candidate interactions together. Then, using efficient decoding algorithms, we disentangle these combined signals to isolate the specific interactions responsible for the model’s behavior.

In a subsequent algorithm, ProxySPEX, we identified another structural property common in complex machine learning models: hierarchy. …
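One way to picture how selected ablations combine many candidate interactions is aliasing under structured subsampling, a standard signal-processing effect. The sketch below is an illustration of that effect only, not the SPEX algorithm: the Boolean Fourier basis, the toy_value function, and the subsampling matrix M are all assumptions, and the decoding step that pulls colliding interactions apart is not shown.

```python
from itertools import product

def dot2(a, b):
    """Inner product of two binary tuples, modulo 2."""
    return sum(x & y for x, y in zip(a, b)) % 2

def toy_value(mask):
    """Hypothetical model output for a keep/remove mask over 4 components."""
    return 2.0 * mask[0] - 3.0 * mask[1] * mask[2]

def fourier_coefficients(f, n):
    """Reference transform using all 2**n masks (for checking only)."""
    masks = list(product((0, 1), repeat=n))
    return {k: sum(f(m) * (-1) ** dot2(k, m) for m in masks) / len(masks)
            for k in masks}

def bucket_of(k, M):
    """Bucket index of interaction k: one parity check per column of M."""
    return tuple(dot2(col, k) for col in M)

def subsampled_buckets(f, n, M):
    """Transform f restricted to the 2**b masks spanned by the columns of M."""
    b = len(M)
    ells = list(product((0, 1), repeat=b))

    def mask_for(ell):
        # Mask = sum (mod 2) of the columns of M selected by ell.
        return tuple(sum(ell[c] * M[c][i] for c in range(b)) % 2
                     for i in range(n))

    samples = {ell: f(mask_for(ell)) for ell in ells}
    return {j: sum(samples[ell] * (-1) ** dot2(j, ell) for ell in ells) / len(ells)
            for j in ells}

n = 4
M = [(1, 0, 1, 0), (0, 1, 0, 1)]  # illustrative choice: 4 ablations, not 16
buckets = subsampled_buckets(toy_value, n, M)

# Each bucket should equal the sum of the true coefficients that hash into it.
expected = {j: 0.0 for j in buckets}
for k, v in fourier_coefficients(toy_value, n).items():
    expected[bucket_of(k, M)] += v
print(buckets)
```

With this M, the single-component terms for components 0 and 2 land in the same bucket, so four ablations yield four mixed measurements rather than individual coefficients. Separating such collisions is the job of the decoding step described above.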

Read on bair.berkeley.edu
