Conditional Hypothesis Generation for LLM-Based Text Analysis with Researcher-Specified Covariates
Quick Take
The paper presents a conditional hypothesis generation framework that incorporates researcher-specified covariates into LLM-based text analysis, targeting two problems: stratum imbalance and sign reversal. In synthetic experiments, each proposed method outperforms global baselines in its targeted setting, and expert evaluation on two real-world datasets finds that covariate-aware generation surfaces more useful hypotheses within relevant subgroups.
Key Points
- Introduces conditional hypothesis generation for LLM-based text analysis.
- Addresses stratum imbalance and sign reversal in hypothesis discovery.
- Each proposed method outperforms global baselines in its targeted setting in synthetic experiments.
- Expert evaluations confirm more useful hypotheses in relevant subgroups.
- Incorporates researcher-specified covariates for targeted analysis.
Article Content
From source RSS / original summary: arXiv:2606.03029v1. Abstract: A core goal of computational social science is to discover interpretable differences in how language varies across outcomes of interest, such as political affiliation or instructional quality. Recent LLM-based hypothesis generation methods describe such differences in natural language, but select for globally discriminative patterns without accounting for covariates that shape the data based on researchers' domain knowledge.
When covariates are ignored, selected patterns can reflect confounds rather than differences of substantive interest. We introduce conditional hypothesis generation, a framework that incorporates researcher-specified covariates to steer hypothesis discovery toward differences that hold within relevant subgroups. Two challenges arise: the target subgroup may be underrepresented (stratum imbalance), and the direction of a difference may reverse across subgroups (sign reversal).
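A toy illustration may help make sign reversal concrete. The sketch below is not taken from the paper: the data, the binary `feature` (1 if a text shows some pattern), and the helper `mean_difference` are all hypothetical. It shows how a difference that is clearly present within each stratum can vanish when the strata are pooled, which is why a globally discriminative selection criterion could miss it.

```python
import numpy as np

def mean_difference(feature, outcome):
    """Mean feature value among outcome=1 texts minus the mean among outcome=0 texts."""
    feature = np.asarray(feature, dtype=float)
    outcome = np.asarray(outcome)
    return float(feature[outcome == 1].mean() - feature[outcome == 0].mean())

# Hypothetical toy data (an assumption for illustration, not from the paper).
# Stratum 0: the pattern is more common when outcome = 1.
# Stratum 1: the pattern is more common when outcome = 0.
stratum = np.array([0] * 8 + [1] * 8)
outcome = np.array([1, 1, 1, 1, 0, 0, 0, 0,
                    1, 1, 1, 1, 0, 0, 0, 0])
feature = np.array([1, 1, 1, 0, 0, 0, 0, 1,
                    0, 0, 0, 1, 1, 1, 1, 0], dtype=float)

# Pooled over both strata, the two opposite effects cancel out.
pooled = mean_difference(feature, outcome)

# Computed separately, each stratum shows a clear difference of opposite sign.
per_stratum = {
    s: mean_difference(feature[stratum == s], outcome[stratum == s])
    for s in (0, 1)
}

print("pooled difference:", pooled)
print("per-stratum differences:", per_stratum)
```

Here the pooled difference is zero while the per-stratum differences are +0.5 and -0.5, so the pattern looks uninformative globally even though it separates outcomes within each subgroup.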
We propose two econometrics-inspired methods: one introduces feature-covariate interactions to detect sign reversals, and the other applies within-stratum demeaning and inverse-frequency reweighting to equalize underrepresented strata. Synthetic experiments show each method outperforms global baselines in its targeted setting, and expert evaluation on two real-world datasets confirms that covariate-aware generation surfaces more useful hypotheses within relevant subgroups.
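The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas under stated assumptions: a single binary covariate, a numeric per-text feature score, and ordinary least squares as the estimator. All function names (`interaction_slopes`, `demean_within_strata`, `inverse_frequency_weights`, `within_slope`) and the toy data are hypothetical and are not the authors' code.

```python
import numpy as np

def interaction_slopes(feature, covariate, outcome):
    """Fit outcome ~ 1 + f + c + f*c by least squares (binary covariate assumed).

    Returns the slope of the feature in stratum c=0 and in stratum c=1.
    Slopes of opposite sign would indicate a sign reversal.
    """
    f = np.asarray(feature, dtype=float)
    c = np.asarray(covariate, dtype=float)
    y = np.asarray(outcome, dtype=float)
    X = np.column_stack([np.ones_like(f), f, c, f * c])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[1]), float(beta[1] + beta[3])

def demean_within_strata(values, strata):
    """Subtract each stratum's own mean, removing between-stratum level shifts."""
    values = np.asarray(values, dtype=float)
    strata = np.asarray(strata).ravel()
    out = values.copy()
    for s in np.unique(strata):
        mask = strata == s
        out[mask] -= values[mask].mean()
    return out

def inverse_frequency_weights(strata):
    """Weight each text by 1 / (size of its stratum), normalized to sum to 1."""
    strata = np.asarray(strata).ravel()
    _, inverse, counts = np.unique(strata, return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    return w / w.sum()

def within_slope(feature, outcome, strata, reweight=True):
    """Weighted slope of demeaned outcome on demeaned feature."""
    f = demean_within_strata(feature, strata)
    y = demean_within_strata(outcome, strata)
    if reweight:
        w = inverse_frequency_weights(strata)
    else:
        w = np.full(len(f), 1.0 / len(f))
    return float(np.sum(w * f * y) / np.sum(w * f * f))

# Hypothetical imbalanced data: 12 texts in stratum 0 (no association),
# 4 texts in stratum 1 (feature tracks the outcome exactly).
strata = np.array([0] * 12 + [1] * 4)
outcome = np.array([1] * 6 + [0] * 6 + [1, 1, 0, 0], dtype=float)
feature = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0], dtype=float)

unweighted = within_slope(feature, outcome, strata, reweight=False)  # about 0.25
reweighted = within_slope(feature, outcome, strata, reweight=True)   # about 0.5
slope_0, slope_1 = interaction_slopes(feature, strata, outcome)      # about 0 and 1
print(unweighted, reweighted, slope_0, slope_1)
```

In this toy setup the minority stratum's signal is diluted to roughly 0.25 without reweighting and recovers to roughly 0.5 once both strata contribute equal total weight. The interaction fit reports a separate slope per stratum, which is the quantity one would inspect for a change of sign. How the paper scores features, handles multi-valued covariates, or selects hypotheses from these statistics is not specified in the abstract.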