HierBias: Context-Conditioned Hierarchical Media Bias Detection with Multi-Task Type Classification
Quick Answer
HierBias is a hierarchical media bias detector that conditions sentence-level classification on document context. Evaluated on BABE and BASIL, it achieves 0.853 F1 and 0.723 MCC, surpassing the prior state-of-the-art bias detector by 2.6% F1 and 4.3% MCC.
Quick Take
HierBias pairs a sentence-level RoBERTa encoder with a cross-sentence Transformer aggregator and uses two output heads, one for binary bias detection and one for four-class bias type classification.
Key Points
- HierBias introduces a context-conditioned bias probability, so each sentence's bias prediction depends on its surrounding document.
- The model combines a sentence-level RoBERTa encoder with a cross-sentence Transformer aggregator.
- It achieves 0.853 F1 and 0.723 MCC on the BABE and BASIL benchmarks.
- It outperforms the state-of-the-art bias detector by 2.6% F1 and 4.3% MCC (McNemar's test, p < 0.05).
- Joint training of bias detection and type classification enhances sample efficiency.
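One common way to train two heads jointly is to minimize a weighted sum of their losses. The sketch below illustrates that idea in plain Python for a single sentence; it is not taken from the paper. The function names, the `type_weight` parameter, and the choice to apply the type loss only to sentences labeled as biased are all assumptions made for illustration.

```python
import math


def binary_cross_entropy(p_bias, is_biased):
    """Negative log-likelihood of a 0/1 label under predicted probability p_bias."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p_bias, eps), 1.0 - eps)
    return -math.log(p) if is_biased == 1 else -math.log(1.0 - p)


def cross_entropy(type_logits, type_label):
    """Softmax cross-entropy over bias types, computed via log-sum-exp for stability."""
    m = max(type_logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in type_logits))
    return log_norm - type_logits[type_label]


def joint_loss(p_bias, is_biased, type_logits, type_label, type_weight=0.5):
    """Detection loss plus a weighted type loss (hypothetical weighting scheme).

    Assumption: the type loss is only added for sentences labeled as biased,
    since unbiased sentences have no bias type to predict.
    """
    loss = binary_cross_entropy(p_bias, is_biased)
    if is_biased == 1 and type_label is not None:
        loss += type_weight * cross_entropy(type_logits, type_label)
    return loss
```

For example, `joint_loss(0.9, 1, [2.0, 0.1, -1.0, 0.3], 0)` scores a biased sentence whose type head favors class 0. Setting `type_weight=0` recovers plain binary detection, which makes the contribution of the auxiliary task easy to ablate.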
Paper Resources
Abstract: Media bias detection is a critical task for ensuring fair and balanced information dissemination, yet existing sentence-level approaches classify each sentence independently, ignoring inter-sentence contextual signals that human annotators naturally exploit. We present HierBias, a hierarchical context-conditioned media bias detector that formally models document context in bias prediction. We introduce the context-conditioned bias probability and prove theoretically that leveraging document context strictly reduces the Bayes error of sentence-level classification when inter-sentence mutual information is non-zero. A multi-task generalization bound further establishes that jointly training binary bias detection and fine-grained bias type classification improves sample efficiency on small annotated corpora. Architecturally, HierBias pairs a sentence-level RoBERTa encoder with a cross-sentence Transformer aggregator and dual output heads for binary detection and four-class type classification. Evaluated on BABE and BASIL, HierBias achieves 0.853 F1 and 0.723 MCC, surpassing the state-of-the-art bias detector by +2.6% F1 and +4.3% MCC (McNemar's test, p < 0.05). Ablation experiments confirm that each theoretical component contributes independently and consistently.
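The reported figures rest on three standard quantities: F1, the Matthews correlation coefficient (MCC), and McNemar's test for comparing two classifiers on the same items. The sketch below shows how each is computed from counts. It is illustrative only: the function names are hypothetical, and the abstract does not say which McNemar variant was used, so the exact binomial form here is an assumption.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) for binary classification from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value.

    b: items the first model classified correctly and the second incorrectly.
    c: items the second model classified correctly and the first incorrectly.
    Under the null hypothesis, the discordant pairs split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

MCC uses all four confusion-matrix cells, so it is less flattering than F1 when classes are imbalanced, which is one reason to report both. McNemar's test looks only at the sentences on which the two models disagree, so a difference counts as significant only if one model wins those disagreements far more often than chance would allow.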
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2606.26100 [cs.CL] (or arXiv:2606.26100v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2606.26100 (arXiv-issued DOI via DataCite)
Submission history
From: Kaining Li
[v1] Wed, 29 Apr 2026 18:33:42 UTC (86 KB)
Originally published at arxiv.org