Comparing BERT Sentence-Pair Classification and Few-Shot LLM Prompting for Detecting Threat and Solution Framing in German Climate News
Quick Answer
This study compares few-shot prompting with Llama 4 Maverick and fine-tuned BERT (deepset/gbert-large) for classifying sentences from German-language climate news as threat-oriented, solution-oriented, both, or neither.
Quick Take
Fine-tuned BERT (deepset/gbert-large) achieved an F1 score of 0.83 on both the threat and solution tasks, ahead of the 0.78 reached by few-shot prompting with Llama 4 Maverick. An ablation shows that supplying the preceding sentence as context substantially improves BERT's performance over single-sentence input.
Key Points
- The fine-tuned BERT classifiers achieved an F1 score of 0.83 on both the threat and solution tasks.
- Llama 4 Maverick's few-shot prompting reached an F1 score of 0.78.
- Both methods were evaluated on 440 manually coded Austrian newspaper articles.
- Providing the preceding sentence as context substantially improved BERT's performance compared to single-sentence input.
- The research adds to work comparing fine-tuned encoder models with prompted generative models in computational social science.
Article Content
arXiv:2606.26489v1. Abstract: News media play a central role in shaping public perceptions of climate change, and whether coverage emphasizes threats or solutions has measurable effects on audience engagement and policy support. Automated detection of these framing patterns at the sentence level would allow researchers to analyze large corpora that are infeasible to code manually.
We present a systematic comparison of two approaches for classifying sentences from German-language climate news articles as threat-oriented, solution-oriented, both, or neither. The first approach uses few-shot prompting with an open-weights large language model (Llama 4 Maverick), employing chain-of-thought reasoning and structured output with confidence scoring.
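A minimal sketch of how a few-shot prompt with structured output and a confidence score might be assembled and checked. The prompt wording, the example sentences, the JSON keys (`reasoning`, `label`, `confidence`), and the helper names `build_prompt` and `parse_response` are all assumptions made for illustration; the abstract does not give the actual prompt or output schema. No model is called here; `parse_response` only validates a response string.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical few-shot examples for the threat task, invented for illustration.
THREAT_EXAMPLES: List[Tuple[str, Dict]] = [
    ("Die Gletscher schmelzen schneller als erwartet.",
     {"reasoning": "Describes accelerating climate damage.",
      "label": True, "confidence": 0.9}),
    ("Die Sitzung beginnt um neun Uhr.",
     {"reasoning": "Neutral scheduling information.",
      "label": False, "confidence": 0.95}),
]

def build_prompt(sentence: str, frame: str,
                 examples: List[Tuple[str, Dict]]) -> str:
    """Assemble a few-shot prompt for one binary framing decision."""
    lines = [
        f"Decide whether the sentence uses {frame} framing.",
        "Reason step by step, then answer as JSON with the keys "
        '"reasoning", "label" (true/false) and "confidence" (0 to 1).',
        "",
    ]
    for text, answer in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Answer: {json.dumps(answer)}")
        lines.append("")
    lines.append(f"Sentence: {sentence}")
    lines.append("Answer:")
    return "\n".join(lines)

def parse_response(raw: str) -> Tuple[bool, float]:
    """Validate a JSON answer and return (label, confidence)."""
    data = json.loads(raw)
    label = data["label"]
    confidence = float(data["confidence"])
    if not isinstance(label, bool):
        raise ValueError("label must be a JSON boolean")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return label, confidence
```

Running one such prompt per frame (threat, solution) would mirror the two independent binary classifiers described in the abstract.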
The second approach fine-tunes a German BERT model (deepset/gbert-large) for sentence-pair classification, where the preceding sentence provides contextual information for the target sentence. Both approaches implement two independent binary classifiers, one for threat framing and one for solution framing. We evaluate both methods on a corpus of 440 Austrian newspaper articles that were manually coded following a detailed coding scheme developed with domain experts.
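The sentence-pair setup and the two binary decisions can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, and using an empty string as context for the first sentence of an article is a guess, since the abstract does not say how that case is handled.

```python
from typing import List, Tuple

def make_sentence_pairs(sentences: List[str]) -> List[Tuple[str, str]]:
    """Pair each target sentence with its preceding sentence as context.

    Assumption: the first sentence has no predecessor, so its context
    is an empty string.
    """
    pairs = []
    previous = ""
    for sentence in sentences:
        pairs.append((previous, sentence))
        previous = sentence
    return pairs

def combine_labels(threat: bool, solution: bool) -> str:
    """Map two independent binary decisions onto the four categories."""
    if threat and solution:
        return "both"
    if threat:
        return "threat"
    if solution:
        return "solution"
    return "neither"
```

With a Hugging Face tokenizer, each pair could then be encoded as two segments by passing both strings, for example `tokenizer(context, target)`. Placing the context first and the target second is an assumption here.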
The fine-tuned BERT classifiers achieve an F1 score of 0.83 for both the threat and solution tasks, while the LLM-based classifiers reach an F1 of 0.78. An ablation study confirms that providing the preceding sentence as context improves BERT classification performance substantially compared to single-sentence input. These results contribute to the growing body of work comparing fine-tuned encoder models with prompted generative models for text classification in computational social science.
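For reference, a short sketch of how an F1 score for one binary classifier is computed from gold and predicted labels. It assumes the reported figure is the F1 of the positive class; the abstract does not state whether a binary, macro, or weighted average was used, and the function name is hypothetical.

```python
from typing import Sequence

def binary_f1(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """F1 of the positive class: harmonic mean of precision and recall."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The same function would be applied separately to the threat classifier and the solution classifier, since the two tasks are evaluated independently.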