Knowledge Graph-Enhanced Zero-Shot Topic Classification: A Multi-Strategy Comparative Study
Quick Take
This study presents a zero-shot multi-label topic classification framework and examines how per-article knowledge graph augmentation affects it. Among the base methods, keyword-enhanced classification (AK) performs best, and six of the fifteen LLMs tested surpass the sentence-encoder baseline. Graph augmentation helps smaller models but hurts larger ones, and self-consistency decoding yields no improvement in any experiment while increasing computation cost about fivefold.
Key Points
- The base framework has four variants: article-only, keyword-enhanced, and a self-consistency decoding variant of each.
- Keyword-enhanced classification (AK) is the top performer among the base methods.
- Graph augmentation shows mixed effects, benefiting smaller models but hindering larger ones.
- Self-consistency decoding increases computation cost about fivefold without improving performance in any experiment.
- Tested on fifteen LLMs and eight multi-label datasets across various domains.
Article Content
From source RSS / original summary (arXiv:2605.30465v1)
Abstract: Multi-label topic classification without labeled training data is a challenging task, especially when documents contain complex relational information. We present a zero-shot multi-label topic classification framework and systematically investigate how per-article knowledge graph augmentation affects its performance.
The base framework classifies topics in documents without labeled training data and has four variants: article-only classification, keyword-enhanced classification, and self-consistency decoding variants of both. We then augment each base variant with a per-article knowledge graph. This graph is extracted from the input document as subject-predicate-object triples, through a pipeline similar to KGGen.
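As a rough illustration of how these variants differ, the sketch below assembles one prompt per configuration. The `Triple` type, the `build_prompt` function, and all prompt wording are hypothetical and do not come from the paper. Triple extraction (the KGGen-like pipeline) is not shown; triples are assumed to be already available.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Triple:
    """One subject-predicate-object edge of a per-article knowledge graph."""
    subject: str
    predicate: str
    obj: str


def build_prompt(
    article: str,
    labels: Sequence[str],
    keywords: Optional[Sequence[str]] = None,
    triples: Optional[Sequence[Triple]] = None,
) -> str:
    """Assemble a zero-shot multi-label prompt (hypothetical wording).

    keywords=None, triples=None -> article-only
    keywords given               -> keyword-enhanced
    triples given                -> graph-augmented version of either
    """
    parts: List[str] = [
        "Assign every topic that applies to the article.",
        "Candidate topics: " + ", ".join(labels),
        "Article:\n" + article,
    ]
    if keywords:
        parts.append("Keywords: " + ", ".join(keywords))
    if triples:
        # Serialize the graph as one "(subject; predicate; object)" line per edge.
        lines = [f"({t.subject}; {t.predicate}; {t.obj})" for t in triples]
        parts.append("Knowledge graph triples:\n" + "\n".join(lines))
    parts.append("Answer with a comma-separated list of topics.")
    return "\n\n".join(parts)
```

In this sketch the graph is simply appended as extra context, so the same function covers both a base variant and its graph-augmented counterpart. How the paper actually injects the graph into the prompt is not stated in the abstract.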
We test all eight methods (four base and four graph-augmented) on fifteen LLMs and eight multi-label datasets across different domains. For the base framework, keyword-enhanced classification (AK) is the best-performing method, and six out of fifteen LLMs surpass the sentence-encoder baseline. Graph augmentation has a positive impact on small models and a negative impact on large ones. This shows that larger models already contain enough relational information from pretraining.
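The size of this experimental grid can be made concrete with a short enumeration. This is a sketch under assumptions: only the label "AK" appears in the abstract, so "A" and the "+SC" and "+G" suffixes are invented for illustration, and the run count assumes every method is evaluated on every model and dataset.

```python
from itertools import product
from typing import List, NamedTuple


class Method(NamedTuple):
    name: str
    use_keywords: bool
    self_consistency: bool
    use_graph: bool


def method_grid() -> List[Method]:
    """Enumerate four base methods and their graph-augmented counterparts."""
    methods = []
    for use_graph, self_consistency, use_keywords in product([False, True], repeat=3):
        name = "AK" if use_keywords else "A"  # "A" is an assumed label
        if self_consistency:
            name += "+SC"                     # assumed suffix
        if use_graph:
            name += "+G"                      # assumed suffix
        methods.append(Method(name, use_keywords, self_consistency, use_graph))
    return methods


N_MODELS = 15    # LLMs reported in the abstract
N_DATASETS = 8   # multi-label datasets reported in the abstract
total_runs = len(method_grid()) * N_MODELS * N_DATASETS
```

Under the full-grid assumption this gives 8 × 15 × 8 = 960 method/model/dataset combinations.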
Furthermore, the self-consistency decoding variant does not show performance improvements in any experiment while increasing computation costs about fivefold.
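A minimal sketch of self-consistency decoding for multi-label output follows, to show where the extra cost comes from. The abstract gives neither the number of samples nor the aggregation rule, so both are assumptions here: five samples (chosen to match the roughly fivefold cost) and a per-label majority vote. The function name and the `sample_fn` callable, which stands in for one stochastic LLM call, are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Set


def self_consistent_labels(
    sample_fn: Callable[[], Set[str]],
    n_samples: int = 5,   # assumed; each sample is one extra model call
    min_votes: int = 3,   # assumed majority threshold for n_samples = 5
) -> List[str]:
    """Keep each label predicted in at least `min_votes` of the samples."""
    votes: Counter = Counter()
    for _ in range(n_samples):
        # Each call returns the set of labels from one sampled generation.
        votes.update(sample_fn())
    return sorted(label for label, count in votes.items() if count >= min_votes)
```

Because every sample is a full generation, the cost grows linearly with `n_samples`, while the vote only helps if individual samples disagree in ways that the majority corrects.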