Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches
Quick Take
This review analyzes 139 studies on document classification via information fusion, identifying key trends and quantifying performance gains through a meta-analysis.
Key Points
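- Introduces a formal framework that unifies multimodal and multiview fusion for document classification.
- Multimodal fusion significantly improves accuracy, by 5.28 percentage points on average.
- Only 11.8% of multimodal studies (and 23.3% of multiview studies) validate their findings with statistical tests.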
Article Content
arXiv:2605.23910v1 (Announce Type: new)
Abstract: Information fusion is widely used to improve document classification by integrating multiple data sources (multimodal) or representations (multiview). However, the field lacks a unified framework, a quantitative synthesis of its effectiveness, and clear guidance for practitioners. This systematic review addresses these gaps by analysing 139 primary studies.
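To make "integrating multiple sources or representations" concrete, here is a minimal sketch of two common ways to combine them: early (feature-level) fusion, which concatenates per-view feature vectors, and late (decision-level) fusion, which averages per-view class probabilities. The function names, features, labels, and weights are hypothetical illustrations, not the review's framework; a real system would feed the fused vector or scores into a trained classifier.

```python
from typing import Dict, List, Sequence


def early_fusion(views: Sequence[Sequence[float]]) -> List[float]:
    """Feature-level (early) fusion: concatenate each view's feature vector
    so that a single downstream classifier sees all sources at once."""
    fused: List[float] = []
    for vec in views:
        fused.extend(vec)
    return fused


def late_fusion(probs: Sequence[Dict[str, float]],
                weights: Sequence[float]) -> Dict[str, float]:
    """Decision-level (late) fusion: weighted average of the class
    probabilities produced by separate per-view classifiers."""
    if not probs or len(probs) != len(weights):
        raise ValueError("need one weight per view and at least one view")
    total = sum(weights)
    labels = probs[0].keys()
    return {label: sum(w * p[label] for p, w in zip(probs, weights)) / total
            for label in labels}


# Hypothetical document seen through a text view and an image (layout) view.
text_feats = [0.2, 0.7]
image_feats = [0.9, 0.1, 0.4]
print(early_fusion([text_feats, image_feats]))  # [0.2, 0.7, 0.9, 0.1, 0.4]

text_probs = {"invoice": 0.6, "letter": 0.4}
image_probs = {"invoice": 0.9, "letter": 0.1}
fused = late_fusion([text_probs, image_probs], weights=[1.0, 1.0])
print(max(fused, key=fused.get))  # invoice
```

Multiview fusion follows the same pattern, except that the views are different representations of the same text (for example, bag-of-words and embedding features) rather than different modalities.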
It introduces a formal framework to structure the field, presents the results of a qualitative analysis to identify key trends, and performs a random-effects meta-analysis (to our knowledge, the first focused on document classification) to quantify performance gains. Our meta-analysis reveals that multimodal fusion significantly improves accuracy (mean gain of +5.28 percentage points, p = 0.0016), whereas the F1-score effect is directionally positive but statistically non-significant in our primary model.
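A random-effects meta-analysis pools per-study gains while allowing the true effect to differ between studies. The summary does not say which estimator the authors used, so this is a minimal sketch assuming the widely used DerSimonian-Laird estimator of between-study variance (tau^2) and a normal approximation for the p-value; the function name `random_effects_dl` and all input numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def random_effects_dl(effects: Sequence[float],
                      variances: Sequence[float]) -> Tuple[float, float, float, float]:
    """Pool per-study effects with the DerSimonian-Laird random-effects model.
    Returns (pooled_effect, standard_error, tau_squared, two_sided_p)."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance tau^2, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled, se, tau2, p


gains = [4.0, 6.5, 3.0, 7.0]      # hypothetical per-study accuracy gains (pp)
variances = [1.0, 2.0, 1.5, 2.5]  # hypothetical sampling variances
pooled, se, tau2, p = random_effects_dl(gains, variances)
print(f"pooled gain = {pooled:.2f} pp, SE = {se:.2f}, tau^2 = {tau2:.2f}, p = {p:.4f}")
```

Because tau^2 is added to every study's variance, a heterogeneous set of studies yields a wider standard error and a larger p-value than a fixed-effect analysis of the same numbers, which is how a pooled gain can be positive yet not statistically significant.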
Multiview fusion provides consistent but modest gains in accuracy (+4.67%), F1-score (+3.08%), and recall (all p < 0.05). Critically, our qualitative synthesis uncovers challenges in reproducibility and methodological rigour: only 11.8% (multimodal) and 23.3% (multiview) of the studies use statistical tests to validate their findings, which undermines the reliability of many of their results.
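As one hedged example of the kind of significance test the review finds missing (the summary does not prescribe a specific test), McNemar's test compares a fused model and a unimodal baseline evaluated on the same test documents, using only the counts of documents on which they disagree. The function name `mcnemar` and the counts in the usage line are hypothetical.

```python
import math
from typing import Tuple


def mcnemar(b: int, c: int, continuity: bool = True) -> Tuple[float, float]:
    """McNemar's test for two classifiers scored on the same test documents.
    b: documents the baseline gets right but the fused model gets wrong.
    c: documents the fused model gets right but the baseline gets wrong.
    Returns (chi_square_statistic, p_value) with 1 degree of freedom."""
    n = b + c
    if n == 0:
        return 0.0, 1.0  # the models never disagree: no evidence of a difference
    # Edwards continuity correction, clipped at zero when b == c.
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / n
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


stat, p = mcnemar(12, 31)  # hypothetical disagreement counts
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Unlike reporting two accuracy numbers side by side, this uses the paired structure of the evaluation, so it needs no assumptions about how the two models' errors are distributed across the test set.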
This review's primary contributions are a unifying framework, the first quantitative evidence base, and data-driven guidelines. This review concludes that successful information fusion depends not on algorithmic complexity, but on the strategic alignment of the fusion method with the task context and a commitment to more rigorous validation.