Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches

arXiv cs.CL·Marcin Michał Mirończuk

·~2 min·5/26/2026·en·0

Quick Take

This systematic review analyzes 139 primary studies on document classification via information fusion, combining a formal framework, a qualitative trend analysis, and a random-effects meta-analysis of performance gains.

Key Points

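Introduces a formal framework that structures multimodal and multiview fusion for document classification.
Multimodal fusion improves accuracy by a mean of 5.28 percentage points (p = 0.0016); the F1-score effect is positive but not statistically significant.
Only 11.8% of multimodal and 23.3% of multiview studies validate their findings with statistical tests.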

Article Content

From source RSS / original summary

arXiv:2605.23910v1 Announce Type: new Abstract: Information fusion is widely used to improve document classification by integrating multiple data sources (multimodal) or representations (multiview). However, the field lacks a unified framework, a quantitative synthesis of its effectiveness, and clear guidance for practitioners. This systematic review addresses these gaps by analysing 139 primary studies.

It introduces a formal framework to structure the field, presents a qualitative analysis that identifies key trends, and performs a random-effects meta-analysis (to our knowledge, the first focused on document classification) to quantify performance gains. Our meta-analysis reveals that multimodal fusion significantly improves accuracy (mean gain of +5.28 percentage points, p = 0.0016); the F1-score effect is directionally positive but statistically non-significant in our primary model.
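To make the idea of a random-effects meta-analysis concrete, here is a minimal sketch of one common estimator, the DerSimonian-Laird method. The abstract does not say which estimator the review uses, so this choice is an assumption, and the function name `random_effects_pool` and the per-study numbers in the usage lines are hypothetical illustrations, not data from the review.

```python
import math

def random_effects_pool(effects, variances):
    """Pool per-study effects with the DerSimonian-Laird estimator.

    effects:   per-study accuracy gains (e.g. in percentage points)
    variances: per-study sampling variances of those gains
    Returns (pooled effect, standard error, tau^2, two-sided p-value).
    """
    k = len(effects)
    # Fixed-effect (inverse-variance) weights and estimate.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    # Two-sided p-value under a normal approximation.
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0))
    return pooled, se, tau2, p

# Hypothetical gains and variances, for illustration only.
pooled, se, tau2, p = random_effects_pool([4.1, 6.3, 5.0, 7.2], [1.2, 0.9, 1.5, 2.0])
print(f"pooled gain = {pooled:.2f} pp, SE = {se:.2f}, tau^2 = {tau2:.2f}, p = {p:.4f}")
```

Unlike a fixed-effect model, the random-effects weights allow the true gain to differ between studies, which suits a literature that mixes datasets, modalities, and fusion methods.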

Multiview fusion provides consistent but modest gains for accuracy (+4.67%), F1-score (+3.08%), and recall (all p < 0.05). Critically, our qualitative synthesis uncovers challenges in reproducibility and methodological rigour: only 11.8% (multimodal) and 23.3% (multiview) of the studies use statistical tests to validate their findings, which undermines the reliability of many of their results.
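As an illustration of the kind of statistical validation that most of the surveyed studies omit, the sketch below runs an exact paired sign-flip permutation test on per-fold scores of a fused model against a single-source baseline. The abstract does not prescribe any particular test, so the choice of test is an assumption, and the function name `paired_sign_flip_test` and the fold scores are hypothetical.

```python
from itertools import product

def paired_sign_flip_test(fused, baseline):
    """Exact two-sided paired permutation test on per-fold score differences.

    Under the null hypothesis that fusion makes no difference, each paired
    difference is equally likely to be positive or negative, so every sign
    assignment is enumerated (feasible for a small number of folds).
    """
    diffs = [f - b for f, b in zip(fused, baseline)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical 5-fold accuracies, for illustration only.
fused_acc = [0.90, 0.91, 0.92, 0.93, 0.94]
baseline_acc = [0.85, 0.85, 0.85, 0.85, 0.85]
print(paired_sign_flip_test(fused_acc, baseline_acc))
```

With only five folds, the smallest achievable two-sided p-value is 2/32, so even a gain on every fold cannot reach p < 0.05; more folds or repeated runs are needed before such a test can support a significance claim.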

This review's primary contributions are a unifying framework, the first quantitative evidence base, and data-driven guidelines. This review concludes that successful information fusion depends not on algorithmic complexity, but on the strategic alignment of the fusion method with the task context and a commitment to more rigorous validation.
