TIGER: Traceable Inference with Graph-Based Evidence Routing for Mitigating Hallucinations in Multimodal Generation

arXiv cs.AI·Kaixiang Zhao, Tianrun Yu, Shawn Huang, Porter Jenkins, Yushun Dong, Amanda Hughes

6/2/2026

Quick Take

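TIGER is an inference-time framework that mitigates hallucinations in multimodal generation through graph-based evidence routing. It independently extracts an observation graph from the input and a claim graph from the current output, scores each claim for risk based on support and conflict, and repairs the selected high-risk claims while the backbone stays frozen. Across four cross-modal paths (image-to-text, image+text-to-text, audio-to-text, and video-to-text), it reduces unsupported content while preserving task quality, and the gains hold across multiple backbones. A convergence analysis shows that the expected total risk decreases geometrically to an explicit asymptotic bound under mild assumptions.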

Key Points

TIGER redesigns feedback for localized repair in multimodal generation.
It uses observation and claim graphs to assign risk scores to claims.
The framework reduces unsupported claims while preserving output quality.
Experiments cover four cross-modal paths: image-to-text, image+text-to-text, audio-to-text, and video-to-text.
Convergence analysis shows that the expected total risk decreases geometrically to an explicit asymptotic bound under mild assumptions.
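
To make the convergence point concrete, here is one standard way such a guarantee can arise. This is an illustrative sketch, not the paper's theorem: the summary does not state the assumptions or the bound, so the per-round contraction rate \alpha, the residual term \varepsilon, and the symbol R_t for the total risk after round t are all assumptions introduced here.

```latex
% Assumed per-round condition (hypothetical, not taken from the paper):
\mathbb{E}\left[ R_{t+1} \mid R_t \right] \le (1-\alpha)\, R_t + \varepsilon,
\qquad 0 < \alpha \le 1,\quad \varepsilon \ge 0 .

% Taking expectations and unrolling over t rounds:
\mathbb{E}[R_t]
  \le (1-\alpha)^t\, \mathbb{E}[R_0] + \varepsilon \sum_{k=0}^{t-1} (1-\alpha)^k
  = (1-\alpha)^t\, \mathbb{E}[R_0] + \varepsilon\, \frac{1-(1-\alpha)^t}{\alpha}
  \le (1-\alpha)^t\, \mathbb{E}[R_0] + \frac{\varepsilon}{\alpha} .
```

Under this assumed condition the first term shrinks geometrically and the second is the asymptotic floor. For example, with \alpha = 0.5, \varepsilon = 0.1, and \mathbb{E}[R_0] = 4, the bound after three rounds is 0.5 + 0.175 = 0.675, and the floor \varepsilon / \alpha is 0.2.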

Article Content

From source RSS / original summary

arXiv:2606.00232v1 (Announce Type: new)

Abstract: We study fact-level repair for multimodal generation, where a fluent output may contain specific facts that are not supported by the input. Existing inference-time repair methods often generate feedback by jointly conditioning on the input and the current output. This design has two limitations: hallucinated claims in the output can bias the model's interpretation of the input, and free-form feedback cannot be ranked or scheduled at the fact level.

We present TIGER, an inference-time framework that redesigns feedback for localized repair. TIGER independently extracts an observation graph from the input and a claim graph from the current output, then assigns each claim a graph-conditioned risk score based on support and conflict. The model repairs selected high-risk claims while keeping the backbone frozen. We provide a convergence analysis showing that the expected total risk decreases geometrically to an explicit asymptotic bound under mild assumptions.
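
The extract, score, and repair steps described above can be sketched in a few lines of Python. This is a minimal toy, not the authors' implementation: the triple representation of both graphs, the fixed risk values (0.0 for supported, 0.5 for unsupported, 1.0 for conflicting), the `threshold` and `budget` parameters, and the `toy_repair` function standing in for the frozen backbone are all assumptions made for illustration.

```python
from typing import Callable, NamedTuple, Optional


class Triple(NamedTuple):
    """One edge of a graph, stored as (subject, relation, obj). Hypothetical format."""
    subject: str
    relation: str
    obj: str


def risk_score(claim: Triple, observations: set) -> float:
    """Assumed scoring rule: exact match is support, same slot with a different value is conflict."""
    if claim in observations:
        return 0.0  # supported by the observation graph
    for obs in observations:
        if obs.subject == claim.subject and obs.relation == claim.relation:
            return 1.0  # the input says something else about the same slot
    return 0.5  # no evidence either way


def total_risk(claims: list, observations: set) -> float:
    return sum(risk_score(c, observations) for c in claims)


def select_high_risk(claims: list, observations: set,
                     threshold: float = 0.5, budget: int = 2) -> list:
    """Rank claims by risk and keep at most `budget` of those at or above `threshold`."""
    risky = [c for c in claims if risk_score(c, observations) >= threshold]
    risky.sort(key=lambda c: risk_score(c, observations), reverse=True)
    return risky[:budget]


def repair_round(claims: list, observations: set,
                 repair_fn: Callable[[Triple, set], Optional[Triple]],
                 threshold: float = 0.5, budget: int = 2) -> list:
    """Repair only the selected claims; every other claim is left untouched."""
    targets = set(select_high_risk(claims, observations, threshold, budget))
    repaired = []
    for claim in claims:
        if claim not in targets:
            repaired.append(claim)
            continue
        fixed = repair_fn(claim, observations)
        if fixed is not None:  # None means the claim is dropped
            repaired.append(fixed)
    return repaired


def toy_repair(claim: Triple, observations: set) -> Optional[Triple]:
    """Stand-in for the frozen model: copy the observed value on conflict, else drop."""
    for obs in observations:
        if obs.subject == claim.subject and obs.relation == claim.relation:
            return obs
    return None


observations = {Triple("dog", "color", "brown"), Triple("dog", "on", "sofa")}
claims = [
    Triple("dog", "color", "brown"),  # supported
    Triple("dog", "on", "floor"),     # conflicts with the input
    Triple("cat", "color", "white"),  # unsupported
]
repaired = repair_round(claims, observations, toy_repair)
```

Because the observation graph is built without looking at the output, a hallucinated claim cannot influence what counts as evidence, and because risk is attached to individual claims, the repair step can be ranked and budgeted per fact rather than applied to the whole output.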

Experiments across four cross-modal paths, including image-to-text, image+text-to-text, audio-to-text, and video-to-text, show that TIGER reduces unsupported content while preserving task quality. The gains hold across multiple backbones, and a CrisisFACTS case study suggests that the same repair mechanism can improve grounding in multi-source settings.
