Brain-to-Image Retrieval and Reconstruction via Multimodal EEG Alignment

arXiv cs.CV·Chi Kit Wong, Yan Liu, Haowen Yan

5/26/2026

Quick Take

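The brain-to-image system decodes visual stimuli from EEG signals. It reaches 86.30% Top-1 accuracy when retrieving the correct image among 200 candidates, and a CLIP score of 0.903 (ViT-H-14) in image reconstruction, where EEG representations are aligned with multi-modal CLIP embeddings. Both figures are means over 10 random seeds, and retrieval was evaluated on a single subject. The results support the feasibility of decoding visual representations from EEG.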

Key Points

EEG-to-image retrieval ranks the correct stimulus among 200 candidates with 86.30% mean Top-1 accuracy.
The retrieval model combines multi-level blurring with biologically inspired EVNet features and is trained with the InfoNCE loss.
CognitionCapturerPro aligns EEG representations with multi-modal CLIP embeddings (image, text, depth, and edge).
Image reconstruction with SDXL-Turbo and IP-Adapter achieved a CLIP score of 0.903 using ViT-H-14.
Results indicate the feasibility of decoding visual representations from EEG signals.

Article Content


arXiv:2605.23996v1

Abstract: We present a brain-to-image system that decodes visual stimuli from EEG signals recorded during natural image viewing. Our system addresses two tasks: (1) EEG-to-image retrieval, which ranks the correct stimulus image among 200 candidates given an EEG segment, and (2) EEG-to-image reconstruction, which generates an image consistent with the perceived stimulus.
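To make the retrieval protocol concrete, here is a minimal sketch of how Top-k accuracy over a candidate set can be computed once EEG and image encoders produce vectors in a shared space. It assumes row i of each matrix belongs to the same trial and ranks candidates by cosine similarity. The function name `topk_retrieval_accuracy` is hypothetical and is not taken from the paper. For reference, a random ranker over 200 candidates would score 1/200 = 0.5% Top-1 and 5/200 = 2.5% Top-5.

```python
import numpy as np


def topk_retrieval_accuracy(eeg_emb: np.ndarray, img_emb: np.ndarray, k: int) -> float:
    """Fraction of EEG trials whose true stimulus ranks within the top k candidates.

    Hypothetical helper: assumes eeg_emb[i] was recorded while viewing img_emb[i],
    so the correct candidate for trial i sits on the diagonal of the similarity matrix.
    """
    # L2-normalise so that dot products equal cosine similarities.
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    sim = eeg @ img.T  # shape: (n_trials, n_candidates)

    # Rank of the correct candidate = number of candidates scoring strictly higher
    # (ties are resolved in favour of the correct image).
    correct = np.diag(sim)[:, None]
    rank = (sim > correct).sum(axis=1)
    return float((rank < k).mean())


# Usage with random stand-in embeddings for 200 trials and 200 candidate images.
rng = np.random.default_rng(0)
images = rng.standard_normal((200, 64))
eeg_like = images + 2.0 * rng.standard_normal(images.shape)  # noisy "EEG" embeddings
top1 = topk_retrieval_accuracy(eeg_like, images, k=1)
top5 = topk_retrieval_accuracy(eeg_like, images, k=5)
```

Ranking by counting strictly higher scores avoids a full sort, which is sufficient when only the position of the true image matters.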

For retrieval, we implement a multi-level blurring approach improved with biologically inspired EVNet features and trained with the InfoNCE loss. Evaluated over 10 random seeds for a single subject, the retrieval model achieves a mean final-epoch Top-1 accuracy of 86.30% and Top-5 accuracy of 98.55%.
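The InfoNCE objective named above can be sketched in NumPy as follows. This is the symmetric, CLIP-style variant. It assumes a batch of paired EEG and image embeddings in which each pair's positive lies on the diagonal and every other image in the batch serves as a negative. The temperature of 0.07 is an assumed default, not a value reported in the abstract. The paper may instead use a one-directional or learned-temperature form, and the multi-level blurring and EVNet feature extraction are not modeled here.

```python
import numpy as np


def _row_cross_entropy_diag(logits: np.ndarray) -> float:
    """Mean cross-entropy where the target class for row i is column i."""
    # Subtract each row's max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def info_nce(eeg_emb: np.ndarray, img_emb: np.ndarray, temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for a batch of (EEG, image) embedding pairs.

    temperature=0.07 is an assumed default for this sketch, not the paper's value.
    """
    # Cosine similarities via L2-normalised dot products, scaled by 1/temperature.
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = eeg @ img.T / temperature  # shape: (batch, batch)

    # Average the EEG->image (rows) and image->EEG (columns) directions.
    return 0.5 * (_row_cross_entropy_diag(logits) + _row_cross_entropy_diag(logits.T))
```

Aligned batches should yield a lower loss than shuffled ones. In actual training, the same computation would be written in an autodiff framework so that gradients can update the EEG encoder.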

For reconstruction, we implement CognitionCapturerPro, which aligns EEG representations to multi-modal CLIP embeddings, including image, text, depth, and edge embeddings, and synthesizes images with SDXL-Turbo conditioned via IP-Adapter. Averaged over 10 seeds, the reconstruction model achieves a CLIP score of 0.903 using ViT-H-14, a CLIP score of 0.870 using ViT-L/14, and an SSIM of 0.409.
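One way to picture the multi-modal alignment step is as a weighted sum of per-modality alignment terms, with one term for each CLIP target listed above (image, text, depth, edge). This sketch is an illustration under stated assumptions, not CognitionCapturerPro's actual objective. It uses a simple cosine-distance term as a stand-in for the method's unstated loss, and it assumes a hypothetical projection head has already mapped the EEG features into each modality's embedding space. The uniform weights are also hypothetical, and the SDXL-Turbo and IP-Adapter generation stage is not modeled.

```python
from typing import Dict, Optional, Tuple

import numpy as np

# Modalities named in the abstract; how each target is produced is an assumption.
MODALITIES = ("image", "text", "depth", "edge")


def cosine_alignment_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between predicted and target embeddings.

    Stand-in alignment term for this sketch, not the paper's loss.
    """
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    target = target / np.linalg.norm(target, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(pred * target, axis=1)))


def multimodal_alignment_loss(
    eeg_proj: Dict[str, np.ndarray],
    clip_targets: Dict[str, np.ndarray],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[float, Dict[str, float]]:
    """Weighted sum of per-modality alignment losses.

    eeg_proj: modality -> (batch, dim) EEG embeddings from a hypothetical per-modality head.
    clip_targets: modality -> (batch, dim) CLIP embeddings for that modality.
    weights: hypothetical per-modality weights; uniform if None.
    """
    if weights is None:
        weights = {m: 1.0 / len(MODALITIES) for m in MODALITIES}
    per_modality = {m: cosine_alignment_loss(eeg_proj[m], clip_targets[m]) for m in MODALITIES}
    total = sum(weights[m] * per_modality[m] for m in MODALITIES)
    return float(total), per_modality


# Usage with random stand-in embeddings (batch of 8, assumed width of 1024).
rng = np.random.default_rng(0)
targets = {m: rng.standard_normal((8, 1024)) for m in MODALITIES}
preds = {m: t + 0.5 * rng.standard_normal(t.shape) for m, t in targets.items()}
total_loss, per_modality_loss = multimodal_alignment_loss(preds, targets)
```

Returning the per-modality terms alongside the total makes it easy to monitor whether one target (for example, depth) is being fitted much worse than the others.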

These results demonstrate the feasibility of decoding rich visual representations from EEG signals using modern multi-modal alignment and generative modeling techniques.
