Toward 360-Degree Indoor Panorama Editing via Tuning-Free Diffusion Model with Refocusing Cross-Attention

arXiv cs.CV·Dinh-Khoi Vo, Nhut-Thanh Le-Hinh, Viet-Tham Huynh, Tam V. Nguyen, Minh-Triet Tran, Trung-Nghia Le

6h ago

·~2 min·6/15/2026·en·0

Quick Answer

FocusDiff introduces a tuning-free diffusion model for precise indoor panorama editing, overcoming challenges like prompt brittleness and spillover edits.

Quick Take

FocusDiff introduces a tuning-free diffusion model for precise indoor panorama editing, overcoming challenges like prompt brittleness and spillover edits. It outperforms existing zero-shot editors on the LIMB benchmark, achieving superior precision and photorealism in 360-degree environments.

Key Points

FocusDiff utilizes refocusing cross-attention for targeted image manipulation.
Achieves superior text-image alignment and background preservation in edits.
Demonstrated effectiveness in virtual reality environments.
Extensive experiments on 30 multi-object images validate its performance.
Outperforms existing models in precision and usability.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Content

From source RSS / original summary

arXiv:2606. 14035v1 Announce Type: new Abstract: Zero-shot text-guided diffusion has significantly advanced image editing; however, its practical usability remains constrained by three persistent challenges: prompt brittleness that requires meticulous prompt engineering, spillover edits that unintentionally affect non-target regions, and failures on small or cluttered objects caused by limited fine-grained supervision in training data.

We propose FocusDiff (Target-Aware Refocusing for Tuning-Free Diffusion Editing), a tuning-free framework for precise and region-specific image manipulation based on refocusing cross-attention. Given a target region obtained through automated segmentation or manual selection, FocusDiff applies selective blurring to non-edit areas to guide attention toward the masked region while accurately transferring the object's identity, structure, and appearance to the edited output.

Integrated context-preserving modules further ensure background fidelity and global coherence, enabling accurate edits from simple text prompts in a single pass. We also extend FocusDiff to 360-degree indoor panorama editing and demonstrate its effectiveness within virtual reality environments.

Extensive experiments on our localized editing benchmark LIMB, comprising 30 multi-object images and 100 annotated examples including challenging small-object cases, show that FocusDiff outperforms existing zero-shot editors in text-image alignment and background preservation, achieving superior precision, photorealism, and usability. The project page is available at https://vdkhoi20. github. io/FocusDiff.

Reader Mode unavailable (could not extract clean content).

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.CV

See more →

arXiv cs.CV·Shahrzad Esmat, Chaunte W. Lacewell, Sameh Gobriel, Nilesh Jain, Ali Jannesari

1w ago

FeaturedOriginal

LLM-Guided ANN Index Optimization for Human-Object Interaction Retrieval

AI Summary

A phase-aware LLM agent optimizes human-object interaction retrieval, outperforming Optuna TPE by 33.3% and VDTuner by 34.2% on the HICO-DET benchmark. This method enhances throughput by 15.3x over UniIR and demonstrates strong transferability across vector database management systems.

#LLM #Agent #Inference #AI Startup