Anchor-Conditioned Compositional Control for Landscape Image Generation
Quick Answer
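This paper presents early results on an anchor-conditioned finetuning framework for landscape image generation. Against a baseline and three ablation variants, the proposed architecture achieves the highest horizon detection rate (0.850) and the highest rule-of-thirds alignment (0.817).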
Quick Take
The framework extracts a four-dimensional compositional anchor vector from training images and injects it into a diffusion model. A category-specific ablation shows that control precision depends on scene category: training on compositionally homogeneous scene subsets reduces horizon deviation by up to 40% compared to mixed training.
Key Points
- Achieves a horizon detection rate of 0.850 in landscape image generation.
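- Extracts a four-dimensional compositional anchor vector from training images as the conditioning signal.
- Reduces horizon deviation by up to 40% when training on compositionally homogeneous scene subsets rather than mixed data.
- Injects the anchor via a decoupled cross-attention mechanism with Fourier encoding and three-way classifier-free guidance dropout.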
- Shows that compositional control precision is dependent on scene category.
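As a rough illustration of the Fourier encoding step mentioned above, the sketch below maps each component of a four-dimensional anchor to sine and cosine features at several frequencies. The summary does not state what the four components represent, how many frequency bands are used, or how the frequencies are spaced, so the example anchor values, the `num_bands` parameter, and the power-of-two frequencies are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def fourier_encode(anchor: Sequence[float], num_bands: int = 4) -> List[float]:
    """Encode each component x as [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_bands-1.

    The power-of-two frequency schedule is an assumption, not taken from the paper.
    """
    features: List[float] = []
    for x in anchor:
        for k in range(num_bands):
            freq = (2.0 ** k) * math.pi
            features.append(math.sin(freq * x))
            features.append(math.cos(freq * x))
    return features


# Hypothetical anchor: four normalized values in [0, 1]; their meaning is assumed.
anchor = [0.33, 0.5, 0.25, 0.75]
encoded = fourier_encode(anchor, num_bands=4)
print(len(encoded))  # 4 components * 4 bands * 2 functions = 32 features
```

Encodings of this kind are commonly used to expand low-dimensional scalar inputs into a wider vector before they are projected into an attention layer; whether the paper follows this exact layout is not stated in the summary.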
Article Excerpt
From source RSS / original summary
arXiv:2606.07638v1 Announce Type: new
Abstract: Image generative models, though widely used as creative tools, offer limited support for the kind of compositional control that photographers and visual artists routinely exercise. This paper presents early results on an anchor-conditioned finetuning framework for landscape image generation, in which a four-dimensional compositional anchor vector is extracted from training images and injected into a diffusion model via a decoupled cross-attention mechanism with Fourier encoding and three-way classifier-free guidance dropout. Quantitative evaluation against a baseline and three ablation variants shows that the proposed architecture achieves the highest horizon detection rate of 0.850 and the highest rule-of-thirds alignment of 0.817. A category-specific ablation further demonstrates that training on compositionally homogeneous scene subsets reduces horizon deviation by up to 40 percent compared to mixed training. This establishes that compositional control precision is category dependent.
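The abstract does not define "three-way classifier-free guidance dropout". One plausible reading, assumed here, is that training randomly drops the text condition alone, the anchor condition alone, or both, so the model learns each partially conditioned case. The sketch below follows that reading; the function name, the use of `None` as a stand-in for a null embedding, and the 5% probabilities are hypothetical and not taken from the paper.

```python
import random
from typing import Any, Optional, Tuple


def three_way_dropout(
    text: Any,
    anchor: Any,
    p_text: float = 0.05,
    p_anchor: float = 0.05,
    p_both: float = 0.05,
    rng: Optional[Any] = None,
) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (text, anchor) with one or both replaced by None (a null condition).

    A single uniform draw selects one of four mutually exclusive outcomes:
    drop both, drop text only, drop anchor only, or keep both.
    """
    source = rng if rng is not None else random
    u = source.random()
    if u < p_both:
        return None, None
    if u < p_both + p_text:
        return None, anchor
    if u < p_both + p_text + p_anchor:
        return text, None
    return text, anchor


# Example: apply the dropout to one hypothetical training sample.
sample = three_way_dropout("mountain lake at dawn", (0.33, 0.5, 0.25, 0.75))
print(sample)
```

Under this assumption, sampling can then combine unconditional, text-only, and fully conditioned predictions with separate guidance weights for text and anchor.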