Anchor-Conditioned Compositional Control for Landscape Image Generation

arXiv cs.CV·Gadha Lekshmi P, Govind Arun, Rohith Syam, Ahmed Elgammal


~1 min · 6/9/2026 · en

Quick Answer

This paper presents early results on an anchor-conditioned finetuning framework for landscape image generation. Against a baseline and three ablation variants, it achieves the highest horizon detection rate (0.850) and the highest rule-of-thirds alignment (0.817).

Quick Take

The framework extracts a four-dimensional compositional anchor vector from training images and injects it into a diffusion model. A category-specific ablation shows that compositional control precision depends on scene category: training on compositionally homogeneous scene subsets reduces horizon deviation by up to 40% compared to mixed training.

Key Points

Achieves the highest horizon detection rate (0.850) and rule-of-thirds alignment (0.817) among a baseline and three ablation variants.
Uses a four-dimensional compositional anchor vector extracted from training images.
Reduces horizon deviation by up to 40% when trained on compositionally homogeneous scene subsets rather than mixed data.
Injects the anchor through a decoupled cross-attention mechanism with Fourier encoding and three-way classifier-free guidance dropout.
Shows that compositional control precision depends on scene category.
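To make the Fourier encoding step concrete, here is a minimal sketch of how a four-dimensional anchor vector could be expanded into sinusoidal features before it is passed to an attention layer. The summary does not say what the four anchor components are, how many frequency bands are used, or how the values are normalized, so the function name `fourier_encode`, the `num_bands` parameter, the assumed [0, 1] range, and the example values are all illustrative assumptions rather than the authors' implementation.

```python
import math


def fourier_encode(anchor, num_bands=4):
    """Expand each anchor component into sin/cos features.

    For a component a (assumed to lie in [0, 1]) and band k < num_bands,
    this emits sin(2^k * pi * a) and cos(2^k * pi * a).
    The band count and frequency schedule are assumptions, not the paper's.
    """
    features = []
    for value in anchor:
        for k in range(num_bands):
            angle = (2.0 ** k) * math.pi * value
            features.append(math.sin(angle))
            features.append(math.cos(angle))
    return features


# Hypothetical anchor values; the four components are placeholders.
anchor = [0.33, 0.50, 0.67, 0.25]
encoded = fourier_encode(anchor)
print(len(encoded))  # 4 components * 4 bands * 2 functions = 32
```

Encodings of this kind let a network distinguish small differences in a low-dimensional input, which is plausibly why one would be applied to a compact anchor vector before cross-attention.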

Article Excerpt

From source RSS / original summary

arXiv:2606.07638v1. Abstract: Image generative models, though widely used as creative tools, offer limited support for the kind of compositional control that photographers and visual artists routinely exercise.

This paper presents early results on an anchor conditioned finetuning framework for landscape image generation, in which a four dimensional compositional anchor vector is extracted from training images and injected into a diffusion model via a decoupled cross attention mechanism with Fourier encoding and three way classifier free guidance dropout. Quantitative evaluation against a baseline and three ablation variants shows that the proposed architecture achieves the highest horizon detection rate of 0.850 and the highest rule of thirds alignment of 0.817.

A category specific ablation further demonstrates that training on compositionally homogeneous scene subsets reduces horizon deviation by up to 40 percent compared to mixed training. This establishes that compositional control precision is category dependent.
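The abstract mentions three way classifier free guidance dropout without spelling it out. One common reading, sketched below, is that during finetuning each sample randomly has its text condition dropped, its anchor condition dropped, or both dropped, so the model also learns the unconditional and partially conditioned cases. This interpretation, the function names `dropout_mode` and `apply_dropout`, and the 5% probabilities are assumptions for illustration only; the paper may define the three modes differently.

```python
import random


def dropout_mode(u, p_text=0.05, p_anchor=0.05, p_both=0.05):
    """Map a uniform draw u in [0, 1) to one of three dropout modes or no dropout.

    The probabilities are placeholder values, not taken from the paper.
    """
    if u < p_both:
        return "drop_both"
    if u < p_both + p_text:
        return "drop_text"
    if u < p_both + p_text + p_anchor:
        return "drop_anchor"
    return "keep_all"


def apply_dropout(text_emb, anchor_emb, null_text, null_anchor, u=None):
    """Replace dropped conditions with their null (unconditional) embeddings."""
    if u is None:
        u = random.random()
    mode = dropout_mode(u)
    text = null_text if mode in ("drop_text", "drop_both") else text_emb
    anchor = null_anchor if mode in ("drop_anchor", "drop_both") else anchor_emb
    return text, anchor


# Strings stand in for embedding tensors to keep the sketch dependency-free.
print(apply_dropout("text", "anchor", "null_text", "null_anchor", u=0.07))
```

Training with independent null conditions is what would allow text guidance and anchor guidance to be weighted separately at sampling time, although the summary does not state how the authors combine them.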
