Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
Quick Take
Lens is a compact text-to-image (T2I) model that achieves high performance with significantly less training compute.
Key Points
- 3.8B parameters, competitive with 6B+ models.
- Requires only 19.3% of Z-Image's training compute.
- Generates 1024×1024 images in 3.15 seconds.