Stable-Video-3D: Text-to-video at 1080p with consistent physics · DeepSignal
Stable-Video-3D generates 8-second, 1080p video from text prompts, using a learned dynamics prior to keep motion physically plausible.
Key Points
- 8-second clips at 1080p.
- Physically plausible motion.
- Trained with a learned dynamics prior.

Why Featured
Physics consistency was the visible weakness in AI video; closing that gap brings consumer use cases within reach.