AnyAct: Towards Human Reenactment of Character Motion From Video

arXiv cs.CV·Liuhan Chen, Lei Zhong, Jiewei Wang, Qin Shuai, Li Yuan, Leidong Fan, Qing Li, Kanglin Liu


5/18/2026

Quick Take

AnyAct generates a human reenactment from a monocular video of a non-human character, conditioning on sparse local 2D articulated motion cues.

Key Points

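Reinterprets the character's motion as a plausible, editable human performance rather than reconstructing the source character.
Introduces three designs: human-motion-only supervision via augmented 3D-to-2D projection, progressive 3D-to-2D training to reduce conditioning ambiguity, and global-local motion decoupling for reliable local motion control.
Reports high-fidelity initial reenactments that preserve the essential dynamics of the reference characters on a new benchmark of diverse non-human character videos, with ablations supporting the core designs.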

[Submitted on 15 May 2026]

Abstract: We study the problem of directly deriving an initial human reenactment from a monocular video of a non-human character. Our goal is not to reconstruct the source character itself but to reinterpret its motion as a plausible and editable human performance for downstream animation authoring. This task is challenging because existing video-based motion capture methods are largely restricted to human-centric structural spaces, while motion retargeting methods typically require structured 3D source motions and known source topologies. Our key insight is that sparse local articulated motion cues can preserve essential dynamics across large structural differences, providing a stable bridge from character video to human reenactment. Based on this observation, we propose AnyAct, which formulates character-video-driven human reenactment as conditional human motion generation from transferable sparse local 2D articulated motion. To make this practical, we introduce three key designs: human-motion-only supervision via augmented 3D-to-2D projection, progressive 3D-to-2D training to alleviate conditioning ambiguity, and global-local motion decoupling for reliable local motion control. We further construct a benchmark primarily covering diverse non-human character videos. Experiments on the benchmark show that AnyAct produces high-fidelity initial human reenactments that preserve the essential dynamics of the characters in reference videos, and further ablation studies validate the effectiveness of its core designs.
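To make the idea of "augmented 3D-to-2D projection" and "sparse local 2D" cues more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify the camera model, the augmentation, or the cue representation. The sketch assumes a pinhole camera, a random yaw rotation as the only augmentation, and parent-relative 2D offsets as the local cue. All names (`rotate_y`, `project_pinhole`, `augmented_projection`, `local_offsets`, `PARENTS`, `POSE_3D`) and the toy skeleton are hypothetical.

```python
import math
import random

def rotate_y(point, yaw):
    """Rotate a 3D point (x, y, z) about the vertical y-axis by `yaw` radians."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z, y, -s * x + c * z)

def project_pinhole(point, focal=1.0, depth_offset=4.0):
    """Perspective-project a 3D point to 2D; the camera looks along +z."""
    x, y, z = point
    z_cam = z + depth_offset  # push the skeleton in front of the camera
    return (focal * x / z_cam, focal * y / z_cam)

def augmented_projection(joints_3d, rng, max_yaw=math.pi):
    """Project one 3D pose to 2D under a randomly sampled camera yaw.

    Assumption: random yaw stands in for whatever augmentation the paper uses.
    """
    yaw = rng.uniform(-max_yaw, max_yaw)
    return [project_pinhole(rotate_y(p, yaw)) for p in joints_3d]

def local_offsets(joints_2d, parents):
    """Express each non-root joint relative to its parent (a local 2D cue)."""
    return {
        j: (joints_2d[j][0] - joints_2d[p][0], joints_2d[j][1] - joints_2d[p][1])
        for j, p in enumerate(parents)
        if p >= 0
    }

# Toy 4-joint chain: root, spine, shoulder, hand (hypothetical skeleton).
PARENTS = [-1, 0, 1, 2]
POSE_3D = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.2, 0.9, 0.0), (0.6, 0.9, 0.1)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
pose_2d = augmented_projection(POSE_3D, rng)
cues = local_offsets(pose_2d, PARENTS)
```

Under these assumptions, pairs of (`cues`, `POSE_3D`) could be built from human motion data alone, which is one way to read "human-motion-only supervision": the 2D conditioning signal is synthesized from 3D human poses, so no paired non-human data is needed at training time. Because the offsets are parent-relative, they do not depend on where the root sits in the image, which loosely mirrors the separation of global and local motion described in the abstract.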

Comments:	12 pages
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Cite as:	arXiv:2605.15497 [cs.CV]
	(or arXiv:2605.15497v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2605.15497 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Lei Zhong
[v1] Fri, 15 May 2026 00:23:36 UTC (34,002 KB)

— Originally published at arxiv.org
