Physics from Video: Identifiability of Time-Invariant Second-Order ODEs under Minimal Trajectory Conditions
Quick Take
This study shows that the parameters of second-order linear ODEs can be uniquely identified from video data with an encoder-only pipeline. Its key theoretical tool is a level-set slope-coverage condition, under which the learned latent space is locally affine to the true physical state. From this, the authors show that underdamped systems can be identified from a single video clip, while the other damping regimes require three diverse trajectories. In experiments on synthetic and real-world data, the method recovers interpretable physical constants without compute-intensive pixel reconstruction.
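The summary distinguishes damping regimes without defining them. As a minimal sketch, assuming the unit-mass parametrization x'' + c*x' + k*x = 0 with c >= 0 and k > 0 (the symbols c and k and the function names below are ours, not the paper's), the regime follows from the sign of the discriminant c^2 - 4k of the characteristic polynomial s^2 + c*s + k:

```python
import math

def damping_regime(c: float, k: float, tol: float = 1e-12) -> str:
    """Classify x'' + c*x' + k*x = 0 (unit mass, c >= 0, k > 0) by the sign
    of the discriminant c**2 - 4k of the characteristic polynomial."""
    if k <= 0 or c < 0:
        raise ValueError("expects c >= 0 and k > 0")
    disc = c * c - 4.0 * k
    # Treat near-zero discriminants as the critical boundary (floating-point tolerance).
    if abs(disc) <= tol * max(1.0, c * c, 4.0 * k):
        return "critically damped"
    return "underdamped" if disc < 0 else "overdamped"

def damping_ratio(c: float, k: float) -> float:
    """Dimensionless damping ratio zeta = c / (2 sqrt(k)); zeta < 1 is underdamped."""
    return c / (2.0 * math.sqrt(k))

print(damping_regime(0.4, 4.0), damping_regime(4.0, 4.0), damping_regime(5.0, 4.0))
```

Under the summary's result, only the first of these cases would be identifiable from a single clip. A tolerance is used at the critical boundary because exact equality rarely survives floating-point parameter estimates.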
Key Points
- Parameter identifiability of second-order linear ODEs from raw video pixels is proved for an encoder-only pipeline.
- Underdamped systems can be identified from a single video clip.
- Three diverse trajectories are required for other damping regimes.
- A variance-floor regularizer stabilizes the decoder-free objective and prevents latent collapse.
- The method is validated on both synthetic and real-world data.
Article Content
From source RSS / original summary: arXiv:2606.00115v1 (Announce Type: new). Abstract: Bridging the gap between visual realism and physical understanding is a core challenge for video-based world models. We study the structural identifiability of continuous-time physical laws from raw pixels, focusing on whether an encoder-only pipeline can uniquely recover the parameters of second-order linear ODEs.
We prove that a level-set slope-coverage condition ensures the learned latent space is locally affine to the true physical state, enabling exact parameter recovery. Our theory provides the first characterization of minimal data requirements across damping regimes, establishing that underdamped systems are identifiable from a single video clip, whereas other regimes require three diverse trajectories.
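The claim that a locally affine latent space enables exact parameter recovery has a simple reason in the linear case: if the latent is z = alpha*x + beta, substituting into x'' + c*x' + k*x = 0 gives z'' = -c*z' - k*z + k*beta, so the damping c and stiffness k are unchanged and only a constant offset appears. The sketch below checks this numerically on a simulated underdamped trajectory. It is not the paper's encoder pipeline: it assumes the latent is already an exact affine image of position (alpha = 2.5 and beta = 1.0 are arbitrary), estimates derivatives with central finite differences, and fits the ODE by ordinary least squares; all names are hypothetical.

```python
import math

def simulate_underdamped(c, k, dt, n):
    """Sample x(t) = exp(-c t / 2) cos(w_d t), one solution of
    x'' + c x' + k x = 0 in the underdamped case c**2 < 4k."""
    wd = math.sqrt(k - c * c / 4.0)
    return [math.exp(-0.5 * c * i * dt) * math.cos(wd * i * dt) for i in range(n)]

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for j in range(col, 4):
                M[r][j] -= f * M[col][j]
    sol = [0.0] * 3
    for r in reversed(range(3)):
        s = sum(M[r][j] * sol[j] for j in range(r + 1, 3))
        sol[r] = (M[r][3] - s) / M[r][r]
    return sol

def fit_second_order(z, dt):
    """Least-squares fit of z'' = a*z' + b*z + d from central differences.
    Returns (c_hat, k_hat, d_hat) with c_hat = -a and k_hat = -b."""
    rows, ys = [], []
    for i in range(1, len(z) - 1):
        dz = (z[i + 1] - z[i - 1]) / (2.0 * dt)
        ddz = (z[i + 1] - 2.0 * z[i] + z[i - 1]) / (dt * dt)
        rows.append((dz, z[i], 1.0))
        ys.append(ddz)
    # Normal equations (F^T F) theta = F^T y for theta = (a, b, d).
    AtA = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    Aty = [sum(r[p] * y for r, y in zip(rows, ys)) for p in range(3)]
    a, b, d = solve3(AtA, Aty)
    return -a, -b, d

# Ground-truth parameters and a hypothetical affine "latent" z = alpha*x + beta.
c_true, k_true = 0.4, 4.0
alpha, beta = 2.5, 1.0
dt, n = 0.01, 1000
x = simulate_underdamped(c_true, k_true, dt, n)
z = [alpha * xi + beta for xi in x]

c_hat, k_hat, d_hat = fit_second_order(z, dt)
print(f"c_hat={c_hat:.4f}  k_hat={k_hat:.4f}  offset={d_hat:.4f}")
```

The recovered c and k should match the true values closely despite the rescaling and shift, and the fitted offset should be close to k*beta; residual deviations come from finite-difference error, which shrinks as dt decreases.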
We further introduce a variance-floor regularizer to stabilize the decoder-free objective and prevent latent collapse. Validated on synthetic and real-world data, our approach demonstrates that interpretable physical constants can be reliably estimated from video without the need for compute-intensive pixel reconstruction, ensuring both physical correctness and transparency. Code is available at https://github.com/wenjiewang3/PhysicsFromVideo.
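The summary does not give the exact form of the variance-floor regularizer. A common way to prevent latent collapse in decoder-free objectives is a hinge penalty on per-dimension batch statistics, as in the variance term of VICReg. The sketch below assumes that form, applied to the variance rather than the standard deviation, with a hypothetical floor of 1.0; the function name is ours. In practice it would be written in an autodiff framework and added to the main training loss with a weight; plain Python is used here only to show the arithmetic.

```python
from statistics import pvariance

def variance_floor_penalty(batch, floor=1.0):
    """Hypothetical hinge penalty: zero when every latent dimension's batch
    variance is at least `floor`, growing linearly as a dimension collapses.
    `batch` is a list of equal-length latent vectors."""
    dims = len(batch[0])
    total = 0.0
    for j in range(dims):
        var_j = pvariance([z[j] for z in batch])  # population variance of dimension j
        total += max(0.0, floor - var_j)
    return total / dims

collapsed = [[0.5, 0.5]] * 4
spread = [[-2.0, 1.0], [2.0, -1.0], [-2.0, -1.0], [2.0, 1.0]]
print(variance_floor_penalty(collapsed), variance_floor_penalty(spread))
```

A fully collapsed batch (every vector identical) receives the maximum penalty, while a batch whose dimensions all meet the floor receives none, so the term only acts when the encoder starts mapping distinct frames to the same latent.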