From Noise to Control: Parameterized Diffusion Policies

arXiv cs.AI·Renhao Zhang, Haotian Fu, Mingxi Jia, George Konidaris, Yilun Du, Bruno Castro da Silva

·~1 min·6/2/2026·en·0

Quick Take

The Parameterized Diffusion Policy (PDP) framework learns diffusion policies conditioned on low-dimensional, continuous parameters embedded in a learned behavior manifold, turning diffusion into a tool for behavior steering. The authors report that it significantly improves adaptation performance over standard diffusion policies on complex multimodal benchmarks, in both simulated and real-robot experiments.

Key Points

PDP constructs a behavior manifold in which distances between latent representations reflect the semantic similarity between physical trajectories.
It enables smooth interpolation between known strategies and efficient adaptation to novel constraints without updating policy weights.
It significantly improves adaptation performance over standard diffusion policies on complex multimodal benchmarks, in both simulated and real-robot experiments.
The gains are most pronounced in scenarios that require synthesizing novel behaviors.

Article Excerpt

From source RSS / original summary

arXiv:2606.00336v1. Abstract: We propose Parameterized Diffusion Policy (PDP), a framework for learning diffusion policies conditioned on low-dimensional, continuous parameters embedded in a learned behavior manifold. By constructing this manifold so that distances between latent representations reflect the semantic similarity between physical trajectories, we transform diffusion from a mechanism for stochastic diversity into a precise and optimizable tool for behavior steering.

Our approach enables smooth interpolation between known strategies and efficient adaptation to novel constraints without updating policy weights. We demonstrate that PDP significantly improves adaptation performance on complex multimodal benchmarks in both simulated and real-robot experiments compared to standard diffusion policies, particularly in scenarios requiring the synthesis of novel behaviors.
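
As a rough illustration of what steering without weight updates could look like, the sketch below freezes a toy stand-in for a parameter-conditioned policy and searches only over the low-dimensional parameter. Everything in it is an assumption made for illustration and is not taken from the paper: the `frozen_policy` function, the two "known strategy" parameters, the cost function, and the grid search over the interpolation weight are all hypothetical, and a deterministic function stands in for the actual diffusion sampler.

```python
import numpy as np

def frozen_policy(z, horizon=21):
    """Hypothetical stand-in for a frozen, parameter-conditioned policy.

    Maps a 2-D parameter z to a 2-D trajectory of shape (horizon, 2).
    A real diffusion policy would instead denoise a trajectory
    conditioned on z; this toy function is deterministic.
    """
    t = np.linspace(0.0, 1.0, horizon)
    x = t
    y = z[0] * np.sin(np.pi * t) + z[1] * t
    return np.stack([x, y], axis=1)

def interpolate(z_a, z_b, alpha):
    """Linear interpolation between two behavior parameters."""
    return (1.0 - alpha) * z_a + alpha * z_b

def adapt(z_a, z_b, cost, n_candidates=101):
    """Search over the interpolation weight only; the policy is untouched."""
    alphas = np.linspace(0.0, 1.0, n_candidates)
    costs = [cost(frozen_policy(interpolate(z_a, z_b, a))) for a in alphas]
    best = int(np.argmin(costs))
    return float(alphas[best]), interpolate(z_a, z_b, alphas[best])

# Two assumed "known strategies": arc above or below the straight line.
z_up = np.array([1.0, 0.0])
z_down = np.array([-1.0, 0.0])

def midpoint_cost(traj, target_height=0.5):
    """Assumed novel constraint: pass the trajectory midpoint at a given height."""
    mid = traj[len(traj) // 2]
    return float((mid[1] - target_height) ** 2)

alpha_star, z_star = adapt(z_up, z_down, midpoint_cost)
adapted_trajectory = frozen_policy(z_star)
print("chosen interpolation weight:", alpha_star)
print("adapted parameter:", z_star)
```

The design point the sketch tries to capture is that only the parameter changes between the known strategies and the adapted behavior. In a setting like the one the abstract describes, the grid search could presumably be replaced by any optimizer over the low-dimensional parameter space.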

More from arXiv cs.AI

arXiv cs.AI·Aliaksei Korshuk, Alexander Buyantuev, Ilya Makarov

MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution

AI Summary

The In2AI solution introduces delayed per-step reward attribution for training language model agents in multi-agent environments, achieving top performance on the MindGames Arena benchmark at NeurIPS 2025. An 8-billion-parameter model outperformed larger proprietary systems, including GPT-5, in competitive play, demonstrating enhanced stability and sample efficiency in reinforcement learning.

The Importance of Out-of-Band Metadata for Safe Autonomous Agents: The Redpanda Agentic Data Plane

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

Related in this space

TorqueAGI Announces Collaborations with NVIDIA, John Deere, and Dexterity to Advance Physical AI for Enterprise-Grade Robots

FORT Robotics Acquires Mapless AI to Expand Its Trust Platform with Remote Supervision and Active Safety Capabilities