Uncertainty-Aware and Temporally Regulated Expert Advice in Reinforcement Learning for Autonomous Driving

arXiv cs.AI·Ahmed Abouelazm, Felix Klingebiel, Philip Sch\"orner, J. Marius Z\"ollner

6/1/2026

·~1 min·6/1/2026·en·1

Quick Answer

Quick Take

The proposed uncertainty-aware framework for reinforcement learning in autonomous driving enhances exploration safety by integrating expert advice, resulting in a 5-7% success improvement over the IQN baseline in CARLA simulations. This method effectively balances exploration and risk, leading to safer navigation in unsignalized intersections.

Key Points

Framework leverages expert advice to guide exploration in autonomous driving.
Utilizes adaptive thresholds for uncertainty to trigger expert guidance.
Implements a commitment-cooldown strategy to regulate advice duration.
Achieved 5-7% improvement in success rates over IQN baseline in experiments.
Combines expert and agent experiences in a shared replay buffer.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

Article Content

From source RSS / original summary

arXiv:2605. 30576v1 Announce Type: new Abstract: Exploration in reinforcement learning for autonomous driving is inherently unsafe: agents must experience novel behaviors to learn, yet exploration can lead to collisions or off-road driving. We propose an uncertainty-aware framework that leverages expert advice to guide exploration while avoiding long-term dependence.

Advice is triggered when epistemic or aleatoric uncertainty exceeds adaptive thresholds derived from rolling buffers, ensuring advice evolves with the agent's confidence. A commitment-cooldown strategy with a stochastic early-stop heuristic regulates the duration and frequency of guidance, exposing the agent to coherent maneuvers without exhausting the advice budget.

Expert and agent experiences are combined in a shared replay buffer within an off-policy implicit quantile network (IQN) backbone, enabling efficient reuse of expert trajectories. Experiments in CARLA show that our method outperforms the IQN baseline, improving success by 5-7% and reducing failures, demonstrating that risk-sensitive uncertainty coupled with regulated expert integration enables safer and more efficient exploration for sensor-based RL policy learning in unsignalized intersection navigation.

Read on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.AI

See more →

arXiv cs.AI·David Krongauz, Arad Zulti, Eran Segal, Teddy Lazebnik

5h ago

FeaturedOriginal

Automatic Ordinary Differential Equations Discovery For Biological Systems Using Large Language Model Powered Agentic System

AI Summary

The MEDA system utilizes large language models and symbolic regression to autonomously discover ordinary differential equations for biological systems, achieving strong structural recovery and biologically plausible models. It outperforms existing methods by integrating domain knowledge and mechanistic constraints, demonstrating effective retrieval and extrapolation capabilities.

#LLM #Agent #Inference #AI Startup