Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

3/25/2025

·~3 min·3/25/2025·en·0

Quick Answer

This paper shows that Berkeley AI Research deployed 100 RL-controlled autonomous vehicles to mitigate stop-and-go traffic waves, enhancing fuel efficiency and traffic flow.

Quick Take

Berkeley AI Research deployed 100 RL-controlled autonomous vehicles to mitigate stop-and-go traffic waves, enhancing fuel efficiency and traffic flow. The study highlights the effectiveness of RL in real-world scenarios, demonstrating that a small number of AVs can significantly benefit all road users by reducing congestion and emissions.

Key Points

100 RL-controlled AVs were deployed in rush-hour traffic to smooth congestion.
The study utilized fast, data-driven simulations for training AV controllers.
Small proportions of well-controlled AVs can significantly enhance traffic flow.
RL agents learned to maximize energy efficiency while ensuring safety.
Phantom jams are caused by amplified driving behavior fluctuations.

Paper Resources

Read Paperbair.berkeley.edu

Article Content

From source RSS / original summary

Training Diffusion Models with Reinforcement Learning We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle "stop-and-go" waves, those frustrating slowdowns and speedups that usually have no clear cause but lead to congestion and significant energy waste.

To train efficient flow-smoothing controllers, we built fast, data-driven simulations that RL agents interact with, learning to maximize energy efficiency while maintaining throughput and operating safely around human drivers. Overall, a small proportion of well-controlled autonomous vehicles (AVs) is enough to significantly improve traffic flow and fuel efficiency for all drivers on the road.

Moreover, the trained controllers are designed to be deployable on most modern vehicles, operating in a decentralized manner and relying on standard radar sensors. In our latest paper, we explore the challenges of deploying RL controllers on a large-scale, from simulation to the field, during this 100-car experiment. The challenges of phantom jams A stop-and-go wave moving backwards through highway traffic.

If you drive, you’ve surely experienced the frustration of stop-and-go waves, those seemingly inexplicable traffic slowdowns that appear out of nowhere and then suddenly clear up. These waves are often caused by small fluctuations in our driving behavior that get amplified through the flow of traffic. We naturally adjust our speed based on the vehicle in front of us. If the gap opens, we speed up to keep up. If they brake, we also slow down.

But due to our nonzero reaction time, we might brake just a bit harder than the vehicle in front. The next driver behind us does the same, and this keeps amplifying. Over time, what started as an insignificant slowdown turns into a full stop further back in traffic. These waves move backward through the traffic stream, leading to significant drops in energy efficiency due to frequent accelerations, accompanied by increased CO2 emissions and accident risk. And this isn’t an isolated phenomenon!

These waves are ubiquitous on busy roads when the traffic density exceeds a critical threshold. So how can we address this problem? Traditional approaches like ramp metering and variable speed limits attempt to manage traffic flow, but they often require costly infrastructure and centralized coordination. A more scalable approach is to use AVs, which can dynamically adjust their driving behavior in real-time.

However, simply inserting AVs among human drivers isn’t enough: they must also drive in a smarter way that makes traffic better for everyone, which is where RL comes in. Fundamental diagram of traffic flow. The number of cars on the road (density) affects how much traffic is moving forward (flow). At low density, adding more cars increases flow because more vehicles can pass through.

But beyond a critical threshold, cars start blocking each other, leading to congestion, where adding more cars actually slows down overall movement. Reinforcement learning for wave-smoothing AVs RL is a powerful control approach where an agent learns to maximize a reward signal through interactions with an environment. The agent collects experience through trial and error, learns from its mistakes, and improves over time.

In our case, the environment is a mixed-autonomy traffic scenario, where AVs learn driving strategies to dampen stop-and-go waves and reduce fuel consumption for both themselves and nearby human-driven vehicles. Training these RL agents requires fast simulations with realistic traffic dynamics that can replicate highway stop-and-go behavior.

To achieve this, we leveraged experimental data collected on Interstate 24 (I-24) near Nashville, Tennessee, and used it to build simulations where vehicles replay highway trajectories, creating unstable traffic that AVs driving behind them learn to smooth out. Simulation replaying a highway…

Read on bair.berkeley.edu

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free