Salesforce claims AI agents cut a 231-day migration to 13 days with fewer incidents

The Decoder·Matthias Bastian

5/30/2026

Quick Take

Salesforce says its adoption of AI agents, particularly Anthropic's Claude Code, cut an API migration estimated at 231 person-days down to 13 days, while its developer efficiency metrics rose by more than 50 percent. Despite concerns about quality, incidents fell by 5 percent, which the company presents as evidence of what agentic workflows can do in software development.

Key Points

Salesforce reports a 50.8% increase in completed work items per developer.
An API migration estimated at 231 person-days was completed in 13 days, roughly 18 times faster.
The 'Effective Output Score' improved by 151.3% year-over-year.
Incidents dropped by 5% despite increased pull requests.
Salesforce is exploring new team structures to adapt to AI-driven workflows.

Few topics spark as much debate right now as the "agentic shift" in coding. Instead of writing code line by line, developers orchestrate software creation through AI agents.

Salesforce is now putting its own numbers behind that shift. In a post by Srinivas Tallapragada, Salesforce's head of engineering, the company says it has moved its entire development organization to agentic workflows. It rolled out Anthropic's Claude Code company-wide as its main AI agent and gave every developer unlimited tokens to use it.

For April 2026, Salesforce reports a sharp efficiency jump compared to the same month last year. Completed work items per developer rose 50.8 percent. Merged pull requests per developer climbed 79 percent.

An ML-based "Effective Output Score" designed to measure the actual value of shipped code improved by 151.3 percent. None of these numbers can be independently verified.
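Read as year-over-year ratios, the reported percentage increases translate into simple multipliers. This is plain arithmetic on Salesforce's own figures, where $p$ is the reported percentage change and $x$ is the metric in question; it does nothing to verify them.

```latex
\frac{x_{\text{Apr 2026}}}{x_{\text{Apr 2025}}} = 1 + \frac{p}{100}
\quad\Rightarrow\quad
\begin{aligned}
\text{work items per developer: } & 1 + 0.508 = 1.508 \\
\text{merged PRs per developer: } & 1 + 0.79 = 1.79 \\
\text{Effective Output Score: } & 1 + 1.513 = 2.513
\end{aligned}
```

A 151.3 percent improvement therefore means the score is about 2.5 times its April 2025 level, not 1.5 times.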

More output, fewer incidents

Tallapragada addresses the obvious question, whether quality suffers at this pace, by pointing to the company's own monitoring platform, Engineering 360. Despite the surge in pull requests, incidents dropped five percent. Safety guardrails and quality standards are built into the agentic workflow, he says.

"When agentic tools get applied properly, quality doesn't suffer from speed. It benefits from it," Tallapragada writes. Salesforce doesn't back this claim with external audits or independent measurements.

Engineers are now building their own agentic workflows rather than just using off-the-shelf tools, according to Tallapragada. So-called Claude Code skills, reusable capabilities that encode team context, naming conventions, and workflow patterns, have become a new kind of engineering artifact. Salesforce also built curated libraries, the "AI Expert Suite" and "Salesforce Foundation Plugins", that serve as a shared foundation for all developers.

Sub-agents and agent teams, specialized AI agents that handle parallel workstreams within a larger task, are changing how complex work gets broken down. Developers no longer bounce between five systems. They describe the desired outcome, and coordinated agents handle the individual steps.
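The fan-out pattern described here, where one orchestrator splits an outcome into parallel workstreams, can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not Salesforce's or Anthropic's implementation: `Subtask`, `plan`, `run_subagent`, and `orchestrate` are hypothetical names, and the "sub-agents" are plain functions standing in for LLM calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subtask:
    """One workstream carved out of a larger request (hypothetical structure)."""
    name: str
    instructions: str


def plan(outcome: str) -> list[Subtask]:
    # Hypothetical planner: a real orchestrator would ask a model to decompose
    # the outcome; here a fixed list of steps is used for illustration.
    steps = ["update code", "update tests", "update docs"]
    return [Subtask(name=s, instructions=f"{s} for: {outcome}") for s in steps]


def run_subagent(task: Subtask) -> str:
    # Stand-in for a specialized sub-agent; returns a short status report.
    return f"{task.name}: done"


def orchestrate(outcome: str, worker: Callable[[Subtask], str] = run_subagent) -> list[str]:
    """Describe the outcome once; independent subtasks run concurrently."""
    subtasks = plan(outcome)
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        # map() returns results in the same order as the submitted subtasks.
        return list(pool.map(worker, subtasks))
```

Calling `orchestrate("rename the billing API")` returns one status line per workstream, in planning order. Threads are used only because real sub-agents would spend most of their time waiting on network calls.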

API migration in 13 days instead of 231

As a concrete example, Tallapragada points to the migration of 33 API endpoints to a new cloud-native architecture. The company estimates that the traditional approach would have taken about 231 person-days. Using a rule-based framework built on Claude, with Markdown files and reference implementations, the team finished the migration in 13 days, roughly 18 times faster.
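The headline speedup is a single division. It assumes, as the article reports, that 231 is the company's person-day estimate for the traditional approach and 13 is the reported completion time in days. The two figures are not in the same unit, so the ratio compares an effort estimate with a delivery time rather than two measured durations.

```latex
\text{speedup} \approx \frac{231\ \text{person-days (estimated)}}{13\ \text{days}} \approx 17.8
```

The company rounds this to 18.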

Each round of PR feedback was fed back into the rule set, so accuracy kept improving. Autonomous LLM loops of building, fixing, and validating ran without manual intervention. Migrations were parallelized across isolated environments. The result: five pull requests, with the largest single PR delivering 21 endpoints with full test coverage.
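The loop described above, in which review feedback hardens into rules while the agent builds, validates, and fixes on its own, might look roughly like this. It is a hypothetical sketch: the article does not describe the framework's internals, and `RuleSet`, `migrate_endpoint`, `toy_generate`, and `toy_validate` are invented names, with toy functions standing in for the model and the test suite. Parallel execution across isolated environments is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RuleSet:
    """Accumulated migration rules; a stand-in for the Markdown rule files."""
    rules: list[str] = field(default_factory=list)

    def add_from_review(self, comment: str) -> None:
        # PR review feedback becomes a permanent rule for later migrations.
        if comment not in self.rules:
            self.rules.append(comment)


def migrate_endpoint(
    endpoint: str,
    rules: RuleSet,
    generate: Callable[[str, list[str]], str],
    validate: Callable[[str], list[str]],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Run a build -> validate -> fix loop; return the code and rounds used."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        code = generate(endpoint, rules.rules + feedback)
        errors = validate(code)
        if not errors:
            return code, round_no
        # Validation errors feed the next attempt, with no human in the loop.
        feedback = errors
    raise RuntimeError(f"{endpoint}: still failing after {max_rounds} rounds")


# Toy stand-ins for the model and the test suite (purely illustrative).
def toy_generate(endpoint: str, hints: list[str]) -> str:
    with_tests = any("add tests" in h for h in hints)
    return f"handler({endpoint})" + (" + tests" if with_tests else "")


def toy_validate(code: str) -> list[str]:
    return [] if "tests" in code else ["add tests"]


rules = RuleSet()
print(migrate_endpoint("/v1/orders", rules, toy_generate, toy_validate))
# ('handler(/v1/orders) + tests', 2)
rules.add_from_review("add tests for every endpoint")
print(migrate_endpoint("/v1/users", rules, toy_generate, toy_validate))
# ('handler(/v1/users) + tests', 1)
```

The first endpoint needs a second round because validation catches the missing tests. Once that lesson is added to the rule set, the next endpoint passes on the first try, which is the compounding effect the article describes.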

"The most important skill today is knowing how to structure problems for an agentic system, when to delegate versus stay in the loop, and how to build reusable patterns your team can compound on," Tallapragada writes.

Security, junior talent, and team structure remain unsolved

Tallapragada is upfront about a range of unsolved problems, calling them "genuinely hard." Context management in long agentic sessions is a skill engineers still need to learn. The quality of CLAUDE.md files—persistent context configs that align Claude with a codebase—varies widely between teams and has a big impact on output quality. Security needs a rethink too. When agents act on systems rather than just making suggestions, the blast radius of a misconfigured tool gets much larger.

Then there's the talent pipeline question. "When agents handle more of the execution layer, how do junior engineers grow into senior engineers if AI is absorbing much of the entry-level work? What is the role of a designer or product manager in this new world?" Tallapragada writes. Salesforce is experimenting with one-person or three-person units instead of traditional Scrum teams. It doesn't have clear answers yet.

Productivity leap or tech debt on autopilot?

A sharply different take came a few days ago from well-known programmer and hacker George Hotz. Using AI agents in software development will be one of the industry's most expensive mistakes, he argues.

LLMs are "sophisticated statistical models" that "mimic the distribution of programming" but can never truly program, Hotz says. Large organizations are especially at risk because weaker developers can't spot faulty output.

Even Andrej Karpathy, who now counts himself among agentic coding's supporters, has flagged quality problems. Agent-generated code is "not like super amazing code necessarily all the time," he said, calling it "bloaty, there's a lot of copy paste, there's awkward abstractions that are brittle, and like, it works, but it's just really gross." Unlike Hotz, though, Karpathy is still sold on the new approach and recently joined Anthropic.

A broader debate about the rising costs of AI relative to its benefits is heating up too, alongside questions about what the models actually deliver in day-to-day work.

— Originally published at the-decoder.com
