
Evaluating Deep Agents using LangSmith on AWS
Quick Take
This guide combines LangChain's evaluation patterns for deep agents with Anthropic's guidance on evals for AI agents. It shows how to apply five evaluation patterns, build offline evaluations with pytest and LangSmith, and set up online monitoring for production. The worked example is a text-to-SQL deep agent that uses Amazon Bedrock across the full development-to-production lifecycle.
Key Points
- Learn five evaluation patterns for deep agents from LangChain.
- Build offline evaluations using pytest and LangSmith.
- Configure online monitoring for production environments.
- Follow a worked example: a text-to-SQL deep agent with Amazon Bedrock.
- Cover the full development-to-production lifecycle.
Article Excerpt
From source RSS / original summary: This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you will learn how to: 1) apply five evaluation patterns for deep agents, 2) build offline evaluations using pytest and LangSmith, and 3) configure online monitoring for production. The walkthrough uses a text-to-SQL deep agent with Amazon Bedrock for the full development to production lifecycle.
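To make the offline-evaluation idea concrete, here is a minimal sketch of what a pytest-style check for a text-to-SQL agent might look like. It is not taken from the original post, and the excerpt does not name the five patterns, so treat the approach as an assumption: the test runs the agent's SQL and a reference query against a small in-memory SQLite database and compares the result sets. The names `build_test_db`, `fake_agent`, and `results_match` are hypothetical, `fake_agent` is a canned stand-in for a real Bedrock-backed agent, and logging results to LangSmith is deliberately left out.

```python
import sqlite3


def build_test_db() -> sqlite3.Connection:
    """Create a tiny in-memory database to evaluate against (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
        INSERT INTO orders (customer, total) VALUES
            ('alice', 30.0), ('bob', 12.5), ('alice', 7.5);
        """
    )
    return conn


def fake_agent(question: str) -> str:
    """Hypothetical stand-in for the text-to-SQL agent: returns canned SQL."""
    canned = {
        "What is the total spend per customer?":
            "SELECT customer, SUM(total) FROM orders GROUP BY customer",
    }
    return canned[question]


def results_match(conn: sqlite3.Connection, predicted_sql: str, reference_sql: str) -> bool:
    """Compare the rows returned by two queries, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # SQL that fails to run counts as a failed evaluation, not a test crash.
        return False
    reference = conn.execute(reference_sql).fetchall()
    # Sort by repr so rows containing NULLs can still be ordered.
    return sorted(predicted, key=repr) == sorted(reference, key=repr)


def test_total_spend_per_customer():
    """pytest collects this function; it also runs as a plain function call."""
    conn = build_test_db()
    question = "What is the total spend per customer?"
    reference_sql = (
        "SELECT customer, SUM(total) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    assert results_match(conn, fake_agent(question), reference_sql)
```

Comparing executed results rather than SQL strings lets two differently written but equivalent queries both pass, which is one reasonable design choice for this kind of check. In a real setup the canned agent would be replaced by a call to the deployed agent, and the test outcomes would be recorded in an evaluation tracker.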

