Evaluating Deep Agents using LangSmith on AWS

5/28/2026

Quick Answer

This guide combines LangChain's evaluation patterns for deep agents with Anthropic's guidance on agent evals. It covers how to apply five evaluation patterns, build offline evaluations with pytest and LangSmith, and set up online monitoring for production.

Key Points

Apply five evaluation patterns for deep agents, drawn from LangChain's work.
Build offline evaluations using pytest and LangSmith.
Configure online monitoring for production environments.
The example is a text-to-SQL deep agent with Amazon Bedrock.
The walkthrough covers the full development-to-production lifecycle.

Article Excerpt

From source RSS / original summary

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you will learn how to: 1) apply five evaluation patterns for deep agents, 2) build offline evaluations using pytest and LangSmith, and 3) configure online monitoring for production. The walkthrough uses a text-to-SQL deep agent with Amazon Bedrock for the full development to production lifecycle.
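
To make the offline-evaluation step concrete, here is a minimal sketch of a pytest-style test for a text-to-SQL agent. It is not taken from the AWS post: `stub_agent`, `make_db`, `results_match`, and the `orders` table are hypothetical names invented for illustration, and execution-result matching is one common way to score generated SQL, not necessarily one of the five patterns the post describes. LangSmith logging and the Amazon Bedrock call are deliberately left out so the example runs with the standard library alone.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database to evaluate against (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
    )
    conn.commit()
    return conn


def stub_agent(question: str) -> str:
    """Hypothetical stand-in for the agent; a real one would call a model."""
    return "SELECT customer, SUM(total) FROM orders GROUP BY customer"


def results_match(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """Return True when both queries yield the same rows, ignoring row order."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        # SQL that fails to execute counts as a failed evaluation, not a crash.
        return False
    expected = conn.execute(reference_sql).fetchall()
    return sorted(got) == sorted(expected)


def test_total_per_customer() -> None:
    """pytest discovers this function by its test_ prefix."""
    conn = make_db()
    generated = stub_agent("What is the total order value per customer?")
    reference = (
        "SELECT customer, SUM(total) FROM orders "
        "GROUP BY customer ORDER BY customer"
    )
    assert results_match(conn, generated, reference)
```

Comparing executed results rather than SQL strings lets two differently written but equivalent queries both pass, which is usually the behavior wanted when the agent is free to phrase the query its own way.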

Read on aws.amazon.com
