A case study of evaluating AI agents on a… | AI Deep Signal