An Empirical Study of Automating Agent Evaluation · DeepSignal