On Wednesdays, We Ask Questions: Optimizing "Active Listening" in Automated Legal Triage and Referral

arXiv cs.AI·Quinten Steenhuis, Jacqueline Harvey

6/2/2026

Quick Take

The FETCH classifier uses a low-cost ensemble of LLMs to generate follow-up questions for legal triage. Low-cost models perform well at classification, but generating high-quality plain-language questions appears to require a more sophisticated, higher-cost model such as GPT-5. With GPT-5 added, the questions lead to more accurate classification, and uneven fact elicitation in areas such as domestic violence suggests the value of dedicated screening panels.

Key Points

FETCH uses a low-cost ensemble of LLMs to generate follow-up questions in legal triage.
High-quality plain-language questions appear to require a higher-cost model such as GPT-5; prompt engineering alone is not enough.
Human and LLM-as-judge ratings diverge.
The authors suggest dedicated screening panels for certain areas of law.
Fact elicitation is uneven across categories, including domestic violence, at odds with family law screening protocols.

Article Content

From source RSS / original summary

arXiv:2606.00272v1

Abstract: The FETCH classifier generates follow-up questions to help refine the best match for the applicant's legal problem, using a low-cost ensemble of LLMs. In this paper, we describe an expert attorney and LLM-assisted evaluation of the follow-up question approach in FETCH and show that while low-cost LLMs perform well at classification tasks, generating high-quality plain-language questions in this setting appears to require a more sophisticated and higher-cost model.
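As a rough illustration of the idea of an ensemble that asks a follow-up question when it cannot settle on a category, here is a minimal sketch. It is not the FETCH implementation: the abstract does not describe how the ensemble combines its outputs, so the majority vote, the `needs_follow_up` rule, the 0.75 threshold, and the stub classifiers are all assumptions made for this example.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical type: each classifier stands in for one low-cost LLM call
# that maps an applicant's narrative to a legal-category label.
Classifier = Callable[[str], str]


def ensemble_vote(narrative: str, classifiers: Sequence[Classifier]) -> tuple[str, float]:
    """Return the most common label and the fraction of classifiers that chose it."""
    votes = Counter(clf(narrative) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    return label, count / len(classifiers)


def needs_follow_up(agreement: float, threshold: float = 0.75) -> bool:
    """Assumed rule: ask a follow-up question when agreement falls below the threshold."""
    return agreement < threshold


# Stub classifiers used in place of real model calls (illustrative only).
stubs = [lambda text: "housing", lambda text: "housing", lambda text: "family"]
label, agreement = ensemble_vote("My landlord changed the locks.", stubs)
ask_question = needs_follow_up(agreement)
```

With these stubs, two of three classifiers agree, so the sketch would flag the narrative for a follow-up question before committing to a category.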

Through discussion with legal intake workers, we propose a rubric for the evaluation of legal intake classification questions, and we find that prompt engineering alone is not enough to improve question quality for intake purposes. We also find that LLM-as-judge and human ratings diverge.
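One common way to quantify how far two raters diverge is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is an assumption about how such a comparison could be run; the abstract does not say which agreement statistic the authors used, and the rubric scores here are invented purely for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same score by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical score throughout
    return (observed - expected) / (1 - expected)


# Invented rubric scores (1 to 3) for six follow-up questions.
human_scores = [1, 2, 2, 3, 3, 3]
llm_judge_scores = [1, 2, 3, 3, 3, 3]
kappa = cohens_kappa(human_scores, llm_judge_scores)
```

A kappa near 1 indicates close agreement, while values near 0 indicate agreement no better than chance, which is the kind of divergence the abstract reports between human and LLM-as-judge ratings.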

We demonstrate that with the addition of a single high-cost model, GPT-5, the classifier can elicit relevant information from applicants for legal help, and that the questions lead to more accurate performance at classification tasks. We also find uneven fact elicitation across different categories, including domestic violence, at odds with family law screening protocols, suggesting the value of including dedicated screening panels for certain areas of law.

