Which Models Perform Better in Inheritance Reasoning? | AI Deep Signal

Which Models Perform Better in Inheritance Reasoning?

arXiv cs.CL · Mohammed Amine Mouhoub, Chahinez Bouchekif

6/15/2026


Quick Answer

The study evaluates large language models on Arabic Islamic inheritance reasoning and finds that commercial models outperform open-source ones.

Quick Take

Notably, Gemini 2.5 Flash achieved the highest reliability with an MRE of 0.989, excelling in identifying heirs and applying legal rules.

Key Points

Commercial models show superior performance in structured legal reasoning tasks.
Gemini 2.5 Flash achieved an MRE of 0.989, the highest in the study.
Open-source models demonstrated instability, especially in complex legal scenarios.
The evaluation highlights a clear reliability gap between the two model families.
Effective legal interpretation requires multi-step reasoning and precise computation.
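The paper's code is not shown here. To illustrate why inheritance cases demand precise computation, the Python sketch below works through one textbook step: 'awl, the proportional reduction of fixed shares when they add up to more than the whole estate. It assumes the standard fixed shares for the classic "Minbariyya" case (wife, two daughters, father, mother, with the deceased leaving children). The function `apply_awl` and the heir labels are hypothetical names introduced for this example. The sketch skips heir identification, blocking (hajb), residuary heirs, and radd, all of which a full solver would also need.

```python
from fractions import Fraction

# Hypothetical encoding of the classic "Minbariyya" case.
# Fixed shares assume the deceased left children:
# wife 1/8, two daughters 2/3 jointly, father 1/6, mother 1/6.
FIXED_SHARES = {
    "wife": Fraction(1, 8),
    "daughters (2)": Fraction(2, 3),
    "father": Fraction(1, 6),
    "mother": Fraction(1, 6),
}


def apply_awl(shares: dict[str, Fraction]) -> dict[str, Fraction]:
    """Scale fixed shares down proportionally when they exceed the estate ('awl)."""
    total = sum(shares.values(), Fraction(0))
    if total <= 1:
        # No over-subscription: return shares unchanged.
        # (Distributing any surplus via residue or radd is out of scope here.)
        return dict(shares)
    # Divide each share by the oversubscribed total so the shares sum to exactly 1.
    return {heir: share / total for heir, share in shares.items()}


adjusted = apply_awl(FIXED_SHARES)
for heir, share in adjusted.items():
    print(f"{heir}: {share}")
```

Here the fixed shares sum to 27/24, so every share is divided by 9/8: the wife's 1/8 becomes 1/9, the daughters receive 16/27, and each parent receives 4/27. Using `Fraction` instead of floats keeps each share exact, which matters when answers are checked against precise fractional shares.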

Paper Resources

Read Paper (arxiv.org) · View PDF (arxiv.org)

Source Excerpt

arXiv:2606.13751v1 Announce Type: new Abstract: This paper presents the participation of team PSL in the QIAS 2026 Shared Task on Arabic Islamic inheritance reasoning. The task evaluates the ability of [large language models] to solve inheritance cases that require legal interpretation, multi-step reasoning, and precise numerical computation.

We compare commercial and open-source models under a unified prompting strategy to assess their effectiveness in structured legal reasoning with minimal task-specific adaptation. Our results show a clear gap in reliability between the two model families. …

