Narration-of-Thought: Inference-Time Scaffolding for Defeasible Ethical Reasoning in Large Language Models
Quick Answer
This paper shows that the Narration-of-Thought (NoT) system prompt improves how large language models reason about moral dilemmas: across four model generators, it cuts stakeholder collapse from up to 31% to under 1% and uncertainty suppression from up to 72% to 1-24%.
Quick Take
NoT is a system prompt that structures chain-of-thought on moral dilemmas into five sections, so that the model names the affected parties and states its unknowns before committing to an action. It requires no additional training. Extended to a five-round multi-stakeholder debate protocol, the scaffold converts a 6% standoff into 95% full consensus on a calibration set, and the resulting traces expose the stakeholders, consequences, and uncertainty behind each commitment for auditing.
Key Points
- NoT organizes ethical reasoning into five sections: protagonist, stakeholders, two-step consequences, uncertainty, commitment.
- Stakeholder collapse fell from up to 31% to under 1% on 100 DailyDilemmas scenarios.
- Uncertainty suppression decreased from up to 72% to 1-24% across all models tested.
- Extended to a five-round multi-stakeholder debate, the scaffold converts a 6% standoff into 95% full consensus on a calibration set.
- NoT requires no additional training, parameters, or fine-tuning.
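The five-section structure listed above can be sketched as a prompt builder. This is a minimal illustration only: the section names come from the abstract, but the instruction wording, `build_not_system_prompt`, and `missing_sections` are hypothetical and are not the paper's actual prompt or tooling.

```python
# Section names as given in the abstract; everything else here is an assumption.
NOT_SECTIONS = [
    "protagonist",
    "stakeholders",
    "two-step consequences",
    "uncertainty",
    "commitment",
]


def build_not_system_prompt(sections=NOT_SECTIONS):
    """Build a hypothetical NoT-style system prompt with numbered sections."""
    lines = ["Reason about the dilemma in the following sections, in order:"]
    for index, name in enumerate(sections, start=1):
        lines.append(f"{index}. {name.capitalize()}")
    # Illustrative wording: commitment comes last, after uncertainty is stated.
    lines.append("State the chosen action only in the final section.")
    return "\n".join(lines)


def missing_sections(trace, sections=NOT_SECTIONS):
    """Return the section names that never appear in a model trace."""
    lowered = trace.lower()
    return [name for name in sections if name not in lowered]
```

A check such as `missing_sections(trace)` is one simple way to confirm that a generated trace followed the scaffold before scoring it.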
Article Content
From the source RSS summary (arXiv:2606.26366v1). Abstract: Standard chain-of-thought on moral dilemmas exhibits two failure modes: stakeholder collapse (the trace names at most one party with a stake in the outcome) and uncertainty suppression (no explicit unknowns or hedges before committing to an action). We introduce narration-of-thought (NoT), a system prompt that structures chain-of-thought into five sections: protagonist, stakeholders, two-step consequences, uncertainty, then commitment.
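The two failure modes defined above can be expressed as simple per-trace flags. The sketch below assumes the stakeholders have already been extracted from the trace as a list of names, and it uses a hypothetical keyword list (`HEDGE_MARKERS`) as a stand-in for hedge detection; the abstract does not describe the paper's actual scoring procedure.

```python
import re

# Hypothetical hedge markers; the paper's real rubric is not given in the abstract.
HEDGE_MARKERS = ("might", "may", "unclear", "unknown", "uncertain", "depends")


def stakeholder_collapse(stakeholders_named):
    """True when the trace names at most one distinct party with a stake."""
    distinct = {name.strip().lower() for name in stakeholders_named if name.strip()}
    return len(distinct) <= 1


def uncertainty_suppression(trace):
    """True when no hedge marker appears as a whole word in the trace."""
    text = trace.lower()
    return not any(
        re.search(r"\b" + re.escape(marker) + r"\b", text)
        for marker in HEDGE_MARKERS
    )


def failure_rate(flags):
    """Fraction of traces flagged, e.g. 0.31 for a 31% collapse rate."""
    return sum(flags) / len(flags) if flags else 0.0
```

Averaging either flag over a set of scenarios gives a rate comparable in form to the percentages reported in the abstract, though a keyword match is far cruder than a judged score.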
NoT adds no training, parameters, or fine-tuning. On 100 DailyDilemmas scenarios across four generators from three vendors, NoT cuts stakeholder collapse from up to 31% to under 1% and uncertainty suppression from up to 72% to 1-24% on every model. A matched-budget verbose-CoT control rules out token spend as the active ingredient; NoT retains Cliff's delta advantages of +0.79 to +0.90 on stakeholder count and +0.65 to +0.93 on uncertainty score for three of four generators, and a section ablation attributes each shift to its specific sub-instruction.
Textual-gradient descent initialised at NoT improves the scaffold further; a cross-family training judge (different vendor from the generator) dominates an in-family one on every measured axis.
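For readers unfamiliar with the effect size quoted above, Cliff's delta compares every value in one group against every value in the other: it is the share of pairs where the first group is larger minus the share where it is smaller, so it ranges from -1 to +1. The following is a minimal sketch of the standard definition, not the paper's evaluation code; the function name and the sample numbers are illustrative.

```python
def cliffs_delta(treatment, control):
    """Cliff's delta: P(treatment > control) - P(treatment < control) over all pairs."""
    if not treatment or not control:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x in treatment for y in control if x > y)
    less = sum(1 for x in treatment for y in control if x < y)
    return (greater - less) / (len(treatment) * len(control))


# Illustrative stakeholder counts per trace (made-up numbers, not paper data).
scaffolded_counts = [3, 4, 5]
control_counts = [1, 2]
print(cliffs_delta(scaffolded_counts, control_counts))  # 1.0: every scaffolded value is larger
```

A delta of +0.79 therefore means that, over all scaffold-versus-control pairs, the scaffold scored higher in far more pairs than it scored lower.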
Extended to a five-round multi-stakeholder debate protocol, the scaffold converts a 6% standoff into 95% full consensus on a calibration set and 100% combined convergence on a DailyDilemmas replication. The resulting traces externalise the stakeholders, consequences, and uncertainty grounding each commitment, providing an auditable substrate for dependable agentic deployment.