From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents
Quick Answer
This study introduces a value-based framework using GraphRAG to enhance LLM-based agents' alignment with human social values.
Quick Take
The framework uses GraphRAG to convert principles into value-based instructions, then retrieves the instruction that fits a given conversation context to steer the agent. Expected behaviors are defined from Maslow's Hierarchy of Needs and Plutchik's Wheel of Emotion. On the DAILYDILEMMAS benchmark, the method shows significant performance gains over prompt-based baselines, including ECoT, Plan-and-Solve, and Metacognitive prompting, and the authors present it as a basis for the emergence of self-emotion in AI systems.
Key Points
- Proposes a framework that uses GraphRAG to convert principles into value-based instructions.
- Defines expected behaviors from Maslow's Hierarchy of Needs and Plutchik's Wheel of Emotion.
- Reports significant performance gains on the DAILYDILEMMAS benchmark.
- Outperforms prompt-based baselines, including ECoT, Plan-and-Solve, and Metacognitive prompting.
- Provides a basis for the emergence of self-emotion in AI systems.
Article Excerpt
From source RSS / original summary
arXiv:2605.14034v1
Abstract: Wide applications of LLM-based agents require strong alignment with human social values. However, current works still exhibit deficiencies in self-cognition and dilemma decision, as well as self-emotions. To remedy this, we propose a novel value-based framework that employs GraphRAG to convert principles into value-based instructions and steer the agent to behave as expected by retrieving the suitable instruction upon a specific conversation context.
To evaluate the ratio of expected behaviors, we define the expected behaviors from two famous theories, Maslow's Hierarchy of Needs and Plutchik's Wheel of Emotion. By experimenting with our method on the benchmark of DAILYDILEMMAS, our method exhibits significant performance gains compared to prompt-based baselines, including ECoT, Plan-and-Solve, and Metacognitive prompting. Our method provides a basis for the emergence of self-emotion in AI systems.
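The retrieval step and the "ratio of expected behaviors" described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation and does not use any real GraphRAG library: the graph, the value names, the keywords, the instructions, and the functions `retrieve_instructions` and `expected_behavior_ratio` are all hypothetical, and keyword overlap stands in for whatever matching the authors actually use.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class ValueNode:
    """One value in a toy graph (hypothetical structure, not from the paper)."""
    name: str
    keywords: set[str]          # words that suggest this value is relevant
    instruction: str            # value-based instruction attached to the node
    related: list[str] = field(default_factory=list)  # edges to other values


# Invented example data: names, keywords, and instructions are assumptions.
VALUE_GRAPH = {
    "safety": ValueNode(
        "safety", {"danger", "risk", "hurt", "unsafe"},
        "Prioritise the physical safety of everyone involved.",
        related=["care"],
    ),
    "care": ValueNode(
        "care", {"friend", "family", "lonely", "help"},
        "Respond with empathy and offer concrete support.",
        related=["honesty"],
    ),
    "honesty": ValueNode(
        "honesty", {"lie", "secret", "truth", "cheat"},
        "Be truthful while remaining considerate.",
    ),
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_instructions(context: str, graph=VALUE_GRAPH, hops: int = 1) -> list[str]:
    """Pick the best-matching value node, then follow edges for `hops` steps."""
    tokens = tokenize(context)
    scores = {name: len(node.keywords & tokens) for name, node in graph.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return []  # nothing in the graph matches this context
    selected = [best]
    frontier = [best]
    for _ in range(hops):
        next_frontier = []
        for name in frontier:
            for neighbour in graph[name].related:
                if neighbour not in selected:
                    selected.append(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return [graph[name].instruction for name in selected]


def expected_behavior_ratio(labels: list[bool]) -> float:
    """Fraction of responses judged to show the expected behavior."""
    return sum(labels) / len(labels) if labels else 0.0
```

In this sketch, a context such as "Is it unsafe, could someone get hurt?" matches the `safety` node and also pulls in the instruction of its neighbour `care`. The retrieved instructions would then be prepended to the agent's prompt. How responses are judged as expected or not is left outside the sketch, since the excerpt does not specify it.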