GradShield: Alignment Preserving Finetuning · DeepSignal
GradShield: Alignment Preserving Finetuning
arXiv cs.CL · Zhanhao Hu, Xiao Huang, Patrick Mendoza, Emad A. Alghamdi, Basel Alomair, Raluca Ada Popa, David Wagner · 2d ago · 5/15/2026
GradShield is a method that filters harmful data during LLM finetuning to maintain alignment and safety.
Key Points
- Introduces the Finetuning Implicit Harmfulness Score (FIHS).
- Employs adaptive thresholding to filter harmful data (see the sketch after this list).
- Achieves an Attack Success Rate below 6% while preserving utility.
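The key points name FIHS and adaptive thresholding but not the exact filtering rule. Below is a minimal sketch of one common adaptive-threshold scheme (mean plus k standard deviations over a batch's scores); the function name, the choice of statistic, and the toy scores are all assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch of adaptive-threshold filtering, not the paper's code.
# Assumes each finetuning example already carries a harmfulness score (FIHS);
# how that score is computed is defined in the paper and not reproduced here.
import statistics

def filter_by_adaptive_threshold(examples, scores, k=1.0):
    """Keep examples whose score stays below mean + k * stdev.

    examples: list of finetuning samples (any type)
    scores:   list of floats, one harmfulness score per example
    k:        sensitivity knob; lower k filters more aggressively
    """
    mean = statistics.fmean(scores)
    stdev = statistics.pstdev(scores)
    threshold = mean + k * stdev  # adapts to this batch's score distribution
    return [ex for ex, s in zip(examples, scores) if s < threshold]

# Toy usage: the two outlier scores (0.90, 0.95) exceed the adaptive
# threshold and their examples are dropped.
data = ["a", "b", "c", "d", "e", "f"]
scores = [0.05, 0.10, 0.08, 0.90, 0.95, 0.07]
print(filter_by_adaptive_threshold(data, scores))  # ['a', 'b', 'c', 'f']
```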
Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
arXiv cs.CL · Luis Lara, Aristides Milios, Zhi Hao Luo, Aditya Sharma, Ge Ya Luo, Christopher Beckham, Florian Golemo, Christopher Pal · 2d ago
A new LLM-based approach generates floor plans while adhering to numerical and topological constraints using reinforcement learning.
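"Verifiable rewards" here means the reward can be computed by programmatically checking a generated plan against its constraints rather than by a learned judge. A minimal sketch under assumed data structures; plan, spec, min_areas, and must_touch are all invented for illustration, not the paper's implementation.

```python
# Illustrative verifiable-reward checker for floor plans. Numerical
# constraints are minimum room areas; topological constraints are required
# adjacencies. A real system would verify many more properties.

def verifiable_reward(plan, spec):
    """Return 1.0 only if every constraint in `spec` checks out on `plan`."""
    rooms = {r["name"]: r for r in plan["rooms"]}
    # Numerical constraints: required rooms with minimum areas.
    for name, min_area in spec["min_areas"].items():
        if name not in rooms or rooms[name]["area"] < min_area:
            return 0.0
    # Topological constraints: required adjacencies between rooms.
    adjacency = {frozenset(pair) for pair in plan["adjacent"]}
    for a, b in spec["must_touch"]:
        if frozenset((a, b)) not in adjacency:
            return 0.0
    return 1.0  # all checks pass; usable directly as an RL reward signal

plan = {
    "rooms": [{"name": "kitchen", "area": 12.0}, {"name": "living", "area": 25.0}],
    "adjacent": [("kitchen", "living")],
}
spec = {"min_areas": {"kitchen": 10.0, "living": 20.0},
        "must_touch": [("kitchen", "living")]}
print(verifiable_reward(plan, spec))  # 1.0
```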
Signal Score
High signal — credible source, broad relevance.
Source authority · weight 20% · score 80
Community heat · weight 20% · score 0
Technical impact · weight 30% · score 67
Score bands: ≥75 high · 50–74 medium · <50 low
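A sketch of how the displayed score might be aggregated. The exact formula is DeepSignal's; note the visible weights sum to only 70%, so at least one criterion was lost in extraction. Normalizing over the known weights is an assumption, and under the visible criteria alone the aggregate lands in the medium band, so the missing criteria evidently carry the "high signal" verdict.

```python
# Hypothetical weighted aggregation of the Signal Score shown above.
CRITERIA = {  # name: (weight, score) as displayed
    "source_authority": (0.20, 80),
    "community_heat":   (0.20, 0),
    "technical_impact": (0.30, 67),
}

def signal_score(criteria):
    total_weight = sum(w for w, _ in criteria.values())
    weighted = sum(w * s for w, s in criteria.values())
    return weighted / total_weight  # normalize over the known weights only

def band(score):
    # Bands from the legend: >=75 high, 50-74 medium, <50 low.
    if score >= 75:
        return "high"
    return "medium" if score >= 50 else "low"

score = signal_score(CRITERIA)
print(round(score, 1), band(score))  # 51.6 medium
```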
Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study
arXiv cs.CL · Mokshit Surana, Archit Rathod, Akshaj Satishkumar · 2d ago
This study evaluates DExperts for mitigating toxicity in LLMs, revealing strengths and weaknesses in safety and latency.
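The DExperts method (Liu et al., 2021) being replicated steers decoding with a product-of-experts rule: the base model's next-token logits are shifted by the difference between a non-toxic "expert" and a toxic "anti-expert". A minimal sketch with toy logits standing in for real model outputs:

```python
# DExperts-style logit combination: steer the base LM away from what the
# toxic anti-expert prefers and toward what the non-toxic expert prefers.
import numpy as np

def dexperts_logits(base, expert, anti_expert, alpha=2.0):
    """Combine next-token logits; larger alpha steers harder away from toxicity."""
    return base + alpha * (expert - anti_expert)

def softmax(z):
    z = z - z.max()  # stabilize before exponentiation
    e = np.exp(z)
    return e / e.sum()

base = np.array([2.0, 1.0, 0.5])    # base LM next-token logits (toy values)
expert = np.array([2.5, 0.5, 0.2])  # non-toxic expert
anti = np.array([0.5, 2.0, 0.3])    # toxic anti-expert
probs = softmax(dexperts_logits(base, expert, anti))
print(probs.round(3))  # mass shifts toward tokens the expert prefers
```

Each decoding step requires forward passes through all three models, which is consistent with the latency weaknesses the summary mentions.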
Auditing Agent Harness Safety
arXiv cs.CL · Chengzhi Liu, Yichen Guo, Yepeng Liu, Yuzhe Yang, Qianqi Yan, Xuandong Zhao, Wenyue Hua, Sheng Liu, Sharon Li, Yuheng Bu, Xin Eric Wang · 2d ago
The HarnessAudit framework evaluates the safety of LLM agent execution, revealing risks in multi-agent systems.
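HarnessAudit's actual design is not described in this summary. As a generic illustration of the kind of interposition an agent-harness audit relies on, here is a sketch that logs and policy-checks tool calls before they execute; every name here (audited, BLOCKED_TOOLS, the deny-list policy) is hypothetical.

```python
# Generic tool-call auditing sketch, not the HarnessAudit implementation.
from typing import Callable

AUDIT_LOG = []
BLOCKED_TOOLS = {"shell_exec", "send_email"}  # example deny-list policy

def audited(tool_name: str, fn: Callable) -> Callable:
    """Wrap a tool so every invocation is logged and policy-checked first."""
    def wrapper(*args, **kwargs):
        allowed = tool_name not in BLOCKED_TOOLS
        AUDIT_LOG.append({"tool": tool_name, "args": args, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"tool {tool_name!r} blocked by policy")
        return fn(*args, **kwargs)
    return wrapper

search = audited("web_search", lambda q: f"results for {q}")
print(search("multi-agent safety"))  # executes and is recorded
print(AUDIT_LOG[0]["allowed"])       # True
```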
Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems
Invisible orchestrators in multi-agent LLM systems pose significant safety risks and affect behavior dynamics.
Why Featured
GradShield enhances LLM safety by filtering harmful data during finetuning, which is crucial for developers and PMs focused on responsible AI deployment and for investors assessing risk management in AI projects.