Open SafeRL — toolkit for testing LLM safety in agentic settings

Hugging Face·Hugging Face

6d ago

·~1 min·5/11/2026·en·1

Quick Take

Open SafeRL stress-tests LLM agents with jailbreak generation, tool-use abuse, and self-replication probes.

Key Points

Jailbreak generation.
Tool-use abuse coverage.
Self-replication probes.

📖 Reader Mode

~1 min read

Building Blocks for Foundation Model Training and Inference on AWS

May 11, 2026

EMO: Pretraining mixture of experts for emergent modularity

May 8, 2026

vLLM V0 to V1: Correctness Before Corrections in RL

May 6, 2026

audiospeechleaderboard

Adding Benchmaxxer Repellant to the Open ASR Leaderboard

May 6, 2026

Granite 4.1 LLMs: How They’re Built

April 29, 2026

llmsinference-providersdeepinfra

DeepInfra on Hugging Face Inference Providers 🔥

April 29, 2026

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

April 28, 2026

openaiprivacy-filterweb-apps

How to build scalable web apps with OpenAI's Privacy Filter

April 27, 2026

llmmoelong-context

DeepSeek-V4: a million-token context that agents can actually use

April 24, 2026

guidetransformers.jsjavascript

How to Use Transformers.js in a Chrome Extension

April 23, 2026

QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard

April 21, 2026

cybersecurityopen-sourcecommunity

AI and the Future of Cybersecurity: Why Openness Matters

April 21, 2026

reinforcement-learningrlvre-commerce

Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents

April 16, 2026

announcementmlxllm

The PR you would have opened yourself

April 16, 2026

— Originally published at huggingface.co

Continue reading on huggingface.co

Open SafeRL — toolkit for testing LLM safety in agentic settings

Quick Take

Key Points

📖 Reader Mode

Building Blocks for Foundation Model Training and Inference on AWS

EMO: Pretraining mixture of experts for emergent modularity

vLLM V0 to V1: Correctness Before Corrections in RL

Adding Benchmaxxer Repellant to the Open ASR Leaderboard

Granite 4.1 LLMs: How They’re Built

DeepInfra on Hugging Face Inference Providers 🔥

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

How to build scalable web apps with OpenAI's Privacy Filter

DeepSeek-V4: a million-token context that agents can actually use

How to Use Transformers.js in a Chrome Extension

QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard

AI and the Future of Cybersecurity: Why Openness Matters

Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents

The PR you would have opened yourself

More from Hugging Face

Unlocking asynchronicity in continuous batching

Building Blocks for Foundation Model Training and Inference on AWS

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

Related in this space

Invisible Orchestrators Suppress Protective Behavior and Dissociate Power-Holders: Safety Risks in Multi-Agent LLM Systems

Auditing Agent Harness Safety

Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study