This paper presents a systematic analysis and evaluation framework for jailbreak guardrails in Large Language Models, categorizing existing defenses and assessing their effectiveness and optimization potential.
Large Language Models (LLMs) have achieved remarkable progress, but their
deployment has exposed critical vulnerabilities, particularly to jailbreak
attacks that circumvent safety mechanisms. Guardrails, external defense
mechanisms that monitor and control LLM interactions, have emerged as a
promising solution. However, the current landscape of LLM guardrails is
fragmented, lacking both a unified taxonomy and a comprehensive evaluation framework.
In this Systematization of Knowledge (SoK) paper, we present the first holistic
analysis of jailbreak guardrails for LLMs. We propose a novel,
multi-dimensional taxonomy that categorizes guardrails along six key
dimensions, and introduce a Security-Efficiency-Utility evaluation framework to
assess their practical effectiveness. Through extensive analysis and
experiments, we identify the strengths and limitations of existing guardrail
approaches, explore their universality across attack types, and provide
insights into optimizing defense combinations. Our work offers a structured
foundation for future research and development, aiming to guide the principled
advancement and deployment of robust LLM guardrails. The code is available at
https://github.com/xunguangwang/SoK4JailbreakGuardrails.
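As a rough illustration of how a Security-Efficiency-Utility evaluation might be operationalized, the sketch below aggregates per-guardrail scores from individual attack and benign trials. The metric definitions, data structures, and the `Trial`/`evaluate` names are hypothetical assumptions for exposition and are not taken from the paper or its repository; the actual framework may differ.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical record of a single evaluation trial for one guardrail.
@dataclass
class Trial:
    is_jailbreak_prompt: bool   # whether the input was an attack prompt
    blocked: bool               # whether the guardrail blocked/flagged it
    latency_ms: float           # added inference-time overhead in milliseconds


def evaluate(trials: list[Trial]) -> dict[str, float]:
    """Compute illustrative Security / Efficiency / Utility scores.

    Security   : fraction of jailbreak prompts that were blocked.
    Utility    : fraction of benign prompts that were allowed through.
    Efficiency : mean added latency (lower is better).
    These definitions are assumptions for illustration only.
    """
    attacks = [t for t in trials if t.is_jailbreak_prompt]
    benign = [t for t in trials if not t.is_jailbreak_prompt]
    return {
        "security": mean(t.blocked for t in attacks) if attacks else 0.0,
        "utility": mean(not t.blocked for t in benign) if benign else 0.0,
        "efficiency_ms": mean(t.latency_ms for t in trials) if trials else 0.0,
    }


if __name__ == "__main__":
    # Example usage with made-up trial data.
    trials = [
        Trial(True, True, 12.0),    # attack caught
        Trial(True, False, 11.5),   # attack missed
        Trial(False, False, 10.0),  # benign prompt allowed
        Trial(False, True, 13.0),   # benign prompt wrongly blocked
    ]
    print(evaluate(trials))
```

Reporting the three scores separately, rather than collapsing them into a single number, keeps the security-utility trade-off visible when comparing guardrails or their combinations.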