
Hack One, Hack Them All? Weaponizing LLM Jailbreak Transferability

Time: Tue 02 Dec, 03:00
Stage: Briefings 1
Session type: Presentation

Presenter:

In cybersecurity, there is a familiar pattern: a zero-day in one product is quickly weaponised into exploit kits that spread across many others. Large Language Models (LLMs) are no longer niche tools; they are becoming the foundation of everything from productivity apps to healthcare triage tools. This rapid adoption creates a systemic risk: jailbreak prompts often transfer across models, vendors, and architectures with little or no modification. An attacker who breaks one model may break many, at scale.

This talk investigates jailbreak transferability as a vulnerability class with ecosystem-wide implications. Drawing on curated jailbreak datasets and cross-model experiments with open-source LLMs, we present preliminary empirical evidence of cross-model effectiveness and explain why some jailbreaks evaporate after model updates while others persist like wormable exploits. The session introduces an early Jailbreak Transferability Matrix, a structured way of classifying jailbreaks by persistence, generalisation, and resilience to safety interventions.

Through offensive scenarios, we show how adversaries could weaponise transferable jailbreaks as vectors for mass exploitation, automating harmful content generation or bypassing safety filters across multiple platforms simultaneously. On the defence side, we outline how researchers, vendors, and policymakers can quantify transferability risk, prioritise testing, and contain cascading jailbreak failures before they spread. By understanding and quantifying jailbreak transferability, attendees can move from reactive patching to proactive, ecosystem-level defences, safeguarding the next generation of AI systems before attacks scale.
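The abstract does not say how the matrix scores are computed. As a rough illustration only, the sketch below shows one way transferability could be quantified, assuming each jailbreak has already been labelled success or failure per model (for example by human review or a refusal classifier) at baseline, after a model update, and with an added safety intervention. The three scores and every name here (`transfer_rate`, `survival_rate`, `matrix_row`, the model IDs) are hypothetical; this is not the presenters' Jailbreak Transferability Matrix.

```python
from typing import Dict, List, Tuple

# Hypothetical record format: maps (jailbreak_id, model_id) to True when the
# jailbreak prompt bypassed that model's safety behaviour. Missing pairs are
# treated as failures.
Results = Dict[Tuple[str, str], bool]


def transfer_rate(results: Results, jailbreak: str, models: List[str]) -> float:
    """Generalisation: fraction of `models` on which `jailbreak` succeeded."""
    if not models:
        raise ValueError("need at least one model")
    hits = sum(1 for m in models if results.get((jailbreak, m), False))
    return hits / len(models)


def survival_rate(before: Results, after: Results, jailbreak: str, models: List[str]) -> float:
    """Of the models `jailbreak` broke in `before`, the fraction it still breaks in `after`."""
    broken = [m for m in models if before.get((jailbreak, m), False)]
    if not broken:
        return 0.0  # nothing to survive; treat as non-persistent
    still = sum(1 for m in broken if after.get((jailbreak, m), False))
    return still / len(broken)


def matrix_row(jailbreak: str, models: List[str], baseline: Results,
               updated: Results, defended: Results) -> Dict[str, float]:
    """One row of a toy transferability matrix with three illustrative scores."""
    return {
        "generalisation": transfer_rate(baseline, jailbreak, models),
        # survival across model updates
        "persistence": survival_rate(baseline, updated, jailbreak, models),
        # survival under an added safety intervention
        "resilience": survival_rate(baseline, defended, jailbreak, models),
    }


if __name__ == "__main__":
    # Toy, made-up labels purely to exercise the functions.
    models = ["model-a", "model-b", "model-c", "model-d"]
    baseline = {("jb-1", "model-a"): True, ("jb-1", "model-b"): True,
                ("jb-1", "model-c"): True, ("jb-1", "model-d"): False}
    updated = {("jb-1", "model-a"): True, ("jb-1", "model-c"): True}
    defended = {("jb-1", "model-a"): False}
    print(matrix_row("jb-1", models, baseline, updated, defended))
```

Under these assumptions, a jailbreak scoring high on all three would resemble the "wormable" case described above, while one with high generalisation but low persistence is the kind that evaporates after updates.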
