AGENT SECURITY LAB

Security research for autonomous agents.

We study how agents get compromised, publish what we find, and build tools that limit the blast radius.

Read the research →

Featured project

agent-sleuth

$ pip install agent_sleuth

PyPI · v0.1.0 · MIT · GitHub ↗

Wrap your agent in three lines

from agent_sleuth import Sleuth

sleuth = Sleuth(mode="audit")  # zero config
sleuth.reset(query="Summarize the news and email it to me@myco.com")

fetch_url  = sleuth.track(fetch_url)
send_email = sleuth.track(send_email)

# ... run your agent as normal ...

print(sleuth.report())

Blocked, in enforce mode

BLOCKED: send_email() called with tainted inputs
  Taint source: read_email (step 2, untrusted)
  Injected value in argument: to="attacker@evil.com"
  Lineage: read_email → "attacker@evil.com" → send_email.to
  Destination: attacker@evil.com (not allowlisted)
  Reason: untrusted value reached a consequential sink
  Action: blocked (mode=enforce)

agent-sleuth catches verbatim injection and structured exfiltration. It runs in two modes: audit, which only logs what would have been blocked, and enforce, which actually blocks the call.
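
The difference between the two modes can be illustrated with a toy taint check. This is only a sketch of the general idea and not agent-sleuth's implementation: `Tainted`, `ToyGuard`, `BlockedCall`, and the stand-in `send_email` are hypothetical names invented for this example, and real lineage tracking is likely more involved.

```python
from dataclasses import dataclass, field


class BlockedCall(Exception):
    """Raised by the toy guard in enforce mode (hypothetical name)."""


@dataclass(frozen=True)
class Tainted:
    """A value tagged with the untrusted tool it came from."""
    value: str
    source: str


@dataclass
class ToyGuard:
    mode: str = "audit"                  # "audit" logs only, "enforce" blocks
    allowlist: frozenset = frozenset()   # destinations considered safe
    log: list = field(default_factory=list)

    def sink(self, func):
        """Wrap a consequential function so tainted keyword arguments are checked."""
        def wrapper(**kwargs):
            for name, arg in kwargs.items():
                if isinstance(arg, Tainted) and arg.value not in self.allowlist:
                    msg = f"{func.__name__}.{name} <- {arg.source}: {arg.value!r}"
                    self.log.append(msg)          # both modes record the finding
                    if self.mode == "enforce":
                        raise BlockedCall(msg)    # only enforce stops the call
            # Unwrap tainted values before calling the real function.
            plain = {k: (v.value if isinstance(v, Tainted) else v)
                     for k, v in kwargs.items()}
            return func(**plain)
        return wrapper


def send_email(to, body):
    """Stand-in for a real email tool (assumption for this sketch)."""
    return f"sent to {to}"


guard = ToyGuard(mode="audit", allowlist=frozenset({"me@myco.com"}))
safe_send = guard.sink(send_email)
injected = Tainted("attacker@evil.com", source="read_email")
safe_send(to=injected, body="hi")   # audit: the call proceeds, the finding is logged
print(guard.log)
```

Switching the same guard to `mode="enforce"` would raise `BlockedCall` instead of letting the call through, which is the behavior the report above describes.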

Attack Success Rate (AgentDojo Benchmark)

Suite        Without agent-sleuth    With agent-sleuth
workspace    33%                     9%
banking      46%                     8%
slack        65%                     46%
travel       37%                     21%

Backbone: gpt-4o-mini-2024-07-18 · v1
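
To compare suites, it can help to express each result as both an absolute drop (percentage points) and a relative reduction. The short script below only does arithmetic on the four pairs of numbers listed above; the `results` dictionary and the `reduction` helper are names chosen for this sketch.

```python
# Attack success rates in percent, copied from the benchmark figures above:
# suite -> (without agent-sleuth, with agent-sleuth)
results = {
    "workspace": (33, 9),
    "banking": (46, 8),
    "slack": (65, 46),
    "travel": (37, 21),
}


def reduction(before, after):
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after
    relative = 100 * absolute / before
    return absolute, relative


for suite, (before, after) in results.items():
    absolute, relative = reduction(before, after)
    print(f"{suite:<10} {before}% -> {after}%  "
          f"(-{absolute} points, {relative:.0f}% relative reduction)")
```

By this calculation the relative reductions are roughly 73% (workspace), 83% (banking), 29% (slack), and 43% (travel), so the slack suite shows the smallest improvement.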

Research

Control-flow integrity diagram

Review of Control-Flow Integrity Solutions for Agents

June 16, 2026 · Noah Wong

In 1977, Denning & Denning published "Certification of Programs for Secure Information Flow", which laid out the Lattice Model of Information Flow. Essentially, each piece of data was given a security class...

security agents

DRIFT architecture diagram

Dynamic Plan Validation and Injection Isolation Against Prompt Injection

June 11, 2026 · Arnav Tripathy

CaMeL, despite its thoroughness, has been shown to be incredibly expensive to implement on a large scale. Another problem that I didn't foresee until reading about DRIFT was the inflexibility of the system; security policies were static, unmoving...

security agents

AgentDojo benchmark diagram

Architectural Separation of Instruction and Data States in Agent Frameworks

June 9, 2026 · Noah Wong

In Von Neumann computer architecture, instructions and data were the same thing: bits. Instructions and data were together in memory, and there was no architectural distinction. Developed during the same era, the Harvard computer architecture separated instruction...

security agents

CaMeL architecture diagram

Capability-Based Data Flow Enforcement Against Prompt Injection

June 9, 2026 · Arnav Tripathy

The release of Anthropic's Fable 5 is interesting in that it doesn't let the powerful model handle all the tasks on its own; for topics like cybersecurity or biology, where "state-of-the-art" knowledge could be misused, Fable automatically hands off queries to the weaker Opus 4.8...

security agents

Syscall filtering diagram

Mitigating Post-Exploitation Scope Through Syscall Filtering

May 25, 2026 · Noah Wong

Post-exploitation, from a defensive perspective, involves limiting attackers' access to program resources. The threat model assumes that a program is exploitable, but that the attacker's post-exploitation strategy requires additional resources within the program...

security systems

Chrome browser architecture diagram

Chrome's Security Architecture: Renderer Trust and the Same-Origin Policy

May 17, 2026 · Noah Wong

Chromium's architecture was designed to protect a user's OS from malicious websites. It is not designed to protect websites from each other. For example, if attacker.com compromises the rendering engine, they can ask the browser kernel for bank.com's data and receive it...