Research

Data, benchmarks, and honest limitations.

ATR publishes evasion tests openly. We tell you what we can't catch.

Published Paper

Pan, Y. (2026). Agent Threat Rules: A Community-Driven Detection Standard for AI Agent Security Threats.

DOI: 10.5281/zenodo.19178002

Zenodo (published) →SSRN: Abstract ID 6457179 (pending review)

Benchmarks

Tested with our own corpus AND external benchmarks we've never seen before.

PINT (External, Adversarial)

Precision

Recall

850 samples

Self-Test (Own Rules)

Precision

Recall

341

Samples

The gap between 99.6% precision and 61.2% recall is expected. Regex catches known patterns but misses paraphrases and multilingual attacks.

Ecosystem Scan Data

Real scans of real MCP skill registries.

Mega Scan (OpenClaw + Skills.sh)

skills scanned

CRITICAL

HIGH

Total flagged

ClawHub Registry Scan

skills crawled

CRITICAL

HIGH

With source code

Download raw data (CSV)→Download stats (JSON)→

What ATR Cannot Detect

We publish this section because honest limitations build more trust than false confidence.

Paraphrase Attacks

Any regex rule can be bypassed by semantically equivalent rephrasing. "Ignore previous instructions" is detected; "please set aside the guidance you were given earlier" is not.

Multi-Language Attacks

All patterns are English-only. Injection payloads in Spanish, Chinese, Arabic, or any other language bypass all rules completely.

Context-Dependent Attacks

"Delete all records" might be legitimate or malicious. Regex matches patterns without understanding authorization context.

Protocol-Level Attacks

ATR inspects content, not transport. Message replay, schema manipulation, MCP transport-level MITM are invisible.

Multi-Turn Behavioral Patterns

Gradual trust escalation across 20 turns, where no single message is detectable, is not correlated. ATR evaluates events independently.

Novel Attacks

By definition, regex cannot detect attack patterns that don't exist yet. New techniques require new rules.

Full limitations document with 64 evasion tests→