Data, benchmarks, and honest limitations.
ATR publishes evasion tests openly. We tell you what we can't catch.
Published Paper
Pan, Y. (2026). Agent Threat Rules: A Community-Driven Detection Standard for AI Agent Security Threats.
Benchmarks
Tested with our own corpus AND external benchmarks we've never seen before.
The gap between 99.6% precision and 61.2% recall is expected. Regex catches known patterns but misses paraphrases and multilingual attacks.
Ecosystem Scan Data
Real scans of real MCP skill registries.
What ATR Cannot Detect
We publish this section because honest limitations build more trust than false confidence.
Any regex rule can be bypassed by semantically equivalent rephrasing. "Ignore previous instructions" is detected; "please set aside the guidance you were given earlier" is not.
All patterns are English-only. Injection payloads in Spanish, Chinese, Arabic, or any other language bypass all rules completely.
"Delete all records" might be legitimate or malicious. Regex matches patterns without understanding authorization context.
ATR inspects content, not transport. Message replay, schema manipulation, MCP transport-level MITM are invisible.
Gradual trust escalation across 20 turns, where no single message is detectable, is not correlated. ATR evaluates events independently.
By definition, regex cannot detect attack patterns that don't exist yet. New techniques require new rules.