Research

Data, benchmarks, and honest limitations.

ATR publishes evasion tests openly. We tell you what we can't catch.

Published Paper

Pan, Y. (2026). Agent Threat Rules: A Community-Driven Detection Standard for AI Agent Security Threats.

Zenodo (published) →SSRN: Abstract ID 6457179 (pending review)

Benchmarks

Tested with our own corpus AND external benchmarks we've never seen before.

PINT (External, Adversarial)
0%
Precision
0%
Recall
0
F1
850 samples
Self-Test (Own Rules)
0%
Precision
0%
Recall
341
Samples

The gap between 99.6% precision and 61.2% recall is expected. Regex catches known patterns but misses paraphrases and multilingual attacks.

Ecosystem Scan Data

Real scans of real MCP skill registries.

Mega Scan (OpenClaw + Skills.sh)
0
skills scanned
0
CRITICAL
0
HIGH
0
Total flagged
ClawHub Registry Scan
0
skills crawled
0
CRITICAL
0
HIGH
0
With source code

What ATR Cannot Detect

We publish this section because honest limitations build more trust than false confidence.

Paraphrase Attacks

Any regex rule can be bypassed by semantically equivalent rephrasing. "Ignore previous instructions" is detected; "please set aside the guidance you were given earlier" is not.

Multi-Language Attacks

All patterns are English-only. Injection payloads in Spanish, Chinese, Arabic, or any other language bypass all rules completely.

Context-Dependent Attacks

"Delete all records" might be legitimate or malicious. Regex matches patterns without understanding authorization context.

Protocol-Level Attacks

ATR inspects content, not transport. Message replay, schema manipulation, MCP transport-level MITM are invisible.

Multi-Turn Behavioral Patterns

Gradual trust escalation across 20 turns, where no single message is detectable, is not correlated. ATR evaluates events independently.

Novel Attacks

By definition, regex cannot detect attack patterns that don't exist yet. New techniques require new rules.