Projects
Five projects spanning AI-agent security research, offense, application defense, and infrastructure defense. The attack lives in the pattern, not the content.
Methodology
I use a Zettelkasten for research. Write everything down in real time. Link ideas as they emerge. Document both failed attempts and successful techniques.
Why it works: Forces clear thinking. Creates a searchable knowledge base. Makes it easier to explain work to others.
tool_transform defense evaluated over 4 injection classes × 3 Claude models × 5 seeds (60 seeded trials), scored as baseline-vs-defended ASR/utility deltas with 95% Wilson-score CIs, then adversarially bypassed 9/11 times, one bypass verified live against the defense ON.
method: Threat model: poisoned tool metadata coerces an export_data sink to exfiltrate a canary token. Defense segments server-supplied descriptions, redacts instruction-shaped spans (imperative / authority / redirection / exfiltration rules), wraps survivors in client-origin provenance. Provider-agnostic ModelClient (Anthropic / OpenAI / DeepSeek / Gemini) over the official MCP SDK; pure-trace ASR + utility scorers.
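The Wilson-score interval used for the ASR (attack success rate) figures can be sketched as below. This is a minimal illustration of the standard formula, not the project's scorer: the function name `wilson_interval` and the trial counts in the usage lines are placeholders chosen for the example, not reported results.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, center - half), min(1.0, center + half)

# Placeholder counts for illustration only (NOT the project's results).
baseline_hits, defended_hits, n = 30, 6, 60
for label, hits in (("baseline", baseline_hits), ("defended", defended_hits)):
    lo, hi = wilson_interval(hits, n)
    print(f"{label}: ASR={hits / n:.2f}  95% CI=[{lo:.3f}, {hi:.3f}]")
print(f"ASR delta: {(defended_hits - baseline_hits) / n:+.2f}")
```

Wilson intervals stay inside [0, 1] and behave sensibly when the observed rate is 0 or 1, which matters at small per-cell trial counts.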
method: Per-entity behavioral baselining over request rate, token ratios, timing regularity, and session structure. 7 of 14 hand-built detectors were inverted on real data (flagging power users, clearing attackers), so the design subtracts memorized-shape features for a domain-portable drift signal. Three documented evasions: feature normalization, traffic dilution, low-and-slow.
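A per-entity drift signal of the kind described can be sketched as a robust z-score against the entity's own history. This is an assumption-laden illustration: the median/MAD scoring, the `robust_drift` and `entity_drift` names, and the feature keys are hypothetical, not the project's actual detectors.

```python
from statistics import median

def robust_drift(history: list[float], current: float) -> float:
    """Robust z-score of `current` against the entity's own history (median/MAD)."""
    if len(history) < 5:
        return 0.0  # too little history to judge
    med = median(history)
    mad = median(abs(x - med) for x in history)
    scale = (1.4826 * mad) or 1e-9  # avoid division by zero on flat history
    return (current - med) / scale

def entity_drift(history: dict[str, list[float]], current: dict[str, float]) -> float:
    """Largest absolute drift across the features present in both inputs."""
    shared = history.keys() & current.keys()
    return max((abs(robust_drift(history[f], current[f])) for f in shared), default=0.0)

# Hypothetical entities: a steady power user and a quiet account that suddenly spikes.
power_user = entity_drift({"request_rate": [100, 102, 98, 101, 99, 100]}, {"request_rate": 101})
spiking = entity_drift({"request_rate": [5, 6, 5, 4, 5, 6]}, {"request_rate": 60})
print(power_user, spiking)
```

Because each entity is compared only with itself, high absolute volume alone does not raise the score. The same property suggests why a low-and-slow ramp could slip past a signal like this: the baseline moves with the attacker.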
method: Unsupervised, self-referential baselines (no labels, no fixed thresholds) over API-call burst rate, cross-region fan-out, off-hours spikes, and access-denied walls. Diagnosis was an instrument/data mismatch: 99.98% of events were API calls, 0 were GPU metrics. Findings export to Sigma for SIEM hand-off. CLI + Flask + SQLite.
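As one way to picture a self-referential baseline, the sketch below flags cross-region fan-out by comparing each time bucket with the same principal's other buckets, so no global threshold is set. The event tuple shape, the `fanout_outliers` name, and the leave-one-out rule are assumptions made for illustration; the real tool's logic may differ.

```python
from collections import defaultdict

def fanout_outliers(events, min_history: int = 3):
    """events: iterable of (principal, hour_bucket, region) tuples.

    Returns (principal, bucket, fan_out, prior_max) for buckets whose distinct
    region count exceeds everything else that principal has done.
    """
    regions = defaultdict(set)
    for principal, bucket, region in events:
        regions[(principal, bucket)].add(region)

    per_principal = defaultdict(dict)
    for (principal, bucket), seen in regions.items():
        per_principal[principal][bucket] = len(seen)

    findings = []
    for principal, buckets in per_principal.items():
        for bucket, count in buckets.items():
            others = [c for b, c in buckets.items() if b != bucket]
            # Leave-one-out: compare against the principal's own other buckets.
            if len(others) >= min_history and count > max(others):
                findings.append((principal, bucket, count, max(others)))
    return findings

# Hypothetical events: one principal, single-region for four hours, then four regions.
events = [("svc-a", h, "us-east-1") for h in range(4)]
events += [("svc-a", 4, r) for r in ("us-east-1", "eu-west-1", "ap-south-1", "sa-east-1")]
print(fanout_outliers(events))
```

A strict "new maximum" rule like this is noisy on its own; it is shown only to make the no-fixed-threshold idea concrete.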
method: Per level: extract the secret, identify the defense layer, document the working technique (encoding indirection, translation, riddle/side-channel, role-play). Defenses escalate from naive string-matching to conversational classifiers, so the techniques escalate with them.
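The escalation from naive string-matching can be illustrated with a toy output filter. Everything here is hypothetical (the filter functions, the `SECRET` placeholder, the sample responses); it only shows why encoding indirection defeats a verbatim match and why a normalizing filter closes one gap while leaving others open.

```python
import base64

def naive_output_filter(response: str, secret: str) -> bool:
    """Return True when the response is blocked (secret appears verbatim)."""
    return secret.lower() in response.lower()

def normalized_filter(response: str, secret: str) -> bool:
    """Slightly stronger: strip separators before matching."""
    squashed = "".join(ch for ch in response.lower() if ch.isalnum())
    return secret.lower() in squashed

SECRET = "PLACEHOLDER"  # stand-in value, not a real level secret

direct = f"The password is {SECRET}."
spelled = "Letters: " + "-".join(SECRET)
encoded = base64.b64encode(SECRET.encode()).decode()

for name, text in (("direct", direct), ("spelled", spelled), ("base64", encoded)):
    print(name, naive_output_filter(text, SECRET), normalized_filter(text, SECRET))
```

Each added normalization step handles one encoding, which is the likely reason stronger levels move to classifiers over the whole conversation instead of pattern checks on the output.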
method: BBOT modules for breadth-first recon; per-asset OSINT enrichment; GPT-4 scoring over the enriched inventory emits a ranked work queue with rationale per target.
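The final step, turning an enriched inventory into a ranked work queue with a rationale per target, might look like the sketch below. The `Asset` type, field names, and hosts are hypothetical, and `score_asset` is a deterministic stand-in heuristic swapped in where the described pipeline calls GPT-4; no BBOT or model API is invoked.

```python
from dataclasses import dataclass, field

@dataclass
class Asset:
    host: str
    open_ports: list[int] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)  # OSINT enrichment findings

def score_asset(asset: Asset) -> tuple[int, str]:
    """Stand-in scorer returning (score, rationale); an LLM fills this role in the pipeline."""
    score, reasons = 0, []
    if 22 in asset.open_ports or 3389 in asset.open_ports:
        score += 3
        reasons.append("remote admin port exposed")
    if any("leaked credential" in note for note in asset.notes):
        score += 5
        reasons.append("credential exposure in OSINT notes")
    if not reasons:
        reasons.append("no notable exposure")
    return score, "; ".join(reasons)

def build_queue(assets: list[Asset]) -> list[dict]:
    """Score every asset and return the queue sorted highest score first."""
    scored = [(a.host, *score_asset(a)) for a in assets]
    scored.sort(key=lambda row: row[1], reverse=True)
    return [{"host": h, "score": s, "rationale": r} for h, s, r in scored]

inventory = [
    Asset("www.example.com", [443]),
    Asset("vpn.example.com", [22, 443], ["leaked credential on paste site"]),
]
for item in build_queue(inventory):
    print(item)
```

Keeping the rationale next to the score lets whoever picks up the queue check why a target ranked where it did.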