KESTREL DETECTION PIPELINE
Live simulation — select a cloud workload scenario and watch z-score baselining catch anomalies in real time
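The z-score baselining the simulation refers to can be sketched as follows. This is a minimal illustration, not the project's actual detector: the window size, warm-up length, and 3-sigma threshold are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev

def zscore_anomalies(values, window=30, threshold=3.0):
    """Flag points whose z-score against a trailing baseline exceeds threshold."""
    baseline = deque(maxlen=window)  # rolling baseline of recent observations
    flagged = []
    for i, v in enumerate(values):
        if len(baseline) >= 5:  # require a minimal baseline before scoring
            mu, sigma = mean(baseline), stdev(baseline)
            if sigma > 0 and abs(v - mu) / sigma > threshold:
                flagged.append(i)
        baseline.append(v)
    return flagged

# Steady call volume with one burst: the baseline absorbs normal jitter,
# and only the outlier crosses the 3-sigma line.
calls = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11, 95, 10, 12]
print(zscore_anomalies(calls))  # [10] — only the spike is flagged
```

Note that once the spike enters the baseline it inflates the trailing stdev, so subsequent normal values are not mis-flagged; that self-dampening is the practical appeal of z-score baselining over fixed thresholds.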
REAL-WORLD EVALUATION
Evaluated against Splunk Attack Data — 34,427 real CloudTrail events from AWS attack scenarios. Two iterations: diagnose, then fix.
$ cat v1_diagnosis.txt
v1 hit an impedance mismatch. CloudTrail is a management-plane audit log. The original detectors (GPUHijack, RogueTraining) look for data-plane GPU telemetry — utilization %, memory %, power watts — that CloudTrail doesn't carry.
| What v1 detectors need | What CloudTrail provides |
|---|---|
| gpu_utilization: 95% | nothing (CloudWatch/DCGM domain) |
| event_type: training_job | 0 training events in 34,427 |
| event_type: gpu_metric | 1 event (Bedrock InvokeModel) |
99.98% of the data was api_call events that v1 couldn't process. One finding. Precision 1.0, recall 0.0005. The pipeline worked — it was aimed at the wrong layer.
$ cat v2_fix.txt
CloudAPIAbuseDetector — built to process the 34,420 api_call events that v1 ignored. Four signals in 5-minute sliding windows: off-hours activity, elevated error rate, call-volume bursts, and multi-region fan-out.
Multi-signal combinations produce meaningful severity: daftpunk (off-hours + 100% errors) → HIGH. cloudmapper (burst + 13-region fan-out) → HIGH. bhavin_cli generated 24 findings across multiple windows.
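The windowed multi-signal pattern described above can be sketched like this. Everything here — the function name, the thresholds, the event shape, and the signal-count-to-severity mapping — is an illustrative assumption, not the project's tuned implementation.

```python
from datetime import datetime, timedelta

def score_window(events):
    """Score one 5-minute window of api_call events for a single entity.

    Each event is assumed to look like:
        {"time": datetime, "error": bool, "region": str}
    Thresholds are illustrative, not the evaluation's tuned values.
    """
    signals = []
    if any(e["time"].hour < 6 or e["time"].hour >= 22 for e in events):
        signals.append("off_hours")      # activity outside business hours
    if events and sum(e["error"] for e in events) / len(events) >= 0.5:
        signals.append("error_rate")     # mostly failing calls
    if len(events) >= 100:
        signals.append("burst")          # call-volume spike in one window
    if len({e["region"] for e in events}) >= 5:
        signals.append("fanout")         # fan-out across many regions
    # Multi-signal combinations escalate severity.
    severity = {0: None, 1: "MEDIUM", 2: "HIGH"}.get(min(len(signals), 3), "CRITICAL")
    return signals, severity

# A daftpunk-style window: late-night activity where every call errors out.
base = datetime(2024, 1, 1, 23, 0)
evts = [{"time": base + timedelta(seconds=i), "error": True, "region": "us-east-1"}
        for i in range(20)]
print(score_window(evts))  # (['off_hours', 'error_rate'], 'HIGH')
```

The design point is that no single signal escalates past MEDIUM; only combinations reach HIGH or CRITICAL, which is what keeps the false-positive rate down.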
| Entity | Role | Detector | v1 | v2 |
|---|---|---|---|---|
| Bedrock...dwrx4 | ML pipeline role | GPUHijack | YES | YES |
| daftpunk | IAM user (Bedrock) | CloudAPIAbuse | NO | YES — HIGH |
| bhavin_cli | IAM user | CloudAPIAbuse | NO | YES — 24 findings |
| cloudmapper | IAM user | CloudAPIAbuse | NO | YES — HIGH |
| cloudsploit | IAM user | CloudAPIAbuse | NO | YES — burst+fanout |
| okta_ro_role | Service account | CloudAPIAbuse | NO | YES — HIGH (292 calls, 12 regions) |
| rhino_escalate | IAM user | -- | NO | Partial |
$ cat conclusion.txt
Recall: 0.0005 → 0.766. Precision held at 1.0. Zero false positives.
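For reference, the metrics quoted throughout are the standard definitions. The tp/fp/fn counts below are made-up numbers for illustration, not the evaluation's actual label counts:

```python
def precision(tp: int, fp: int) -> float:
    # Fraction of findings that were real attack activity.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    # Fraction of real attack activity that produced a finding.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Zero false positives pins precision at 1.0 regardless of recall,
# so recall can improve without trading precision away.
print(precision(tp=23, fp=0))  # 1.0
print(recall(tp=23, fn=7))     # ~0.767
```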
The fix wasn't more sophisticated ML — it was meeting the data where it lives. CloudTrail provides API call patterns. CloudAPIAbuseDetector processes API call patterns. 99.98% of the dataset went from ignored to analyzed.
220 findings across 3 detectors. Multi-signal severity escalation produces meaningful tiers: 2 CRITICAL, 14 HIGH, 98 MEDIUM. And v2 catches every entity v1 missed: daftpunk, cloudmapper, cloudsploit, bhavin_cli, and okta_ro_role.
Published transparently because the v1 failure is what makes the v2 improvement credible. Subtraction (diagnosing the mismatch) came before addition (building the right detector).