KESTREL DETECTION PIPELINE

Live simulation — select a cloud workload scenario and watch z-score baselining catch anomalies in real time

ingest -> store -> baseline -> detect -> findings -> export
cloud_telemetry_stream
entity_profile (Welford baseline)
  entity_id · entity_type · cloud_provider · region
  avg_gpu_util · avg_mem_util · avg_power_watts · api_calls/hr
  active_hours · instance_type · baseline_events
z_score_baselines — sigma threshold = 2.5
detection_modules (3)
GPUHijackDetector
AML.TA0011 IMPACT / AML.T0048 Denial of ML Service
  high_gpu_utilization (>=90%)
  anomalous_region
  off_hours_gpu_spike (h<6 || h>=22)
RogueTrainingDetector
AML.TA0005 EXECUTION / AML.T0020 Train Model
  unexpected_instance_type
  off_hours_training (h<6 || h>=22)
  abnormal_duration (z>=2.5)
  unusual_data_transfer (z>=2.5)
CloudAPIAbuseDetector v2
AML.TA0008 DISCOVERY / AML.T1526 Cloud Service Discovery
  api_burst_frequency (>=50/5m)
  region_fanout (>5 regions)
  off_hours_api_spike (h<6 || h>=22)
  excessive_error_rate (>=30%)
$ kestrel findings --format json
[]  # no findings — all clear
$ kestrel findings --format sigma
# no sigma rules generated
Client-side simulation using entity profiles and detection thresholds from the real pipeline.
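The baselining step above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the project's code: it assumes one Welford accumulator per entity metric, and reuses the 2.5-sigma threshold shown in the panel.

```python
import math

class WelfordBaseline:
    """Running mean/variance for one entity metric (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def z_score(self, x: float) -> float:
        if self.n < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
        return (x - self.mean) / std if std > 0 else 0.0

SIGMA_THRESHOLD = 2.5  # matches the pipeline's z-score threshold

baseline = WelfordBaseline()
for util in [41, 38, 44, 40, 39, 42, 43]:  # normal GPU-utilization samples
    baseline.update(util)

print(baseline.z_score(95) >= SIGMA_THRESHOLD)  # True: a 95% spike is ~25 sigma out
```

Welford's update keeps the profile in O(1) memory per metric, which is why the entity_profile can be maintained as a stream rather than a stored history.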

REAL-WORLD EVALUATION

Evaluated against Splunk Attack Data — 34,427 real CloudTrail events from AWS attack scenarios. Two iterations: diagnose, then fix.

Dataset: splunk/attack_data · 34,427 events · 51 datasets · 77 unique entities · 16 AWS regions · 2,167 attack-labeled events
Precision     1.0    — 0 false positives (v1 and v2)
Recall (v2)   0.766  — up from 0.0005 (1,532x)
Findings      220    — up from 1 (v1)
Detectors     3      — GPU + Rogue + API Abuse
~/eval/v2 -- severity distribution (220 findings)
  2 CRITICAL · 14 HIGH · 98 MEDIUM · 106 LOW (baseline)
~/eval -- recall progression across 2 iterations
  v1  recall 0.0005 — GPU + Rogue detectors only (1 finding)
  v2  recall 0.766  — + CloudAPIAbuseDetector (220 findings)

$ cat v1_diagnosis.txt

v1 hit an impedance mismatch. CloudTrail is a management-plane audit log. The original detectors (GPUHijack, RogueTraining) look for data-plane GPU telemetry — utilization %, memory %, power watts — that CloudTrail doesn't carry.

What v1 detectors need     | What CloudTrail provides
gpu_utilization: 95%       | nothing (CloudWatch/DCGM domain)
event_type: training_job   | 0 training events in 34,427
event_type: gpu_metric     | 1 event (Bedrock InvokeModel)

99.98% of the data was api_call events that v1 couldn't process. One finding. Precision 1.0, recall 0.0005. The pipeline worked — it was aimed at the wrong layer.
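The v1 recall figure is consistent with a single true positive. A quick sanity check, under the assumption that recall is computed over the 2,167 attack-labeled events and that v1's lone finding maps to one of them:

```python
attack_labeled = 2167   # attack-labeled events in the dataset
true_positives = 1      # v1's single finding, assumed to map to one event
recall = true_positives / attack_labeled
print(round(recall, 4))  # 0.0005
```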

$ cat v2_fix.txt

CloudAPIAbuseDetector — built to process the 34,420 api_call events that v1 ignored. Four signals in 5-minute sliding windows:

api_burst_frequency: ≥50 API calls per 5-minute window, or z-score ≥2.5 against rolling baseline. Caught cloudmapper at 130 calls/window, okta_ro_role at 292 calls/window.
region_fanout: Entity touching >5 distinct AWS regions in one window. Caught cloudmapper scanning 13 regions, cloudsploit across 12.
off_hours_api_spike: Significant API activity outside business hours (6AM-10PM). Caught daftpunk's 11 PM Bedrock scanning.
excessive_error_rate: ≥30% AccessDenied/Forbidden responses. Caught daftpunk at 100% error rate — every single call denied.
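A minimal sketch of how the four signals could be evaluated over one 5-minute window. The event shape (a dict with `time`, `region`, and `error` fields) and function names are illustrative assumptions, not the pipeline's actual types, and the z-score fallback for bursts is omitted for brevity.

```python
from datetime import datetime

WINDOW_MINUTES = 5
BURST_THRESHOLD = 50           # calls per window
REGION_FANOUT_THRESHOLD = 5    # distinct regions per window
ERROR_RATE_THRESHOLD = 0.30    # share of AccessDenied/Forbidden responses
BUSINESS_HOURS = range(6, 22)  # 6AM-10PM

def window_signals(events):
    """events: API calls for one entity within a single 5-minute window.
    Returns the names of the signals that fired."""
    signals = []
    if len(events) >= BURST_THRESHOLD:
        signals.append("api_burst_frequency")
    if len({e["region"] for e in events}) > REGION_FANOUT_THRESHOLD:
        signals.append("region_fanout")
    if any(e["time"].hour not in BUSINESS_HOURS for e in events):
        signals.append("off_hours_api_spike")
    errors = sum(1 for e in events if e["error"])
    if events and errors / len(events) >= ERROR_RATE_THRESHOLD:
        signals.append("excessive_error_rate")
    return signals

# daftpunk-style pattern: off-hours scanning where every call is denied
late = datetime(2024, 1, 1, 23, 5)
events = [{"time": late, "region": "us-east-1", "error": True} for _ in range(20)]
print(window_signals(events))  # ['off_hours_api_spike', 'excessive_error_rate']
```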

Multi-signal combinations produce meaningful severity: daftpunk (off-hours + 100% errors) → HIGH. cloudmapper (burst + 13-region fan-out) → HIGH. bhavin_cli generated 24 findings across multiple windows.
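One way to express that escalation. The mapping itself is an assumption inferred from the examples (two independent signals reach HIGH); the real pipeline's tiers may differ.

```python
def severity(fired_signals):
    """Illustrative escalation: more independent signals -> higher severity."""
    n = len(fired_signals)
    if n >= 3:
        return "CRITICAL"
    if n == 2:
        return "HIGH"
    if n == 1:
        return "MEDIUM"
    return "LOW"

print(severity(["off_hours_api_spike", "excessive_error_rate"]))  # HIGH (daftpunk)
print(severity(["api_burst_frequency", "region_fanout"]))         # HIGH (cloudmapper)
```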

~/eval/v2 -- attack entities in the data
Entity            Role                Detector        v1    v2
Bedrock...dwrx4   ML pipeline role    GPUHijack       YES   YES
daftpunk          IAM user (Bedrock)  CloudAPIAbuse   NO    YES — HIGH
bhavin_cli        IAM user            CloudAPIAbuse   NO    YES — 24 findings
cloudmapper       IAM user            CloudAPIAbuse   NO    YES — HIGH
cloudsploit       IAM user            CloudAPIAbuse   NO    YES — burst+fanout
okta_ro_role      Service account     CloudAPIAbuse   NO    YES — HIGH (292 calls, 12 regions)
rhino_escalate    IAM user            --              NO    Partial

$ cat conclusion.txt

Recall: 0.0005 → 0.766. Precision held at 1.0. Zero false positives.

The fix wasn't more sophisticated ML — it was meeting the data where it lives. CloudTrail provides API call patterns. CloudAPIAbuseDetector processes API call patterns. 99.98% of the dataset went from ignored to analyzed.

220 findings across 3 detectors. Multi-signal severity escalation produces meaningful tiers: 2 CRITICAL, 14 HIGH, 98 MEDIUM. The entities that v1 missed — daftpunk, cloudmapper, cloudsploit, bhavin_cli, okta_ro_role — v2 catches them all.

Published transparently because the v1 failure is what makes the v2 improvement credible. Subtraction (diagnosing the mismatch) came before addition (building the right detector).
