KESTREL DETECTION PIPELINE

Live simulation — select a cloud workload scenario and watch z-score baselining catch anomalies in real time

ingest -> store -> baseline -> detect -> findings -> export
cloud_telemetry_stream
entity_profile (Welford baseline)
  entity_id · entity_type · cloud_provider · region
  avg_gpu_util · avg_mem_util · avg_power_watts · api_calls/hr
  active_hours · instance_type · baseline_events
z_score_baselines — sigma threshold = 2.5
detection_modules (3)
GPUHijackDetector
AML.TA0011 IMPACT / AML.T0048 Denial of ML Service
  high_gpu_utilization (>=90%)
  anomalous_region
  off_hours_gpu_spike (h<6 || h>=22)
RogueTrainingDetector
AML.TA0005 EXECUTION / AML.T0020 Train Model
  unexpected_instance_type
  off_hours_training (h<6 || h>=22)
  abnormal_duration (z>=2.5)
  unusual_data_transfer (z>=2.5)
CloudAPIAbuseDetector v2
AML.TA0008 DISCOVERY / AML.T1526 Cloud Service Discovery
  api_burst_frequency (>=50/5m)
  region_fanout (>5 regions)
  off_hours_api_spike (h<6 || h>=22)
  excessive_error_rate (>=30%)
$ kestrel findings --format json
[]  # no findings — all clear
$ kestrel findings --format sigma
# no sigma rules generated
Client-side simulation using entity profiles and detection thresholds from the real pipeline.
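The baselining step above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the project's code: it assumes one Welford accumulator per entity metric, and reuses the 2.5-sigma threshold shown in the panel.

```python
import math

class WelfordBaseline:
    """Running mean/variance for one entity metric (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def z_score(self, x: float) -> float:
        if self.n < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
        return (x - self.mean) / std if std > 0 else 0.0

SIGMA_THRESHOLD = 2.5  # matches the pipeline's z-score threshold

baseline = WelfordBaseline()
for util in [41, 38, 44, 40, 39, 42, 43]:  # normal GPU-utilization samples
    baseline.update(util)

print(baseline.z_score(95) >= SIGMA_THRESHOLD)  # True: a 95% spike is ~25 sigma out
```

Welford's update keeps the profile in O(1) memory per metric, which is why the entity_profile can be maintained as a stream rather than a stored history.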

REAL-WORLD EVALUATION

Evaluated against Splunk Attack Data — 34,427 real CloudTrail events from AWS attack scenarios. Two iterations: diagnose, then fix.

Dataset: splunk/attack_data · 34,427 events · 51 datasets · 77 unique entities · 16 AWS regions · 2,167 attack-labeled events
Precision     1.0    — 0 false positives (v1 and v2)
Recall (v2)   0.766  — up from 0.0005 (1,532x)
Findings      220    — up from 1 (v1)
Detectors     3      — GPU + Rogue + API Abuse
~/eval/v2 -- severity distribution (220 findings)
  2 CRITICAL · 14 HIGH · 98 MEDIUM · 106 LOW (baseline)
~/eval -- recall progression across 2 iterations
  v1  recall 0.0005 — GPU + Rogue detectors only (1 finding)
  v2  recall 0.766  — + CloudAPIAbuseDetector (220 findings)

$ cat v1_diagnosis.txt

v1 hit an impedance mismatch. CloudTrail is a management-plane audit log. The original detectors (GPUHijack, RogueTraining) look for data-plane GPU telemetry — utilization %, memory %, power watts — that CloudTrail doesn't carry.

What v1 detectors need     | What CloudTrail provides
gpu_utilization: 95%       | nothing (CloudWatch/DCGM domain)
event_type: training_job   | 0 training events in 34,427
event_type: gpu_metric     | 1 event (Bedrock InvokeModel)

99.98% of the data was api_call events that v1 couldn't process. One finding. Precision 1.0, recall 0.0005. The pipeline worked — it was aimed at the wrong layer.
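The v1 recall figure is consistent with a single true positive. A quick sanity check, under the assumption that recall is computed over the 2,167 attack-labeled events and that v1's lone finding maps to one of them:

```python
attack_labeled = 2167   # attack-labeled events in the dataset
true_positives = 1      # v1's single finding, assumed to map to one event
recall = true_positives / attack_labeled
print(round(recall, 4))  # 0.0005
```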

$ cat v2_fix.txt

CloudAPIAbuseDetector — built to process the 34,420 api_call events that v1 ignored. Four signals in 5-minute sliding windows:

api_burst_frequency: ≥50 API calls per 5-minute window, or z-score ≥2.5 against rolling baseline. Caught cloudmapper at 130 calls/window, okta_ro_role at 292 calls/window.
region_fanout: Entity touching >5 distinct AWS regions in one window. Caught cloudmapper scanning 13 regions, cloudsploit across 12.
off_hours_api_spike: Significant API activity outside business hours (6AM-10PM). Caught daftpunk's 11 PM Bedrock scanning.
excessive_error_rate: ≥30% AccessDenied/Forbidden responses. Caught daftpunk at 100% error rate — every single call denied.
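A minimal sketch of how the four signals could be evaluated over one 5-minute window. The event shape (a dict with `time`, `region`, and `error` fields) and function names are illustrative assumptions, not the pipeline's actual types, and the z-score fallback for bursts is omitted for brevity.

```python
from datetime import datetime

WINDOW_MINUTES = 5
BURST_THRESHOLD = 50           # calls per window
REGION_FANOUT_THRESHOLD = 5    # distinct regions per window
ERROR_RATE_THRESHOLD = 0.30    # share of AccessDenied/Forbidden responses
BUSINESS_HOURS = range(6, 22)  # 6AM-10PM

def window_signals(events):
    """events: API calls for one entity within a single 5-minute window.
    Returns the names of the signals that fired."""
    signals = []
    if len(events) >= BURST_THRESHOLD:
        signals.append("api_burst_frequency")
    if len({e["region"] for e in events}) > REGION_FANOUT_THRESHOLD:
        signals.append("region_fanout")
    if any(e["time"].hour not in BUSINESS_HOURS for e in events):
        signals.append("off_hours_api_spike")
    errors = sum(1 for e in events if e["error"])
    if events and errors / len(events) >= ERROR_RATE_THRESHOLD:
        signals.append("excessive_error_rate")
    return signals

# daftpunk-style pattern: off-hours scanning where every call is denied
late = datetime(2024, 1, 1, 23, 5)
events = [{"time": late, "region": "us-east-1", "error": True} for _ in range(20)]
print(window_signals(events))  # ['off_hours_api_spike', 'excessive_error_rate']
```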

Multi-signal combinations produce meaningful severity: daftpunk (off-hours + 100% errors) → HIGH. cloudmapper (burst + 13-region fan-out) → HIGH. bhavin_cli generated 24 findings across multiple windows.
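One way to express that escalation. The mapping itself is an assumption inferred from the examples (two independent signals reach HIGH); the real pipeline's tiers may differ.

```python
def severity(fired_signals):
    """Illustrative escalation: more independent signals -> higher severity."""
    n = len(fired_signals)
    if n >= 3:
        return "CRITICAL"
    if n == 2:
        return "HIGH"
    if n == 1:
        return "MEDIUM"
    return "LOW"

print(severity(["off_hours_api_spike", "excessive_error_rate"]))  # HIGH (daftpunk)
print(severity(["api_burst_frequency", "region_fanout"]))         # HIGH (cloudmapper)
```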

~/eval/v2 -- attack entities in the data
Entity            Role                Detector        v1    v2
Bedrock...dwrx4   ML pipeline role    GPUHijack       YES   YES
daftpunk          IAM user (Bedrock)  CloudAPIAbuse   NO    YES — HIGH
bhavin_cli        IAM user            CloudAPIAbuse   NO    YES — 24 findings
cloudmapper       IAM user            CloudAPIAbuse   NO    YES — HIGH
cloudsploit       IAM user            CloudAPIAbuse   NO    YES — burst+fanout
okta_ro_role      Service account     CloudAPIAbuse   NO    YES — HIGH (292 calls, 12 regions)
rhino_escalate    IAM user            --              NO    Partial

$ cat conclusion.txt

Recall: 0.0005 → 0.766. Precision held at 1.0. Zero false positives.

The fix wasn't more sophisticated ML — it was meeting the data where it lives. CloudTrail provides API call patterns. CloudAPIAbuseDetector processes API call patterns. 99.98% of the dataset went from ignored to analyzed.

220 findings across 3 detectors. Multi-signal severity escalation produces meaningful tiers: 2 CRITICAL, 14 HIGH, 98 MEDIUM. The entities that v1 missed — daftpunk, cloudmapper, cloudsploit, bhavin_cli, okta_ro_role — v2 catches them all.

Published transparently because the v1 failure is what makes the v2 improvement credible. Subtraction (diagnosing the mismatch) came before addition (building the right detector).
