The Origin
PARALLAX watches the application layer — API request patterns, user behavior, metadata. But what about the infrastructure underneath? Someone hijacks a GPU cluster for crypto mining. A rogue training job burns through compute for 48 hours on a stolen service account. A recon tool enumerates every IAM role across 13 AWS regions at 2 AM.
These attacks don't show up in API behavior. They show up in cloud telemetry — GPU utilization spikes, anomalous instance types, burst API patterns, off-hours activity.
PARALLAX needed an infrastructure-layer companion. That's KESTREL.
The Thesis
Baseline normal, detect deviations. Build rolling per-entity behavioral profiles using Welford's online algorithm. When an entity's GPU utilization, API call frequency, or access patterns deviate beyond 2.5 standard deviations from their own baseline — that's the signal. No static thresholds. No training data requirements. Each entity is its own reference point.
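The thesis above can be sketched in a few lines. This is an illustrative implementation of Welford's online algorithm with a 2.5σ deviation check — the `Baseline` class, sample values, and API are my own sketch, not KESTREL's actual code:

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class Baseline:
    """Rolling per-entity baseline via Welford's online algorithm."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford's update: numerically stable, no stored history needed
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        return sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        return (x - self.mean) / self.stddev if self.stddev > 0 else 0.0

# Each entity is its own reference point: flag when z exceeds 2.5.
b = Baseline()
for util in [40, 45, 42, 44, 41, 43, 40, 44]:  # normal GPU utilization %
    b.update(util)
print(b.zscore(95) > 2.5)  # a 95% spike deviates far beyond this baseline
```

No training data is required because the baseline accumulates online: the first events seed the mean, and every subsequent event refines it.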
The Build
Three detectors, each targeting a different attack surface. 92 tests, 98% coverage. Built in Python 3.11+ with Pydantic v2, Click CLI, SQLite persistence, and Sigma rule export.
- GPUHijackDetector — 3 signals: GPU utilization ≥90%, anomalous regions (ap-southeast-1, sa-east-1, af-south-1), off-hours activity (h<6 or h≥22). Severity scales with signal count. ATLAS: AML.T0048 Denial of ML Service.
- RogueTrainingDetector — 4 signals: unexpected instance type, off-hours training, abnormal duration (z≥2.5σ), unusual data transfer (z≥2.5σ). Baseline-driven z-score checks. ATLAS: AML.T0020 Train Model.
- CloudAPIAbuseDetector — 4 signals in 5-minute sliding windows: API burst (≥50 calls/5m or z≥2.5), region fan-out (>5 regions), off-hours spike, excessive error rate (≥30% AccessDenied). ATLAS: T1526 Cloud Service Discovery.
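"Severity scales with signal count" can be made concrete with the GPUHijackDetector's three signals. A hypothetical sketch — the function name, severity map, and region set mirror the text, but the real detector's internals may differ:

```python
from datetime import datetime, timezone

ANOMALOUS_REGIONS = {"ap-southeast-1", "sa-east-1", "af-south-1"}
SEVERITY = {1: "LOW", 2: "MEDIUM", 3: "HIGH"}  # illustrative mapping

def gpu_hijack_signals(util: float, region: str, ts: datetime) -> list[str]:
    """Evaluate the three GPUHijackDetector signals described above."""
    signals = []
    if util >= 90.0:                      # signal 1: utilization >= 90%
        signals.append("high_gpu_utilization")
    if region in ANOMALOUS_REGIONS:       # signal 2: anomalous region
        signals.append("anomalous_region")
    if ts.hour < 6 or ts.hour >= 22:      # signal 3: off-hours (h<6 or h>=22)
        signals.append("off_hours")
    return signals

# All three signals fire: high utilization, odd region, 11 PM activity
sig = gpu_hijack_signals(
    97.5, "sa-east-1", datetime(2024, 5, 1, 23, 10, tzinfo=timezone.utc)
)
print(SEVERITY[len(sig)])  # HIGH
```

Counting independent signals rather than weighting them keeps each finding explainable: the severity label is directly traceable to which checks fired.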
```
kestrel generate   # Synthetic events (80/20 normal/attack)
kestrel scan       # Ingest → baseline → detect → findings
kestrel findings   # Export as JSON or Sigma YAML
kestrel serve      # Flask dashboard
```
Every finding is tagged with MITRE ATLAS tactics and techniques. Every finding exports to Sigma rules that plug into existing SIEM pipelines. Full source on GitHub.
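To illustrate what that export step involves, here is a hypothetical mapping from a finding onto a minimal Sigma rule skeleton. The field mapping (entity → `userIdentity.arn`) and the finding dict shape are my assumptions, not the repo's actual export logic; real Sigma rules are serialized to YAML, JSON is used here only to show the structure:

```python
import json
import uuid

def finding_to_sigma(finding: dict) -> dict:
    """Map a finding onto a minimal Sigma rule skeleton (hypothetical mapping)."""
    return {
        "title": f"KESTREL {finding['detector']} anomaly for {finding['entity']}",
        "id": str(uuid.uuid4()),
        "status": "experimental",
        "logsource": {"product": "aws", "service": "cloudtrail"},
        "detection": {
            # Match future events from the flagged entity in the SIEM
            "selection": {"userIdentity.arn|contains": finding["entity"]},
            "condition": "selection",
        },
        "level": finding["severity"].lower(),
        "tags": finding["atlas"],
    }

rule = finding_to_sigma({
    "detector": "CloudAPIAbuseDetector",
    "entity": "daftpunk",
    "severity": "HIGH",
    "atlas": ["attack.t1526"],
})
print(json.dumps(rule, indent=2))
```

The point of the Sigma layer is portability: the same rule structure loads into Splunk, Elastic, or any backend with a Sigma converter.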
The v1 Reality Check
Ran KESTREL against 34,427 real CloudTrail events from Splunk's attack_data repository. 51 datasets covering MITRE ATT&CK techniques — IAM enumeration, privilege escalation, security group abuse, KMS manipulation. 77 unique entities across 16 AWS regions.
One finding out of 2,167 attack events. The GPUHijackDetector fired once — on a Bedrock InvokeModel at 11 PM. That was a true positive. But daftpunk, cloudmapper, cloudsploit, bhavin_cli, okta_ro_role — all missed. The RogueTrainingDetector never fired at all.
Perfect precision. Near-zero recall. The pipeline worked. It was aimed at the wrong layer.
The Diagnosis
Impedance mismatch. CloudTrail is a management-plane audit log — it records who called what AWS API, when, from where. KESTREL's original detectors look for data-plane GPU telemetry — utilization %, memory %, power watts, training job durations.
99.98% of the CloudTrail events mapped to api_call. Zero mapped to training_job. One mapped to gpu_metric. The GPUHijackDetector and RogueTrainingDetector were built for CloudWatch metrics and NVIDIA DCGM telemetry — data that CloudTrail doesn't carry. The attacks in the dataset were real (IAM enumeration, recon scanning, privilege escalation), but they lived at the API call layer, not the GPU layer.
The baselining engine DID detect 105 z-score anomalies on time-of-day patterns. It flagged daftpunk at z=+11.7 for late-night Bedrock scanning. The signal was there — nothing was consuming it.
The Fix
CloudAPIAbuseDetector. Built to process the 34,420 api_call events that v1 ignored. Four signals in 5-minute sliding windows: API burst (≥50 calls/5m or z≥2.5), region fan-out (>5 regions), off-hours spike, and excessive error rate (≥30% AccessDenied).
Multi-signal combinations drive severity escalation: daftpunk (off-hours + 100% errors) → HIGH. cloudmapper (burst + 13-region fan-out) → HIGH. okta_ro_role (292 calls + 12 regions) → HIGH. bhavin_cli generated 24 findings across multiple time windows.
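The sliding-window mechanics behind those findings can be sketched as follows. This is an illustrative implementation of three of the four signals (off-hours omitted for brevity); the class name, thresholds wiring, and region labels are my own, though the thresholds match the text:

```python
from collections import deque
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)
BURST_THRESHOLD = 50    # calls per 5-minute window
FANOUT_THRESHOLD = 5    # distinct regions per window
ERROR_RATE = 0.30       # AccessDenied fraction

class SlidingWindow:
    """Per-entity 5-minute sliding window over (timestamp, region, error) calls."""
    def __init__(self) -> None:
        self.events: deque = deque()

    def add(self, ts: datetime, region: str, access_denied: bool = False) -> None:
        self.events.append((ts, region, access_denied))
        # Evict everything older than the window relative to the newest event
        while self.events and ts - self.events[0][0] > WINDOW:
            self.events.popleft()

    def signals(self) -> list[str]:
        regions = {r for _, r, _ in self.events}
        errors = sum(1 for *_, denied in self.events if denied)
        out = []
        if len(self.events) >= BURST_THRESHOLD:
            out.append("api_burst")
        if len(regions) > FANOUT_THRESHOLD:
            out.append("region_fanout")
        if self.events and errors / len(self.events) >= ERROR_RATE:
            out.append("error_rate")
        return out

# Simulate a recon burst: 60 calls across 13 regions in two minutes
w = SlidingWindow()
start = datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc)
for i in range(60):
    w.add(start + timedelta(seconds=2 * i), f"region-{i % 13}")
print(w.signals())  # ['api_burst', 'region_fanout']
```

A deque keeps the window O(1) per event: append the new call, pop expired ones from the front, and evaluate the signals over whatever remains.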
The Results
Recall: 0.0005 → 0.766. Precision held at 1.0. Zero false positives.
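Those numbers check out with basic precision/recall arithmetic. The event counts come from the evaluation above; the implied v2 true-positive count is an inference from the reported metrics, not a reported number:

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0

attack_events = 2167

# v1: one true positive against 2,167 attack events, zero false positives
print(round(recall(1, attack_events - 1), 4))  # 0.0005
print(precision(1, 0))                         # 1.0

# v2: recall of 0.766 at precision 1.0 implies roughly this many
# attack events covered by findings (inferred, not reported):
print(round(0.766 * attack_events))  # 1660
```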
220 findings across 3 detectors. Severity distribution: 2 CRITICAL, 14 HIGH, 98 MEDIUM, 106 LOW (baseline anomalies). The entities that v1 missed — daftpunk, cloudmapper, cloudsploit, bhavin_cli, okta_ro_role — v2 catches them all.
The fix wasn't more sophisticated ML. It was meeting the data where it lives. CloudTrail provides API call patterns. CloudAPIAbuseDetector processes API call patterns. 99.98% of the dataset went from ignored to analyzed.
The approach that worked: diagnose the mismatch, build the detector that fits the data, keep the existing detectors for when the right telemetry arrives. Subtraction (understanding what's missing) came before addition (building what's needed).
Full-Stack Detection
PARALLAX operates at the application layer — API request patterns, user behavior, content-free metadata. 15 detectors, 0.68 AUC on real LANL data. KESTREL operates at the infrastructure layer — GPU utilization, training job profiles, API abuse patterns. 3 detectors, 0.766 recall on real CloudTrail data.
Together they provide full-stack AI platform threat detection. PARALLAX catches distillation and account takeover through behavioral shifts. KESTREL catches GPU hijacking, rogue training, and infrastructure recon through z-score anomalies. Different layers, same principle: the attack lives in the pattern, not the content.
Where It Stands
v2 complete. 3 detectors, 92 tests, 98% coverage. Real-data evaluation on 34K CloudTrail events. Open source. Clear paths forward:
- Privilege escalation detector for CreateAccessKey/AttachPolicy chains
- Cross-entity correlation for coordinated scanning patterns
- Real GPU telemetry evaluation (CloudWatch/DCGM) for existing detectors
- Time-windowed severity decay for sustained vs burst activity