Cloud detection is a data engineering problem disguised as a security problem. Most teams figure this out the hard way — after they've deployed a SIEM, pointed it at their cloud logs, and ended up with either a firehose of unactionable alerts or a mostly-empty dashboard because the logs never made it in the first place.

Getting detection and response right in cloud environments comes down to four foundational pieces. Not all at once, but understanding all of them, because each one builds on the last.

Pillar 1: Scalable Data Pipelines

Everything starts with getting the data in. In cloud environments, this means handling logs from AWS CloudTrail, VPC flow logs, GCP Audit Logs, container runtime events, application logs, and whatever else your infrastructure generates — at scale, reliably, and in near-real-time.

The biggest mistake I see is building ingestion pipelines that work at current volume and fail under peak load or during incidents — exactly when you need them most. Resilience under peak load isn't optional.

The pipeline architecture matters. The key design decisions: decouple producers from consumers with a durable buffer so a slow sink doesn't drop events; normalize schemas at ingest rather than at query time; and build in backpressure and dead-lettering so a volume spike degrades gracefully instead of silently losing data.

The measure of a good ingestion pipeline is not "does it work today?" It's "does it work during an incident when volume spikes 10x and three people are querying the data at the same time?"
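What that resilience looks like in code, as a minimal sketch: bounded batches keep memory flat when volume spikes, and a failing batch is retried with backoff, then dead-lettered rather than dropped. The `deliver` sink and its failure rate are hypothetical stand-ins for a real downstream like Kafka, Kinesis, or S3.

```python
import random
import time

def deliver(batch):
    """Hypothetical sink: stands in for a downstream write that
    can fail transiently under load."""
    if random.random() < 0.1:
        raise ConnectionError("downstream unavailable")

def consume(events, max_batch=500, max_retries=5):
    """Drain an event stream in bounded batches with retry.

    Bounded batches mean memory stays flat when volume spikes 10x;
    the retry loop means transient downstream failures cost latency,
    not data.
    """
    delivered, batch = 0, []
    for event in events:
        batch.append(event)
        if len(batch) < max_batch:
            continue
        delivered += _flush(batch, max_retries)
        batch = []
    if batch:
        delivered += _flush(batch, max_retries)
    return delivered

def _flush(batch, max_retries):
    for attempt in range(max_retries):
        try:
            deliver(batch)
            return len(batch)
        except ConnectionError:
            time.sleep(min(0.01 * 2 ** attempt, 1.0))  # exponential backoff
    # Never silently drop: surface the batch for dead-lettering.
    raise RuntimeError("retries exhausted: send batch to a dead-letter queue")
```

The dead-letter path is the part teams skip and regret: during an incident, a batch that can't be delivered is exactly the batch you'll want to replay later.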

Pillar 2: AI-Powered Detections

Not all detections are equal. There's a meaningful difference between a rule, a detection, and a signal — and treating them the same is why most detection programs plateau.

AI doesn't replace the rule layer — it extends it. The practical value of ML in detection is in two areas:

  1. False positive reduction — Classifying alerts by historical patterns, entity context, and environmental baseline. Not every "admin role assigned" is an incident if your onboarding process generates five of them per week.
  2. Cloud drift detection — Identifying when configuration state deviates from baseline. This is where rule-based systems struggle and where statistical modeling is genuinely useful.
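The false positive idea reduces to a simple question: is today's alert volume unusual for this rule and this entity? A minimal sketch, assuming alert history is available as daily counts per (rule, entity) pair — the `history` shape here is hypothetical; a real system would pull it from the alert store:

```python
from statistics import mean, pstdev

def baseline_score(history, rule, entity, todays_count):
    """Score an alert count against its historical baseline.

    history: {(rule, entity): [daily counts]}  (hypothetical shape)
    Returns a z-score-like value: near 0 means "routine for this
    entity", a high value means "unusual" and worth analyst time.
    """
    counts = history.get((rule, entity), [])
    if len(counts) < 7:  # too little baseline: treat as notable
        return float("inf")
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return 0.0 if todays_count == mu else float("inf")
    return (todays_count - mu) / sigma
```

With a week of history showing ~5 "admin role assigned" events per day from onboarding, the sixth one today scores near zero; forty of them scores off the chart. Real deployments add entity context and seasonality, but the core mechanic is exactly this comparison against a learned baseline.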

The trap to avoid: deploying ML detection without a clean, normalized data foundation. Garbage in, garbage out — but slower and more expensive.
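The drift half can start simpler than it sounds. A sketch that diffs a current configuration snapshot against a recorded baseline — plain nested dicts here; real inputs would be parsed Terraform state or a cloud provider's describe-* output. The statistical layer then sits on top, deciding which of these drifts are routine churn and which are worth an alert:

```python
def config_drift(baseline, current, path=""):
    """Recursively diff a config snapshot against a recorded baseline.

    Returns a list of (path, baseline_value, current_value) tuples,
    one per leaf that changed, appeared, or disappeared.
    """
    drifts = []
    for key in sorted(set(baseline) | set(current)):
        p = f"{path}/{key}"
        b, c = baseline.get(key), current.get(key)
        if isinstance(b, dict) and isinstance(c, dict):
            drifts.extend(config_drift(b, c, p))  # descend into nested config
        elif b != c:
            drifts.append((p, b, c))
    return drifts
```

For example, a security group that gains an unexpected port-22 ingress rule shows up as a single tuple with a `None` baseline value, which is exactly the shape a downstream scorer or rule can act on.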

Pillar 3: Security Data Architecture

Where the data lives and how it's organized determines what you can actually do with it during an investigation. Most detection pipelines get this wrong — optimizing for ingest but not for query.

The architecture I've converged on for most environments: a hot tier for recent data that analysts query constantly, cheap object storage for long-term retention, and a query layer that spans both so nobody has to know which tier holds what. Ingest-optimized and query-optimized storage are different problems, and pretending one system solves both is how most teams end up slow at both.

The operational implication: your analysts need to be able to query last year's CloudTrail logs during an incident without waiting 45 minutes for a job to run. Architecture decisions made during quiet periods determine your capability during incidents.
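One architecture decision that makes the year-old query cheap is date-partitioned storage. A sketch of the idea, using a hypothetical `s3://security-lake/cloudtrail` layout: a query engine that understands the partition scheme (Athena, Trino, Spark) scans only the days in the window instead of the whole bucket.

```python
from datetime import date, timedelta

def partition_prefixes(start, end, base="s3://security-lake/cloudtrail"):
    """Build the date-partitioned prefixes covering a query window.

    Hypothetical layout: <base>/year=YYYY/month=MM/day=DD/.
    Partition pruning over this layout is what turns "scan a year of
    CloudTrail" into "scan three days of CloudTrail".
    """
    out = []
    d = start
    while d <= end:
        out.append(f"{base}/year={d.year}/month={d.month:02d}/day={d.day:02d}/")
        d += timedelta(days=1)
    return out
```

The specific layout and engine vary; the invariant is that the fields analysts filter on during incidents (time, account, region) must be partition keys, decided at write time, long before the incident.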

Pillar 4: GitOps for Alert Management

Detection rules are code. They should be treated like code: version-controlled, reviewed, tested, and deployed through a repeatable process.

The case for GitOps in detection: every rule change gets peer review before it ships, the commit history doubles as an audit trail, CI can test rules before deployment, and a bad change can be rolled back in one commit.

Detection rules that aren't version-controlled aren't really managed. They're inherited.

The tooling here can be as simple as a Git repository with CI/CD pipelines pushing rules to your SIEM or detection platform. The discipline matters more than the specific tooling.
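A sketch of the kind of check such a CI pipeline might run on every pull request. The rule schema here is hypothetical and JSON is used for illustration; Sigma or vendor-specific YAML is more common in practice, but the discipline is the same: no rule reaches the SIEM without passing validation.

```python
import json

# Hypothetical schema: the fields our CI requires on every rule.
REQUIRED = {"id", "title", "severity", "query", "owner"}
SEVERITIES = {"low", "medium", "high", "critical"}

def validate_rule(text):
    """Validate one detection rule file.

    Returns a list of problems; an empty list means deployable.
    """
    problems = []
    try:
        rule = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    missing = REQUIRED - rule.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if rule.get("severity") not in SEVERITIES:
        problems.append(f"severity must be one of {sorted(SEVERITIES)}")
    if not rule.get("query", "").strip():
        problems.append("query must be non-empty")
    return problems
```

Requiring an `owner` field is the underrated check: it is what turns an inherited rule back into a managed one.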

Putting It Together

These four pillars are interdependent. A great detection layer built on a broken pipeline will miss events. A well-designed architecture with no GitOps discipline will accumulate technical debt in the form of undocumented, unreviewed rules that nobody trusts.

The order matters too. Start with the pipeline — you can't detect what you can't ingest. Then build the detection layer on top of clean, normalized data. Then invest in architecture as data volume and retention requirements grow. Then add GitOps discipline as the detection rule library grows beyond what one person can hold in their head.

The goal isn't a perfect detection platform. It's a detection program that improves incrementally and doesn't collapse when you need it most.