Infrastructure Drift Detection and Remediation
The gap between declared and actual infrastructure state — and why closing it continuously is a compliance obligation, not merely an operational preference.
Infrastructure drift occurs when the actual running state of infrastructure diverges from its declared desired state, whether that desired state is expressed in Terraform configuration (with the state file recording what was last applied), Kubernetes manifests in a GitOps repository, AWS Config rules, or configuration management baselines. Drift arises from several sources: manual changes made through the console or CLI outside the IaC workflow (often during incidents), auto-scaling or self-healing events that modify instance configurations, cloud provider updates to managed service configurations, and configuration management failures that leave nodes in transitional states. In compliance-regulated environments, drift is more than an operational hygiene issue: it represents a potential failure of configuration management controls (NIST SP 800-53 CM-6 and CM-7, SOC 2 CC7.1, PCI DSS Req 2.2), and depending on its nature it may constitute an undocumented change that requires incident or breach assessment.
Drift detection operates at two levels. At the infrastructure provisioning level, terraform plan run against live infrastructure compares the declared configuration with actual cloud resource attributes, while AWS Config rules and Azure Policy evaluate deployed resources against defined rules; together they surface discrepancies in resource attributes, network security groups, IAM policies, encryption settings, and tagging. Checkov complements these by statically scanning IaC code and plan output for policy violations before deployment, although it does not inspect live resources itself. At the configuration management level, tools like Puppet, Chef, Ansible, or InSpec run on a recurring schedule against hosts to verify that OS configurations, installed packages, service states, and file permissions match approved baselines. Drift detection events must be routed both to operational alerting systems (for immediate remediation) and to compliance evidence stores (as control monitoring artifacts), with severity classified by the security sensitivity of the drifted attribute: an encryption flag drift is a critical compliance event, whereas a non-security tag change is lower severity.
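As a rough illustration of provisioning-level detection with severity classification, the sketch below walks the JSON form of a Terraform plan (as produced by running terraform plan -detailed-exitcode -out=plan.bin followed by terraform show -json plan.bin) and labels each changed top-level attribute. The attribute sets, the severity names, and the sample plan are assumptions made for this example, not a prescribed taxonomy. Note also that a plan diff covers unapplied configuration changes as well as drift, so this only isolates drift when run against the configuration that was last applied.

```python
from typing import Any, Optional

# Hypothetical severity taxonomy keyed by top-level attribute name.
# A real mapping would be derived from your own control requirements.
CRITICAL_ATTRIBUTES = {
    "ingress", "egress", "policy", "assume_role_policy", "encrypted", "kms_key_id",
}
LOW_ATTRIBUTES = {"tags", "tags_all"}


def changed_attributes(before: Optional[dict], after: Optional[dict]) -> list[str]:
    """Return the sorted top-level attribute names whose values differ."""
    before, after = before or {}, after or {}
    return sorted(k for k in set(before) | set(after) if before.get(k) != after.get(k))


def classify(attribute: str) -> str:
    """Map an attribute name to an (assumed) severity label."""
    if attribute in CRITICAL_ATTRIBUTES:
        return "critical"
    if attribute in LOW_ATTRIBUTES:
        return "low"
    return "medium"


def drift_findings(plan: dict[str, Any]) -> list[dict[str, str]]:
    """Extract one finding per changed attribute from plan JSON.

    In the plan JSON, change.before reflects the refreshed live state and
    change.after reflects what the declared configuration would produce.
    """
    findings = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if change.get("actions") == ["no-op"]:
            continue  # live state already matches the configuration
        for attr in changed_attributes(change.get("before"), change.get("after")):
            findings.append(
                {"resource": rc["address"], "attribute": attr, "severity": classify(attr)}
            )
    return findings


# Illustrative plan fragment; the resource names and values are made up.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_security_group.web",
            "change": {
                "actions": ["update"],
                "before": {"ingress": [{"cidr_blocks": ["0.0.0.0/0"], "from_port": 22}],
                           "tags": {"env": "prod"}},
                "after": {"ingress": [], "tags": {"env": "prod"}},
            },
        },
        {
            "address": "aws_s3_bucket.logs",
            "change": {
                "actions": ["update"],
                "before": {"bucket": "logs", "tags": {"owner": "ops-temp"}},
                "after": {"bucket": "logs", "tags": {"owner": "ops"}},
            },
        },
        {
            "address": "aws_kms_key.main",
            "change": {"actions": ["no-op"], "before": {"enable_key_rotation": True},
                       "after": {"enable_key_rotation": True}},
        },
    ]
}

for finding in drift_findings(sample_plan):
    print(finding)
```

Comparing only top-level attributes keeps the sketch short; nested blocks and values that are unknown until apply would need additional handling in practice.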
Automated remediation of infrastructure drift must be implemented carefully in regulated environments. Automatic remediation — reconciliation agents or Lambda functions that immediately restore drifted resources to desired state — can satisfy compliance controls faster than human intervention but can also mask underlying issues, destroy forensic evidence of how drift occurred, and create reconciliation loops if the source of drift is not addressed. Compliance frameworks generally require that significant changes be reviewed before remediation if they may indicate a security incident. A mature drift remediation architecture therefore implements tiered responses: low-risk drift (non-security configuration attributes) triggers automatic remediation with logging; high-risk drift (security group rules, IAM policies, encryption settings) triggers alerting and human review before remediation, with the drifted state preserved in an evidence snapshot. All remediation actions are recorded with timestamps and actor identity.
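The tiered response described above might be sketched as follows. Everything here is hypothetical scaffolding: the DriftEvent and AuditLog types, the handle_drift function, the "drift-reconciler" actor name, and the choice to send every severity not explicitly approved for automation to human review are assumptions for illustration, not a reference design.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable

# Assumption: only classes explicitly approved as low risk are auto-remediated;
# anything else fails safe to human review.
AUTO_REMEDIATE_SEVERITIES = {"low"}


@dataclass
class DriftEvent:
    resource: str
    attribute: str
    severity: str      # e.g. "low", "medium", "critical"
    observed: Any      # value found on the live resource
    desired: Any       # value declared in the desired state


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, action: str, event: DriftEvent, actor: str) -> None:
        """Record every response with a UTC timestamp and actor identity."""
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "resource": event.resource,
            "attribute": event.attribute,
            "severity": event.severity,
        })


def handle_drift(
    event: DriftEvent,
    remediate: Callable[[DriftEvent], None],
    review_queue: list[dict],
    log: AuditLog,
    actor: str = "drift-reconciler",
) -> str:
    """Route one drift event to automatic remediation or human review."""
    if event.severity in AUTO_REMEDIATE_SEVERITIES:
        remediate(event)
        log.record("auto_remediated", event, actor)
        return "auto_remediated"

    # Preserve the drifted state BEFORE anyone changes it, so the evidence
    # of how the resource looked survives the eventual remediation.
    review_queue.append({
        "resource": event.resource,
        "attribute": event.attribute,
        "observed": event.observed,
        "desired": event.desired,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    })
    log.record("queued_for_review", event, actor)
    return "queued_for_review"


# Usage with stand-in collaborators: lists take the place of a real
# remediation backend and review queue.
remediated: list[DriftEvent] = []
queue: list[dict] = []
audit = AuditLog()

handle_drift(DriftEvent("aws_s3_bucket.logs", "tags", "low",
                        {"owner": "ops-temp"}, {"owner": "ops"}),
             remediated.append, queue, audit)
handle_drift(DriftEvent("aws_security_group.web", "ingress", "critical",
                        [{"cidr_blocks": ["0.0.0.0/0"]}], []),
             remediated.append, queue, audit)
```

Passing the remediation callable and the queue in as arguments keeps the routing decision separate from how remediation is actually carried out, which also makes the decision logic easy to test in isolation.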
We implement multi-layer drift detection pipelines that combine Terraform plan automation, AWS Config/Azure Policy continuous evaluation, and InSpec compliance profiles to surface drift across infrastructure and OS configuration layers simultaneously. Drift events are classified by security sensitivity using a configurable taxonomy aligned to your compliance control requirements, and routed to tiered response workflows — automatic remediation for approved low-risk classes, human review queues with forensic state snapshots for security-sensitive drift. All detection and remediation events feed the continuous compliance evidence store with control ID tagging for immediate auditability.
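To make the control ID tagging concrete, here is a minimal sketch of what one evidence record could look like. The record fields, the drift class names, and in particular the mapping from drift class to control IDs are assumptions for illustration; the actual mapping depends on your own control requirements and auditor expectations.

```python
import json
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical mapping from drift class to the control IDs it evidences.
CONTROL_MAP = {
    "security_configuration": ["NIST CM-6", "SOC 2 CC7.1", "PCI DSS Req 2.2"],
    "least_functionality": ["NIST CM-7"],
}
DEFAULT_CONTROLS = ["NIST CM-6"]  # assumed fallback for unmapped classes


def evidence_record(
    event_type: str,
    resource: str,
    drift_class: str,
    detail: dict[str, Any],
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Build one evidence entry tagged with the controls it supports."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "timestamp": timestamp,
        "event_type": event_type,   # e.g. "detection" or "remediation"
        "resource": resource,
        "drift_class": drift_class,
        "control_ids": CONTROL_MAP.get(drift_class, DEFAULT_CONTROLS),
        "detail": detail,
    }


def to_json_line(record: dict[str, Any]) -> str:
    """Serialize a record as one JSON line, suitable for an append-only store."""
    return json.dumps(record, sort_keys=True)


record = evidence_record(
    "detection",
    "aws_security_group.web",
    "security_configuration",
    {"attribute": "ingress", "severity": "critical"},
)
print(to_json_line(record))
```

Tagging at write time means an auditor can filter the evidence store by control ID directly, rather than reconstructing the mapping after the fact.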
Compliance-Native Architecture Guide
Design principles and a structured checklist for building software that is compliant by default — not compliant by retrofit. Covers data architecture, access controls, audit trails, and vendor due diligence.