Kubernetes Security Hardening: CIS Benchmarks and NSA/CISA Guidance
A comprehensive guide to hardening Kubernetes clusters based on the CIS Kubernetes Benchmark and NSA/CISA guidance, covering RBAC, Pod Security Standards, etcd encryption at rest, Falco runtime monitoring, and default-deny network policies.
The NSA and CISA published their Kubernetes Hardening Guide citing container- and cluster-level misconfigurations as the primary source of Kubernetes incidents. Most attacks they documented didn't involve novel exploits; they abused default configurations, overpermissioned service accounts, and absent network segmentation. This guide covers the concrete hardening steps from the CIS Kubernetes Benchmark v1.8 and the NSA/CISA guidance.
Control Plane Hardening
Secure API Server Configuration
The API server is the entry point to all Kubernetes operations. Key hardening flags:
# /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    - kube-apiserver
    # Disable anonymous authentication
    - --anonymous-auth=false
    # Enable RBAC and Node authorization
    - --authorization-mode=Node,RBAC
    # Require client certificates for kubelet connections
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    # Enable audit logging
    - --audit-log-path=/var/log/kubernetes/audit.log
    - --audit-log-maxage=30
    - --audit-log-maxbackup=10
    - --audit-log-maxsize=100
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    # Disable profiling endpoints
    - --profiling=false
    # Enable recommended admission controllers. Do NOT disable the
    # ServiceAccount plugin: it is required for token management,
    # and disabling it is itself a CIS Benchmark finding.
    - --enable-admission-plugins=NodeRestriction,PodSecurity,ResourceQuota,LimitRanger
    # Bind to a specific interface rather than 0.0.0.0
    - --bind-address=10.0.0.1
    # TLS settings
    - --tls-min-version=VersionTLS12
    - --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
Audit Policy Configuration
A comprehensive audit policy captures security-relevant events without logging every list/watch request:
# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log RBAC changes with full request/response bodies
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
    resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
# Log Secret and ConfigMap access at Metadata level only:
# RequestResponse here would write secret payloads into the audit log
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]
# Log pod exec, port-forward, and attach
- level: RequestResponse
  verbs: ["create"]
  resources:
  - group: ""
    resources: ["pods/exec", "pods/portforward", "pods/attach"]
# Log all create/update/delete at Metadata level
- level: Metadata
  verbs: ["create", "update", "patch", "delete"]
# Exclude noisy read requests
- level: None
  verbs: ["get", "list", "watch"]
  resources:
  - group: ""
    resources: ["nodes", "pods", "services", "endpoints"]
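With this policy in place, pod-exec events can be pulled out of the audit log with jq. A minimal sketch, where the echoed event stands in for a real line from /var/log/kubernetes/audit.log (field values are illustrative):

```shell
# Extract who exec'd into which pod from audit log events
echo '{"verb":"create","user":{"username":"alice"},"objectRef":{"resource":"pods","subresource":"exec","name":"web-1"}}' |
  jq -r 'select(.objectRef.subresource? == "exec") | "\(.user.username) exec into \(.objectRef.name)"'
```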
Encrypt etcd at Rest
etcd stores all cluster state including Secrets. Without encryption, anyone with access to the etcd data directory can read all secrets:
# /etc/kubernetes/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>
  - identity: {} # Fallback for reading unencrypted data (remove after migration)
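The 32-byte key itself can be generated with standard tooling; a sketch, since any cryptographically random 32 bytes, base64-encoded, will do:

```shell
# Generate a random 32-byte key and base64-encode it for the encryption config
head -c 32 /dev/urandom | base64
```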
Apply to kube-apiserver:
- --encryption-provider-config=/etc/kubernetes/encryption-config.yaml
Then re-encrypt all existing secrets:
kubectl get secrets --all-namespaces -o json | kubectl replace -f -
For key rotation, add the new key as the first entry in the keys list, restart the API server, re-encrypt all secrets, then remove the old key.
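During rotation the providers list carries both keys, with the new key listed first so it is used for writes while the old key can still decrypt existing data. A sketch of the mid-rotation state (key names are illustrative):

```yaml
# Mid-rotation: key2 encrypts new writes, key1 still decrypts old data
providers:
- aescbc:
    keys:
    - name: key2  # new key, used for encryption
      secret: <new-base64-encoded-32-byte-key>
    - name: key1  # old key, kept until all secrets are re-encrypted
      secret: <old-base64-encoded-32-byte-key>
```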
RBAC: Least Privilege Service Accounts
Disable Automounting of Service Account Tokens
By default, Kubernetes mounts a service account token in every pod. This is unnecessary for most workloads and provides attackers with API credentials if they compromise a container:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: production
automountServiceAccountToken: false
Or disable at the pod level:
apiVersion: v1
kind: Pod
spec:
  automountServiceAccountToken: false
  serviceAccountName: my-app
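Pods that have not opted out can be found by filtering pod JSON for the absence of the flag. A sketch, where the echoed sample stands in for `kubectl get pods -A -o json` output (note this checks only the pod-level setting, not the service account's):

```shell
# List pods that have not set automountServiceAccountToken: false
echo '{"items":[{"metadata":{"name":"legacy-app"},"spec":{}},{"metadata":{"name":"my-app"},"spec":{"automountServiceAccountToken":false}}]}' |
  jq -r '.items[] | select(.spec.automountServiceAccountToken != false) | .metadata.name'
```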
Create Minimal-Permission Service Accounts
Each workload should have its own service account with only the permissions it needs:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader
  namespace: production
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["app-config"] # Scope to specific ConfigMap names
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: config-reader-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: production
roleRef:
  kind: Role
  name: config-reader
  apiGroup: rbac.authorization.k8s.io
Audit Overpermissioned ClusterRoles
Find service accounts with cluster-admin or wildcard permissions:
# List all cluster-admin bindings
kubectl get clusterrolebindings -o json | \
jq '.items[] | select(.roleRef.name=="cluster-admin") | {name:.metadata.name, subjects:.subjects}'
# Find roles with wildcard verbs or resources
kubectl get clusterroles -o json | \
jq '.items[] | select(.rules[]?.verbs[]? == "*") | .metadata.name'
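The wildcard query above behaves like this against sample data (hypothetical role names, inlined in place of live `kubectl get clusterroles -o json` output):

```shell
# A role with verb "*" is flagged; a narrowly scoped role is not
echo '{"items":[{"metadata":{"name":"too-broad"},"rules":[{"verbs":["*"],"resources":["*"]}]},{"metadata":{"name":"config-reader"},"rules":[{"verbs":["get"],"resources":["configmaps"]}]}]}' |
  jq -r '.items[] | select(.rules[]?.verbs[]? == "*") | .metadata.name'
```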
Pod Security Standards
Pod Security Standards (PSS) replaced PodSecurityPolicy, which was removed in Kubernetes 1.25. The built-in Pod Security Admission controller enforces them at the namespace level via labels, with three profiles:
- Privileged: no restrictions (for trusted system namespaces)
- Baseline: blocks known privilege-escalation vectors
- Restricted: heavily locked down; requires dropping all capabilities and running as non-root
# Apply the restricted profile to the production namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.28
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
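Admission then rejects non-compliant pods in that namespace. For example, a pod like the following (hypothetical) would be refused, because privileged containers violate both the baseline and restricted profiles:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bad-pod
  namespace: production
spec:
  containers:
  - name: app
    image: my-app:1.0
    securityContext:
      privileged: true  # rejected under enforce: restricted
```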
Workloads in this namespace must comply with the restricted profile:
apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: my-app:1.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    resources:
      limits:
        cpu: "500m"
        memory: "128Mi"
      requests:
        cpu: "100m"
        memory: "64Mi"
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}
Network Policies: Default-Deny Architecture
Without network policies, all pods can communicate with all other pods across all namespaces. Implement a default-deny policy in every namespace:
# Default deny all ingress and egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
Then explicitly allow required traffic:
# Allow frontend to reach backend on port 8080
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
---
# Allow DNS resolution (required for all pods)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - ports:
    - port: 53
      protocol: UDP
    - port: 53
      protocol: TCP
Note: Network policies require a CNI plugin that supports them (Calico, Cilium, Weave Net). The built-in kubenet plugin does not enforce network policies.
Runtime Security with Falco
Falco monitors system calls in real-time and alerts on suspicious behavior. It can detect container breakouts, privilege escalation, cryptomining, and data exfiltration:
# Install Falco via Helm
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco \
--namespace falco \
--create-namespace \
--set falco.grpc.enabled=true \
--set falco.grpcOutput.enabled=true \
--set driver.kind=ebpf
Key Falco rules to enable:
# rules/custom.yaml
# Lists referenced by the rules below; adjust to your workloads
- list: allowed_processes
  items: [curl, wget, java, python3]
- list: allowed_ips
  items: ["10.96.0.10"]

# Alert on shell execution inside containers
- rule: Shell Spawned in Container
  desc: Shell was spawned inside a container
  condition: >
    spawned_process and container and
    shell_procs and proc.pname != "sudo"
  output: >
    Shell spawned in container (user=%user.name container=%container.name
    image=%container.image.repository:%container.image.tag
    shell=%proc.name parent=%proc.pname)
  priority: WARNING

# Alert on sensitive file reads
- rule: Read Sensitive File
  desc: An attempt to read sensitive files
  condition: >
    open_read and container and
    (fd.name startswith /etc/shadow or
     fd.name startswith /etc/ssh or
     fd.name = /proc/1/environ)
  output: >
    Sensitive file opened for reading (file=%fd.name user=%user.name
    container=%container.name image=%container.image.repository)
  priority: ERROR

# Alert on network connections to unexpected destinations
- rule: Unexpected Network Outbound
  desc: Container making unexpected outbound connection
  condition: >
    outbound and container and
    not proc.name in (allowed_processes) and
    not fd.sip in (allowed_ips)
  output: >
    Unexpected outbound connection (container=%container.name
    connection=%fd.name ip=%fd.sip port=%fd.sport)
  priority: WARNING
Route Falco alerts to your SIEM via Falcosidekick, which supports Slack, PagerDuty, Elasticsearch, Datadog, and dozens of other outputs.
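Falcosidekick can be enabled through the same Helm chart. A values sketch assuming the chart's `falcosidekick.config.slack` keys (verify against the chart's values.yaml; the webhook URL is a placeholder):

```yaml
# Hypothetical Helm values enabling Falcosidekick with a Slack output
falcosidekick:
  enabled: true
  config:
    slack:
      webhookurl: "https://hooks.slack.com/services/<your-webhook>"
      minimumpriority: "warning"
```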
Image Security
Scan Images in CI/CD
Integrate vulnerability scanning into your pipeline so images with critical CVEs never reach production:
# GitHub Actions example
- name: Scan image with Trivy
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'my-registry/my-app:${{ github.sha }}'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'
    exit-code: '1' # Fail the build on critical/high CVEs
Use Distroless or Minimal Base Images
Standard images like ubuntu:latest or python:3.11 contain hundreds of packages that expand the attack surface. Distroless images contain only the application and its runtime dependencies:
# Multi-stage build with a distroless final image
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Install dependencies to a fixed prefix the final image can reference
RUN pip install --prefix=/install -r requirements.txt
COPY . .

FROM gcr.io/distroless/python3-debian12
ENV PYTHONPATH=/install/lib/python3.11/site-packages
COPY --from=builder /install /install
COPY --from=builder /app /app
USER nonroot:nonroot
ENTRYPOINT ["python3", "/app/main.py"]
Enforce Image Signing with Sigstore
Use cosign to sign images in CI/CD and verify signatures at admission time with Sigstore's policy-controller or a policy engine like Kyverno (Kubernetes has no built-in signature verification):
# Sign the image by digest after build
cosign sign --key cosign.key my-registry/my-app@sha256:<digest>
# Verify in a Kyverno policy
kubectl apply -f - <<EOF
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signature
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-image-signature
    match:
      any:
      - resources:
          kinds:
          - Pod
    verifyImages:
    - imageReferences: ["my-registry/*"]
      attestors:
      - entries:
        - keys:
            publicKeys: |
              -----BEGIN PUBLIC KEY-----
              ...
              -----END PUBLIC KEY-----
EOF
CIS Benchmark Automated Scanning
Run kube-bench to audit your cluster configuration against CIS Kubernetes Benchmark:
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
kubectl logs -f job/kube-bench
kube-bench checks over 100 controls covering API server flags, etcd configuration, kubelet settings, and RBAC configuration, mapping each finding to CIS control IDs.
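Failures can be tallied from the job output. A sketch, with a sample summary inlined in place of the real `kubectl logs` output (the exact summary format may vary between kube-bench versions):

```shell
# Count failing checks from a kube-bench summary
summary='== Summary total ==
65 checks PASS
12 checks FAIL
31 checks WARN
0 checks INFO'
echo "$summary" | awk '/checks FAIL/ {print $1}'
```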
The most impactful controls to prioritize: etcd encryption at rest (largest blast radius if compromised), RBAC with minimal service account permissions (the most commonly misconfigured area), default-deny network policies (stops lateral movement), and Falco runtime monitoring (detects active exploitation). These four address the majority of the attack patterns documented in the NSA/CISA guide.