
Docker Swarm Container Orchestration Guide

Comprehensive guide to Docker Swarm: initialize clusters, deploy and manage services, configure overlay networks, scale applications, implement security best practices, and monitor clusters.


Comprehensive Security Guide for Docker Swarm

1. Purpose and Overview

Docker Swarm (swarm mode) is the clustering and orchestration system built into the Docker Engine. Kubernetes has become more popular in recent years, but Swarm remains relevant for organizations that want a lighter-weight orchestrator integrated directly with the Docker Engine.

This guide focuses specifically on security considerations, hardening techniques, and best practices for deploying and managing Docker Swarm in production environments. We’ll cover everything from secure initialization to ongoing operational security, with special attention to defense-in-depth approaches.

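2. Table of Contents

  • 1. Purpose and Overview
  • 3. Docker Swarm Security Architecture
  • 4. Secure Swarm Initialization
  • 5. Network Security
  • 6. Node Hardening
  • 7. Secret Management
  • 8. Access Controls
  • 9. Secure Service Deployment
  • 10. Logging and Monitoring
  • 11. Backup and Recovery
  • 12. Secure Updates and Patching
  • 13. Security Testing
  • 14. References and Further Reading
  • 15. Appendices
  • 16. Security Checklist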

3. Docker Swarm Security Architecture

3.1 Swarm Mode Security Features

Docker Swarm mode incorporates several built-in security features that provide a solid foundation for building secure container orchestration:

  • Automated TLS: Swarm mode automatically creates a self-signed root CA and generates and distributes certificates to all nodes.
  • Certificate Rotation: Node certificates are rotated automatically (every 90 days by default, configurable with docker swarm update --cert-expiry).
  • Encrypted Cluster Store: The Raft logs on manager nodes are encrypted at rest by default.
  • Separate Join Tokens: Distinct worker and manager tokens help maintain separation of privileges, and each token embeds a digest of the root CA so a joining node can verify the manager it contacts.
  • Mutual TLS Authentication: All control plane communication is protected with mutual TLS, ensuring both client and server authenticate each other.

These features provide defense-in-depth but must be complemented with proper operational security practices.

3.2 Security Considerations by Component

| Component | Security Considerations |
|---|---|
| Manager Nodes | Most sensitive components; compromise means full control of cluster |
| Worker Nodes | Reduced privilege; still provide execution environment |
| Control Plane | TLS securing all communications |
| Data Plane | Opt-in encryption for overlay networks |
| Docker Engine | Root-level access on hosts; container isolation |
| API and CLI | Authentication and authorization concerns |

4. Secure Swarm Initialization

4.1 Pre-Initialization Security Checklist

Before initializing your Swarm, ensure:

  • ✅ Host systems are fully patched
  • ✅ Docker Engine is updated to the latest stable version
  • ✅ Default user accounts are secured (not using default passwords)
  • ✅ Firewall rules allow only the required Swarm ports, and only from cluster nodes
  • ✅ Docker daemon configurations are secured (see section 6.1)
  • ✅ SELinux/AppArmor is properly configured
  • ✅ Disk encryption is implemented for sensitive data
  • ✅ Network segmentation is properly configured
  • ✅ NTP is configured for time synchronization

4.2 Securing Manager Nodes

Manager nodes are the most critical components in your Swarm architecture. Compromise of manager nodes can lead to full cluster compromise.

# Initialize the swarm with explicit advertise address to control network exposure
docker swarm init --advertise-addr <MANAGER-IP> --autolock

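The --autolock flag is crucial for security. Raft logs are encrypted at rest by default, but without autolock the keys that decrypt them (and the node's TLS key) are loaded automatically from disk whenever Docker restarts. With --autolock, those keys are encrypted with an unlock key, and a restarted manager stays locked until you run docker swarm unlock and supply it, which protects against data extraction from a copied disk.

The init command prints the unlock key. Store it securely outside the Swarm (for example in a password manager or HSM):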

To unlock the swarm use the following key:

SWMKEY-1-5ZwhXs9trhfBzwL0zYJDX1Oon3jz1U2AvdASNzQ+vME

Additional manager hardening steps:

  • Implement a separate management network for control plane traffic
  • Restrict physical and SSH access to manager nodes
  • Use dedicated nodes for management (not running other workloads)
  • Deploy an odd number of managers (3, 5, 7) distributed across availability zones
  • Use CPU/memory resource limits to prevent DoS conditions
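The odd-number advice follows from Raft quorum: N managers need a majority of floor(N/2) + 1 to make changes, so they tolerate floor((N-1)/2) manager failures, and an even count adds a manager without adding fault tolerance. A minimal shell sketch that prints both numbers; swarm_quorum is an illustrative helper, not a Docker command:

```shell
#!/bin/sh
# Print the Raft quorum size and the number of tolerated manager failures
# for a given manager count. swarm_quorum is illustrative, not a Docker CLI command.
swarm_quorum() {
  n=$1
  quorum=$(( n / 2 + 1 ))       # majority needed to elect a leader and commit changes
  tolerated=$(( (n - 1) / 2 ))  # managers that can fail while quorum is kept
  echo "managers=$n quorum=$quorum tolerated_failures=$tolerated"
}

for n in 1 3 4 5 7; do
  swarm_quorum "$n"
done
```

The output shows that 3 and 4 managers both tolerate a single failure, while 5 managers tolerate two.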

4.3 Joining Worker Nodes Securely

Worker nodes should be joined using the worker token, never the manager token:

# Get the worker join token from a manager node
docker swarm join-token worker

# Join as a worker with the token
docker swarm join --token SWMTKN-1-49nj1cmql0... <MANAGER-IP>:2377

Security considerations:

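  • Always pass specific addresses to --advertise-addr (and --listen-addr), not 0.0.0.0
  • Rotate join tokens regularly with docker swarm join-token --rotate worker (or manager)
  • Remove join tokens from shell history and from any logs where they were recorded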
  • Implement Just-In-Time (JIT) node provisioning using automation
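Join tokens (SWMTKN-1-...) and unlock keys (SWMKEY-1-...) leak easily into CI logs, tickets, and shell history. A sketch of two helpers: one masks both patterns in text read from stdin, the other rotates both join tokens without printing them. The helper names are hypothetical, and the character classes are assumptions based on the token formats shown in this guide:

```shell
#!/bin/sh
# Mask Swarm join tokens and unlock keys in text read from stdin.
# Assumes tokens look like SWMTKN-1-<letters/digits/dashes> and unlock
# keys like SWMKEY-1-<base64>; redact_swarm_secrets is a hypothetical helper.
redact_swarm_secrets() {
  sed -E \
    -e 's/SWMTKN-1-[0-9A-Za-z-]+/SWMTKN-1-REDACTED/g' \
    -e 's#SWMKEY-1-[0-9A-Za-z+/=]+#SWMKEY-1-REDACTED#g'
}

# Rotate both join tokens; -q plus the redirect keeps the new tokens
# out of the terminal and any captured logs.
rotate_join_tokens() {
  docker swarm join-token --rotate -q worker > /dev/null &&
    docker swarm join-token --rotate -q manager > /dev/null
}

# Example: scrub a provisioning log before archiving it
# redact_swarm_secrets < provisioning.log > provisioning.redacted.log
```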

4.4 Verifying Swarm Integrity

After initialization and after any major changes, verify the integrity of your Swarm:

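# List and verify all nodes and their roles
docker node ls

# Check the status of the swarm
docker info | grep -A 10 Swarm

# Check each node's TLS certificate status against the current root CA
docker node ls --format '{{.Hostname}}: {{.TLSStatus}}'

# Check daemon security options (seccomp, AppArmor/SELinux, user namespaces)
docker info | grep -A 5 "Security Options"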
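These checks can be turned into a repeatable script. A minimal sketch, assuming it runs on a manager and that docker node ls reports both Status and TLSStatus as "Ready" for healthy nodes (TLSStatus shows "Needs Rotation" for certificates issued by an older root CA); check_swarm_nodes is a hypothetical helper name:

```shell
#!/bin/sh
# Warn about nodes that are not Ready or whose TLS certificate was not issued
# by the current root CA. Returns 1 if any node needs attention.
# check_swarm_nodes is a hypothetical helper; run it on a manager.
check_swarm_nodes() {
  # "|" separates fields because TLSStatus can contain a space ("Needs Rotation")
  docker node ls --format '{{.Hostname}}|{{.Status}}|{{.TLSStatus}}' |
    awk -F'|' '
      $2 != "Ready" || $3 != "Ready" {
        printf "WARNING: %s status=%s tls=%s\n", $1, $2, $3
        bad = 1
      }
      END { exit bad + 0 }'
}

# Example: check_swarm_nodes || echo "inspect the nodes listed above"
```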

5. Network Security

5.1 Control Plane Security

Docker Swarm uses the following ports between nodes:

  • TCP port 2377 for cluster management (control plane)
  • TCP and UDP port 7946 for node-to-node communication (control plane)
  • UDP port 4789 for overlay network traffic (data plane, VXLAN)
  • IP protocol 50 (ESP) when overlay network encryption is enabled

Secure these with strict firewall rules:

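# Example UFW rules: allow Swarm ports only from the cluster subnet
# (10.10.0.0/24 is an example subnet)
sudo ufw allow from 10.10.0.0/24 to any port 2377 proto tcp
sudo ufw allow from 10.10.0.0/24 to any port 7946 proto tcp
sudo ufw allow from 10.10.0.0/24 to any port 7946 proto udp
sudo ufw allow from 10.10.0.0/24 to any port 4789 proto udp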

iptables example:

# Allow Swarm management traffic only from trusted IPs
iptables -A INPUT -p tcp -s 10.10.0.0/24 --dport 2377 -j ACCEPT

# Default deny for swarm management port
iptables -A INPUT -p tcp --dport 2377 -j DROP
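The same pattern extends to every Swarm port. A sketch that only prints the rules so they can be reviewed before applying them; swarm_firewall_rules is a hypothetical helper, 10.10.0.0/24 is the example subnet used above, and the ESP rule is only needed when encrypted overlay networks are in use:

```shell
#!/bin/sh
# Print iptables rules that accept Swarm traffic from a trusted subnet and
# drop it from everywhere else. Nothing is applied; review, then pipe to sh.
swarm_firewall_rules() {
  subnet=$1
  for spec in tcp:2377 tcp:7946 udp:7946 udp:4789; do
    proto=${spec%%:*}   # text before the colon
    port=${spec#*:}     # text after the colon
    echo "iptables -A INPUT -p $proto -s $subnet --dport $port -j ACCEPT"
    echo "iptables -A INPUT -p $proto --dport $port -j DROP"
  done
  # Encrypted overlay networks carry VXLAN traffic inside IPsec ESP (protocol 50)
  echo "iptables -A INPUT -p esp -s $subnet -j ACCEPT"
}

swarm_firewall_rules 10.10.0.0/24
```

After reviewing the output, swarm_firewall_rules 10.10.0.0/24 | sudo sh applies it. Each ACCEPT rule is emitted before its DROP rule because iptables evaluates appended rules in order.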

5.2 Data Plane Security

For container-to-container communication, implement these security controls:

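  • Network segmentation using overlay networks
  • Traffic control through network membership: Swarm has no network-policy objects like Kubernetes, so services can only reach each other over overlay networks they share (plus any published ports)
  • Service isolation with dedicated overlay networks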

Example of creating an isolated overlay network:

# Create an isolated overlay network (without --attachable, so standalone
# containers started with docker run cannot join it)
docker network create --driver overlay --opt encrypted=true isolated_network

# Deploy service on isolated network
docker service create --name secure-app --network isolated_network my-image

5.3 Overlay Network Encryption

Encrypt overlay networks to protect data in transit between containers:

# Create encrypted overlay network
docker network create --driver overlay --opt encrypted=true secure_overlay

# Verify encryption is enabled
docker network inspect secure_overlay | grep -A 3 Options

Note that encryption adds overhead, so benchmark performance impact before deploying broadly.
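To confirm that no overlay network was created without encryption, the check can be run across the cluster. A sketch, assuming the overlay driver treats the presence of the encrypted option as enabling encryption (which is why the template tests for the key rather than its value); find_unencrypted_overlays is a hypothetical helper, and the default ingress network appears in its output unless it was recreated with encryption:

```shell
#!/bin/sh
# Print the names of overlay networks that do not have encryption enabled.
# find_unencrypted_overlays is a hypothetical helper; run it on a manager.
find_unencrypted_overlays() {
  docker network ls --filter driver=overlay --format '{{.Name}}' |
    while IFS= read -r net; do
      # Prints "yes" only if the "encrypted" option key is present
      has_key=$(docker network inspect "$net" --format \
        '{{range $k, $v := .Options}}{{if eq $k "encrypted"}}yes{{end}}{{end}}')
      [ "$has_key" = "yes" ] || echo "$net"
    done
}

# Example: find_unencrypted_overlays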

5.4 Ingress Network Configuration

The ingress network handles external traffic to published service ports (the routing mesh). To encrypt that traffic, recreate it as an encrypted overlay:

# Remove the default ingress network (remove services that publish ports
# first; the command asks for confirmation)
docker network rm ingress

# Create a custom ingress network with encryption
docker network create \
  --driver overlay \
  --ingress \
  --opt encrypted=true \
  new-ingress

6. Node Hardening

6.1 Docker Daemon Security

Secure the Docker daemon with a proper configuration file (/etc/docker/daemon.json):

{
  "icc": false,
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "userns-remap": "default",
  "userland-proxy": false,
  "no-new-privileges": true,
  "seccomp-profile": "/etc/docker/seccomp-profile.json",
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  },
  "selinux-enabled": true,
  "experimental": false
}

Key security settings:

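  • icc: false - Disables inter-container communication on the default bridge network (it does not affect overlay networks)
  • userns-remap - Enables user namespace isolation
  • no-new-privileges - Prevents privilege escalation (for example through setuid binaries) by default
  • seccomp-profile - Applies syscall filtering (the profile file must exist on every node)
  • selinux-enabled - Enables SELinux labeling; set it only on SELinux-based distributions and rely on AppArmor elsewhere
  • live-restore - Must not be enabled on Swarm nodes; Docker refuses to run swarm mode when it is set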
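Because a single wrong key can stop dockerd from starting or block swarm mode, it helps to check daemon.json before restarting the daemon. A rough sketch: the grep patterns are plain text matches rather than a JSON parser, check_swarm_daemon_json is a hypothetical helper, and the dockerd --validate step only runs where dockerd is installed (the flag exists in Docker 23.0 and later):

```shell
#!/bin/sh
# Pre-flight check for a daemon.json intended for a Swarm node.
# Returns 1 on a blocking problem; missing hardening keys only print notes.
check_swarm_daemon_json() {
  file=${1:-/etc/docker/daemon.json}
  status=0
  # Text match, not a JSON parser: flags "live-restore": true
  if grep -Eq '"live-restore"[[:space:]]*:[[:space:]]*true' "$file"; then
    echo "ERROR: live-restore is incompatible with swarm mode" >&2
    status=1
  fi
  for key in userns-remap no-new-privileges seccomp-profile; do
    grep -q "\"$key\"" "$file" || echo "NOTE: $key is not set in $file" >&2
  done
  # Let dockerd validate the whole file without starting (Docker 23.0+)
  if command -v dockerd > /dev/null 2>&1; then
    dockerd --validate --config-file "$file" || status=1
  fi
  return "$status"
}

# Example: check_swarm_daemon_json /etc/docker/daemon.json && sudo systemctl restart docker
```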

6.2 Host OS Security

Apply these host security hardening measures:

  • Minimize installed packages (use minimal OS images)
  • Implement regular patching schedule
  • Enable and configure host-based firewall
  • Use SELinux/AppArmor in enforcing mode
  • Implement file integrity monitoring
  • Configure strong SSH authentication (keys only, no passwords)

Example of configuring auditd for Docker:

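# Add Docker daemon audit rules (-k tags events so "ausearch -k docker" finds them).
# Current installs ship /usr/bin/containerd and /usr/bin/runc rather than the
# older docker-containerd and docker-runc binaries.
cat << EOF > /etc/audit/rules.d/docker.rules
-w /usr/bin/dockerd -p wa -k docker
-w /usr/bin/docker -p wa -k docker
-w /var/lib/docker -p wa -k docker
-w /etc/docker -p wa -k docker
-w /lib/systemd/system/docker.service -p wa -k docker
-w /lib/systemd/system/docker.socket -p wa -k docker
-w /etc/default/docker -p wa -k docker
-w /etc/docker/daemon.json -p wa -k docker
-w /usr/bin/containerd -p wa -k docker
-w /usr/bin/runc -p wa -k docker
EOF

# Load the new rules (augenrules merges the files in /etc/audit/rules.d)
augenrules --load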

6.3 Container Isolation

Enhance container isolation beyond the defaults:

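  • Use read-only filesystems where possible
  • Drop all Linux capabilities and add back only those the workload needs
  • Implement security profiles (seccomp, AppArmor)

Example service with enhanced security. Unlike docker run, docker service create does not accept --security-opt, so seccomp, AppArmor, and no-new-privileges settings for services come from the daemon defaults configured in section 6.1:

# nginx:alpine needs writable cache/run directories under --read-only, and
# CHOWN/SETUID/SETGID so its root master process can start unprivileged workers
docker service create \
  --name secure-service \
  --read-only \
  --mount type=tmpfs,destination=/tmp \
  --mount type=tmpfs,destination=/var/cache/nginx \
  --mount type=tmpfs,destination=/var/run \
  --cap-drop ALL \
  --cap-add CHOWN \
  --cap-add SETUID \
  --cap-add SETGID \
  --cap-add NET_BIND_SERVICE \
  nginx:alpine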

7. Secret Management

7.1 Using Docker Secrets

Docker Swarm provides a native secrets management system:

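# Create a secret without a trailing newline (prefer a generated value or a
# file over typing the password on the command line)
printf '%s' "secure_password" | docker secret create db_password -

# Use the secret in a service (the postgres image reads POSTGRES_PASSWORD_FILE)
docker service create \
  --name db \
  --secret db_password \
  --env POSTGRES_PASSWORD_FILE=/run/secrets/db_password \
  postgres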

Secrets best practices:

  • Never expose secrets in service definitions or environment variables
  • Limit secret access to specific services
  • Implement secret rotation (see 7.3)
  • Prefer images that can read secrets from files (for example through *_FILE variables) over images that only accept plain environment variables
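Images that only accept plain environment variables can still consume secrets through a small entrypoint wrapper that follows the *_FILE convention. A minimal POSIX sh sketch: file_env mirrors the helper of the same name in several official image entrypoints but is written here from scratch, and DB_PASSWORD is an example variable. The value never appears in the service definition, although it does end up in the process environment, so reading the file directly is still preferable when you control the application:

```shell
#!/bin/sh
# If VAR_FILE is set, load VAR from that file (for example a Docker secret
# under /run/secrets), export VAR, and unset VAR_FILE.
# Fails if both VAR and VAR_FILE are set. file_env is an illustrative helper.
file_env() {
  var=$1
  file_var="${var}_FILE"
  eval "val=\${$var:-}"
  eval "file_val=\${$file_var:-}"
  if [ -n "$val" ] && [ -n "$file_val" ]; then
    echo "error: both $var and $file_var are set" >&2
    return 1
  fi
  if [ -n "$file_val" ]; then
    val=$(cat "$file_val") || return 1   # $(...) also strips trailing newlines
  fi
  export "$var=$val"
  unset "$file_var"
}

# In an entrypoint script:
# file_env DB_PASSWORD
# exec "$@"
```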

7.2 External Secret Management Integration

For more robust secret management, integrate with external systems:

HashiCorp Vault Integration Example:

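  1. Deploy the Vault agent as a Swarm service (vault-agent-image and control-plane-network are placeholders):

# The agent does not need the Docker socket; mounting /var/run/docker.sock
# would give it root-equivalent control of the host.
docker service create \
  --name vault-agent \
  --network control-plane-network \
  vault-agent-image
  2. Create a template stanza in the agent configuration for retrieving secrets:

template {
  source      = "/etc/vault-agent/templates/db-creds.tpl"
  destination = "/run/secrets/db-credentials"
}

The destination path is inside the agent's own container, so the rendered file still has to be shared with the consuming service, for example through a shared volume.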

7.3 Secrets Rotation

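Implement a rotation strategy that creates a new secret object and mounts it at the same target path, so the application keeps reading /run/secrets/db_password. Rotating the file does not change a credential that is stored elsewhere: the postgres image, for example, only reads POSTGRES_PASSWORD_FILE when it initializes an empty data directory, so change the password in the database as part of the same procedure.

# Create new secret version (printf avoids a trailing newline)
printf '%s' "new_secure_password" | docker secret create db_password_v2 -

# Update service to use the new secret at the same path inside the container
docker service update \
  --secret-rm db_password \
  --secret-add source=db_password_v2,target=db_password \
  db

# Remove old secret after confirming service is working
docker secret rm db_password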

Automation script example for secret rotation:

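#!/bin/bash
# Secret rotation script
# Usage: rotate-secret.sh <secret-target-name> <service-name>
# Each rotation creates a timestamped secret object but mounts it at the same
# target, so the service keeps reading /run/secrets/<secret-target-name>.
set -euo pipefail

SECRET_NAME="${1:?usage: $0 <secret-target-name> <service-name>}"
SERVICE_NAME="${2:?usage: $0 <secret-target-name> <service-name>}"
NEW_SECRET="${SECRET_NAME}_$(date +%Y%m%d%H%M%S)"

# Find the secret object currently mounted at the target path
OLD_SECRET=$(docker service inspect "$SERVICE_NAME" --format \
  "{{range .Spec.TaskTemplate.ContainerSpec.Secrets}}{{if eq .File.Name \"$SECRET_NAME\"}}{{.SecretName}}{{end}}{{end}}")
if [ -z "$OLD_SECRET" ]; then
  echo "No secret is mounted at $SECRET_NAME in $SERVICE_NAME" >&2
  exit 1
fi

# Generate a new password and store it without a trailing newline
openssl rand -base64 32 | tr -d '\n' | docker secret create "$NEW_SECRET" - > /dev/null

# Swap secrets; --detach=false waits for the rolling update to finish
HEALTHY=false
if docker service update --detach=false \
     --secret-rm "$OLD_SECRET" \
     --secret-add source="$NEW_SECRET",target="$SECRET_NAME" \
     "$SERVICE_NAME"; then
  # Verify service health: running replicas must equal desired (e.g. "3/3")
  sleep 30
  REPLICAS=$(docker service ls --filter "name=$SERVICE_NAME" --format '{{.Name}} {{.Replicas}}' |
    awk -v svc="$SERVICE_NAME" '$1 == svc { print $2 }')
  if [ -n "$REPLICAS" ] && [ "${REPLICAS%%/*}" = "${REPLICAS#*/}" ]; then
    HEALTHY=true
  fi
fi

if [ "$HEALTHY" = true ]; then
  # Remove the old secret only after the service is healthy on the new one
  docker secret rm "$OLD_SECRET"
  echo "Rotated $SECRET_NAME on $SERVICE_NAME: $OLD_SECRET -> $NEW_SECRET"
else
  echo "Service update failed, rolling back to $OLD_SECRET" >&2
  docker service update --detach=false \
    --secret-rm "$NEW_SECRET" \
    --secret-add source="$OLD_SECRET",target="$SECRET_NAME" \
    "$SERVICE_NAME"
  docker secret rm "$NEW_SECRET"
  exit 1
fi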

8. Access Controls

8.1 Role-Based Access Control

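Standard Docker Swarm has no built-in RBAC (that was a feature of the commercial Docker Enterprise platform), and any user who can reach the Docker socket, including members of the docker group, effectively has root on the host. Implement controls with: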

  • Separate management accounts from service accounts
  • Use team-based access via Unix groups
  • Implement sudo with limited commands for operators

Example sudoers configuration:

# Allow swarm operators to run specific read-only docker commands.
# sudoers arguments must match exactly; a trailing * allows any further
# arguments (including extra flags), so use it only for read-only commands.
%swarm-operators ALL=(root) NOPASSWD: /usr/bin/docker node ls
%swarm-operators ALL=(root) NOPASSWD: /usr/bin/docker service ls
%swarm-operators ALL=(root) NOPASSWD: /usr/bin/docker service logs *
%swarm-operators ALL=(root) NOPASSWD: /usr/bin/docker service inspect *
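Wildcard arguments in sudoers are hard to get right, because "*" also matches any extra flags. An alternative is to grant sudo only on a small wrapper that allowlists read-only subcommands. A sketch of such a wrapper; swarm-ro and its allowlist are hypothetical, and the matching sudoers entry would be %swarm-operators ALL=(root) NOPASSWD: /usr/local/bin/swarm-ro:

```shell
#!/bin/sh
# swarm-ro: hypothetical wrapper that runs only allowlisted, read-only
# docker subcommands. Install as /usr/local/bin/swarm-ro (root-owned, 0755)
# and end the file with: swarm_ro "$@"
swarm_ro() {
  case "$1 $2" in
    "node ls"|"node inspect"|"node ps"|"service ls"|"service ps"|"service inspect"|"service logs")
      docker "$@"   # the first two words are allowlisted; pass the rest through
      ;;
    *)
      echo "swarm-ro: 'docker $1 $2' is not allowed" >&2
      return 126
      ;;
  esac
}

# Example (as an operator): sudo swarm-ro service logs web
```

Because the subcommand is matched as two separate words, an argument such as "node ls" passed as a single quoted string, or a write command like service rm, is rejected.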

8.2 Label-Based Controls

Use node labels to control workload placement:

# Add security-level label to node
docker node update --label-add security=high node-1

# Deploy service only to high-security nodes
docker service create \
  --name secure-backend \
  --constraint node.labels.security==high \
  backend-image

8.3 API Access Controls

Secure the Docker API:

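# Configure TLS for Docker API
mkdir -p /etc/docker/ssl
# Quick self-signed server certificate for testing only. With "tlsverify"
# (below) the daemon also needs ca.pem and clients need CA-signed
# certificates, so use a proper CA process in production.
openssl req -x509 -nodes -days 365 -newkey rsa:4096 \
  -subj "/CN=$(hostname -f)" \
  -keyout /etc/docker/ssl/server-key.pem \
  -out /etc/docker/ssl/server-cert.pem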

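Update daemon.json. Bind the TCP listener to a specific management address instead of 0.0.0.0. On systemd-based installs the default docker.service already passes -H fd://, and dockerd refuses to start when hosts are set both as a flag and in daemon.json, so remove the -H flag with a systemd drop-in override before adding "hosts" here:

{
  "tls": true,
  "tlsverify": true,
  "tlscacert": "/etc/docker/ssl/ca.pem",
  "tlscert": "/etc/docker/ssl/server-cert.pem",
  "tlskey": "/etc/docker/ssl/server-key.pem",
  "hosts": ["unix:///var/run/docker.sock", "tcp://<MANAGER-IP>:2376"]
}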
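With tlsverify enabled, the daemon only accepts clients that present a certificate signed by ca.pem, so a self-signed server certificate alone is not enough. A minimal sketch that follows the openssl steps from Docker's documentation on protecting the daemon socket: it creates a CA, a server certificate whose subjectAltName must match the address clients use, and a client certificate. make_docker_tls_certs, the CN values, and the example host and IP are assumptions, and the CA key is written unencrypted for brevity; protect it (or keep it offline) in production:

```shell
#!/bin/sh
# Create a CA, a dockerd server certificate, and a client certificate.
# Arguments: output dir, server DNS name, server IP, optional RSA key size.
# The body runs in a subshell so "cd" and "set -e" do not leak to the caller.
make_docker_tls_certs() (
  set -e
  dir=$1; host=$2; ip=$3; bits=${4:-4096}
  mkdir -p "$dir"
  cd "$dir"

  # Certificate authority: the daemon and clients both trust ca.pem
  openssl genrsa -out ca-key.pem "$bits" 2> /dev/null
  openssl req -new -x509 -days 365 -sha256 -key ca-key.pem \
    -subj "/CN=docker-swarm-ca" -out ca.pem

  # Server certificate; the SAN must match how clients reach the daemon
  openssl genrsa -out server-key.pem "$bits" 2> /dev/null
  openssl req -new -sha256 -key server-key.pem -subj "/CN=$host" -out server.csr
  printf 'subjectAltName = DNS:%s,IP:%s,IP:127.0.0.1\nextendedKeyUsage = serverAuth\n' \
    "$host" "$ip" > server-ext.cnf
  openssl x509 -req -days 365 -sha256 -in server.csr -CA ca.pem -CAkey ca-key.pem \
    -CAcreateserial -out server-cert.pem -extfile server-ext.cnf 2> /dev/null

  # Client certificate for operators
  openssl genrsa -out key.pem "$bits" 2> /dev/null
  openssl req -new -key key.pem -subj "/CN=swarm-operator" -out client.csr
  printf 'extendedKeyUsage = clientAuth\n' > client-ext.cnf
  openssl x509 -req -days 365 -sha256 -in client.csr -CA ca.pem -CAkey ca-key.pem \
    -CAcreateserial -out cert.pem -extfile client-ext.cnf 2> /dev/null

  rm -f server.csr client.csr server-ext.cnf client-ext.cnf
  chmod 0400 ca-key.pem server-key.pem key.pem
  chmod 0444 ca.pem server-cert.pem cert.pem
)
```

For example, make_docker_tls_certs /etc/docker/ssl swarm-manager-01.example.com 10.0.1.10 produces the ca.pem, server-cert.pem and server-key.pem referenced in daemon.json; ca.pem, cert.pem and key.pem go to operator workstations, which then connect with docker --tlsverify -H tcp://10.0.1.10:2376.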

9. Secure Service Deployment

9.1 Image Security

Implement a secure container image strategy:

  • Use minimal base images (Alpine, distroless)
  • Scan images for vulnerabilities before deployment
  • Sign images with Docker Content Trust

Enable Docker Content Trust:

# Enable signing for push/pull operations
export DOCKER_CONTENT_TRUST=1

# Sign and push an image
docker tag myapp:latest myregistry.example.com/myapp:latest
docker push myregistry.example.com/myapp:latest

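Configure registry settings in daemon.json. The registry-mirrors setting only applies to images pulled from Docker Hub (it points the daemon at a pull-through mirror), and keeping insecure-registries empty means every registry must be reached over verified TLS. Engine-level signature enforcement through a content-trust block in daemon.json was a Docker Enterprise feature; Docker CE refuses to start with unknown daemon.json keys, so on CE enforce signing by exporting DOCKER_CONTENT_TRUST=1 wherever images are pulled and services are deployed:

{
  "registry-mirrors": ["https://secure-registry.example.com"],
  "insecure-registries": []
}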
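Docker Swarm resolves an image tag to a digest when a service is created or updated, so a service's image normally ends in @sha256:...; services created with --no-resolve-image, or whose tag could not be resolved, may run different image contents on different nodes. A sketch that flags those services and any use of latest tags; audit_service_images is a hypothetical helper:

```shell
#!/bin/sh
# Report services whose image is not pinned to a digest, or that use the
# latest tag explicitly or implicitly. Run on a manager.
audit_service_images() {
  docker service ls --format '{{.Name}}' | while IFS= read -r svc; do
    image=$(docker service inspect "$svc" \
      --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}')
    ref=${image%%@*}                  # reference without the digest
    case "$image" in
      *@sha256:*) ;;                  # pinned to an immutable digest
      *) echo "$svc: not pinned to a digest ($image)" ;;
    esac
    # Look only at the last path component so registry ports are ignored
    case "${ref##*/}" in
      *:latest) echo "$svc: uses the latest tag ($ref)" ;;
      *:*) ;;
      *) echo "$svc: no tag, defaults to latest ($ref)" ;;
    esac
  done
}

# Example: audit_service_images
```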

9.2 Service Configuration Security

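Deploy services with security-first configurations. Whether an image runs with --read-only and a non-root --user depends on the image: the stock nginx:alpine image, for example, writes to /var/cache/nginx and /var/run at startup, so it needs tmpfs mounts for those paths or an image built to run unprivileged. Test each image with these flags before relying on them:
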
docker service create \
  --name web-frontend \
  --read-only \
  --user nobody:nogroup \
  --limit-cpu 0.5 \
  --limit-memory 512M \
  --reserve-cpu 0.1 \
  --reserve-memory 128M \
  --restart-condition on-failure \
  --restart-max-attempts 5 \
  --update-delay 10s \
  --update-parallelism 1 \
  --update-failure-action rollback \
  --health-cmd "curl -f http://localhost/ || exit 1" \
  --health-interval 5s \
  --health-retries 3 \
  --health-timeout 2s \
  --network secure_frontend \
  nginx:alpine

9.3 Resource Constraints

Implement resource constraints to prevent DoS conditions:

docker service create \
  --name resource-limited-app \
  --limit-cpu 0.25 \
  --limit-memory 256M \
  --reserve-cpu 0.1 \
  --reserve-memory 128M \
  --ulimit nofile=65536:65536 \
  --ulimit nproc=1024:1024 \
  myapp

9.4 Health Monitoring

Implement comprehensive health checks:

docker service create \
  --name monitored-app \
  --health-cmd "curl -f http://localhost:8080/health || exit 1" \
  --health-interval 15s \
  --health-timeout 5s \
  --health-retries 3 \
  --health-start-period 30s \
  myapp
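During --health-start-period a container reports the health status "starting", and failed probes in that window do not count against --health-retries. A deployment script can poll that status before moving on. A sketch, assuming it runs on the node that hosts the task; wait_healthy and its default timeout and interval are hypothetical:

```shell
#!/bin/sh
# Wait until a container's health check reports "healthy".
# Arguments: container ID or name, timeout in seconds, poll interval in seconds.
# wait_healthy is a hypothetical helper; run it on the node hosting the task.
wait_healthy() {
  container=$1
  timeout=${2:-60}
  interval=${3:-5}
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    status=$(docker inspect --format \
      '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$container")
    case "$status" in
      healthy) return 0 ;;
      unhealthy|none) echo "$container: $status" >&2; return 1 ;;
    esac
    sleep "$interval"   # still "starting", e.g. within --health-start-period
    waited=$((waited + interval))
  done
  echo "$container: not healthy after ${timeout}s" >&2
  return 1
}

# Example: wait_healthy "$(docker ps -q -f name=monitored-app | head -n 1)" 90
```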

10. Logging and Monitoring

10.1 Centralized Logging

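Configure the Docker daemon (daemon.json) to send logs to a central system. Plain UDP syslog is neither encrypted nor authenticated, so this example uses syslog over TLS; the aggregator address and CA path are examples:

{
  "log-driver": "syslog",
  "log-opts": {
    "syslog-address": "tcp+tls://log-aggregator.example.com:6514",
    "syslog-tls-ca-cert": "/etc/docker/ssl/log-ca.pem",
    "syslog-facility": "daemon",
    "tag": "{{.ImageName}}/{{.Name}}/{{.ID}}"
  }
}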

Service-specific logging:

docker service create \
  --name app-with-logging \
  --log-driver=fluentd \
  --log-opt fluentd-address=fluentd-aggregator.example.com:24224 \
  --log-opt tag="{{.Name}}.{{.ID}}" \
  myapp

10.2 Security Monitoring

Implement active security monitoring:

  • Container runtime monitoring (Falco)
  • Network traffic analysis
  • Host-based intrusion detection
  • API call auditing

Example Falco rule for Docker Swarm:

- list: docker_users_list
  # Accounts allowed to call the Docker API and Swarm ports directly; adjust as needed
  items: [root]

- rule: Unauthorized Docker Swarm Access
  desc: Detects curl/wget calls to the Docker API (2375/2376) or Swarm management port (2377) by users outside docker_users_list
  condition: >
    spawned_process and
    (proc.name = "curl" or proc.name = "wget") and
    (proc.cmdline contains ":2375" or proc.cmdline contains ":2376" or proc.cmdline contains ":2377") and
    not user.name in (docker_users_list)
  output: >
    Unauthorized Docker Swarm API access attempt
    (user=%user.name command=%proc.cmdline)
  priority: WARNING
  tags: [process, mitre_discovery]

10.3 Alerting

Implement security alerting for critical events:

  • Manager node changes
  • Certificate rotation events
  • Secret access
  • Unauthorized API access attempts

Example Docker API monitoring script. dockerd logs individual API calls ("Calling GET /v1.xx/...") only when debug logging is enabled ("debug": true in daemon.json), so the script assumes debug mode:

#!/bin/bash
# Docker Swarm API monitoring

LOG_FILE="/var/log/docker-api.log"
ALERT_SCRIPT="/usr/local/bin/send-security-alert.sh"

# Follow Docker daemon logs and record API calls
journalctl -fu docker | while IFS= read -r line; do
  if echo "$line" | grep -q "Calling "; then
    echo "$(date) - $line" >> "$LOG_FILE"
  fi

  # Alert on denied access (e.g. rejections from an authorization plugin)
  if echo "$line" | grep -qiE "unauthorized|permission denied|authorization denied"; then
    "$ALERT_SCRIPT" "Unauthorized Docker API access: $line"
  fi
done
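Matching free-text daemon log lines is fragile. On a manager, docker events can stream swarm-scoped events (nodes, services, secrets, configs) directly, which covers the alert list above more reliably. A sketch; which events trigger an alert is an assumption to adapt, and should_alert and watch_swarm_events are hypothetical helpers:

```shell
#!/bin/sh
# Stream swarm-scoped events on a manager and alert on sensitive changes.
ALERT_SCRIPT=${ALERT_SCRIPT:-/usr/local/bin/send-security-alert.sh}

# Decide whether an event (type, action) warrants an alert.
should_alert() {
  case "$1 $2" in
    "node "*|"secret "*|"config "*) return 0 ;;   # any node/secret/config change
    "service remove") return 0 ;;
    *) return 1 ;;
  esac
}

watch_swarm_events() {
  docker events --filter scope=swarm \
    --format '{{.Type}} {{.Action}} {{.Actor.Attributes.name}}' |
    while read -r type action name; do
      if should_alert "$type" "$action"; then
        "$ALERT_SCRIPT" "Swarm event: $type $action $name"
      fi
    done
}

# Example: watch_swarm_events &
```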

11. Backup and Recovery

11.1 Swarm State Backup

Implement regular backups of Swarm state:

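#!/bin/bash
# Swarm backup script
# Run on a manager that is not the leader, and only when enough other managers
# are up to keep quorum while Docker is stopped on this node. If autolock is
# enabled, the backup can only be restored together with the unlock key.
set -euo pipefail

BACKUP_DIR="/var/backups/swarm"
BACKUP_FILE="$BACKUP_DIR/swarm-$(date +%Y%m%d%H%M).tar.gz"
SWARM_DIR="/var/lib/docker/swarm"
GPG_RECIPIENT="${GPG_RECIPIENT:?set GPG_RECIPIENT to the backup key ID or email}"

mkdir -p "$BACKUP_DIR"
chmod 700 "$BACKUP_DIR"

# Stop Docker so the Raft data is consistent; restart it even if tar fails
systemctl stop docker
trap 'systemctl start docker' EXIT
tar -czf "$BACKUP_FILE" "$SWARM_DIR"
systemctl start docker
trap - EXIT

# Encrypt backup; set -e keeps the plaintext copy if encryption fails
gpg --encrypt --recipient "$GPG_RECIPIENT" --output "$BACKUP_FILE.gpg" "$BACKUP_FILE"

# Remove unencrypted backup
rm -f "$BACKUP_FILE"

# Verify Docker Swarm is healthy (an autolocked manager stays locked until unlocked)
sleep 10
if ! docker node ls &> /dev/null; then
  echo "WARNING: Swarm not functioning after backup!"
  # Send alert
  /usr/local/bin/send-alert.sh "Swarm backup failure, manual intervention required"
fi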

11.2 Disaster Recovery Planning

Create a comprehensive DR plan:

  1. Document recovery procedures:

# Docker Swarm Recovery Procedure

## Prerequisites
- Backup file location: /backup/swarm-backup.tar.gz
- Manager node hostname: swarm-manager-01
- Manager IP: 10.0.1.10
- Swarm unlock key (if autolock was enabled when the backup was taken)

## Recovery Steps

1. Install Docker on the new manager node
2. Stop Docker: `systemctl stop docker`
3. Clear any existing swarm state and restore the swarm directory (decrypt the backup first if it was encrypted with gpg):

rm -rf /var/lib/docker/swarm
mkdir -p /var/lib/docker
tar -xzf /backup/swarm-backup.tar.gz -C /

4. Start Docker: `systemctl start docker`, then run `docker swarm unlock` if autolock is enabled
5. Re-initialize the swarm so this node stops trying to reach the old managers (services, networks, and secrets from the backup are kept):

docker swarm init --force-new-cluster --advertise-addr 10.0.1.10

6. Verify the swarm is restored: `docker node ls` and `docker service ls`
7. Add new managers and workers, then apply any service configuration that is not in the backup

  2. Regularly test recovery procedures
  3. Document recovery time objectives (RTOs)
  4. Maintain offline copies of critical configs
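Testing recovery regularly can start with a cheap check that each backup is readable and complete. A sketch, assuming archives produced by the backup script in section 11.1 (a tar of /var/lib/docker/swarm, optionally encrypted with gpg); verify_swarm_backup is a hypothetical helper, and it does not replace a full restore test on a spare node:

```shell
#!/bin/sh
# Check that a Swarm backup archive can be read (decrypting .gpg files) and
# contains the Raft data and certificates. Does not restore anything.
verify_swarm_backup() {
  archive=$1
  case "$archive" in
    *.gpg) listing=$(gpg --quiet --decrypt "$archive" | tar -tzf -) ;;
    *) listing=$(tar -tzf "$archive") ;;
  esac || { echo "cannot read $archive" >&2; return 1; }

  # tar stores /var/lib/docker/swarm without the leading slash
  for required in var/lib/docker/swarm/raft/ var/lib/docker/swarm/certificates/; do
    if ! printf '%s\n' "$listing" | grep -q "^$required"; then
      echo "missing $required in $archive" >&2
      return 1
    fi
  done
  echo "$archive looks complete"
}

# Example: verify_swarm_backup /var/backups/swarm/swarm-202401010200.tar.gz.gpg
```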

12. Secure Updates and Patching

12.1 Node Updates

Implement a node update strategy:

#!/bin/bash
# Secure Swarm node update script
# Run from a manager; assumes Debian/Ubuntu nodes reachable over SSH as root.
set -euo pipefail

NODE=${1:-}
if [ -z "$NODE" ]; then
  echo "Usage: $0 <node-hostname>"
  exit 1
fi

# Step 1: Set node to drain state
docker node update --availability drain "$NODE"

# Step 2: Wait until no tasks are still running on the node
echo "Waiting for containers to drain..."
while docker node ps "$NODE" --format '{{.CurrentState}}' | grep '^Running' > /dev/null; do
  sleep 5
done

# Step 3: Update packages
ssh "$NODE" "export DEBIAN_FRONTEND=noninteractive; apt-get update && apt-get upgrade -y"

# Step 4: Make sure the Docker Engine packages are current
ssh "$NODE" "DEBIAN_FRONTEND=noninteractive apt-get install -y --only-upgrade docker-ce docker-ce-cli containerd.io"

# Step 5: Reboot if the updates require it (e.g. a new kernel)
if ssh "$NODE" "[ -f /var/run/reboot-required ]"; then
  echo "Rebooting node..."
  ssh "$NODE" "reboot" || true  # the SSH session drops during reboot
  sleep 60  # Wait for reboot
fi

# Step 6: Verify Docker is running (an autolocked manager must also be unlocked)
until ssh "$NODE" "docker info > /dev/null 2>&1"; do
  echo "Waiting for Docker to start..."
  sleep 5
done

# Step 7: Set node back to active
docker node update --availability active "$NODE"

# Step 8: Verify node is back
echo "Node update complete. Current status:"
docker node inspect "$NODE" --format '{{ .Status.State }}'

12.2 Container Image Updates

Implement a secure image update process:

# Update a service with a new image version
docker service update \
  --image myapp:1.2.1 \
  --update-parallelism 1 \
  --update-delay 30s \
  --update-failure-action rollback \
  --update-order start-first \
  myapp-service

Automated redeployment when an image tag has been rebuilt (for example after a patched base image is published). The script keeps each service on its current tag rather than switching to latest, and it does not check for vulnerabilities itself, so pair it with the scanning in section 13.2:

#!/bin/bash
# Redeploy services whose image tag now points to a newer digest
set -uo pipefail

# Get list of running services
SERVICES=$(docker service ls --format "{{.Name}}")

for SERVICE in $SERVICES; do
  # Current image, e.g. registry.example.com:5000/app:1.2@sha256:...
  CURRENT_IMAGE=$(docker service inspect "$SERVICE" --format "{{.Spec.TaskTemplate.ContainerSpec.Image}}")

  # Keep the tag (and any registry port) but drop the pinned digest
  IMAGE_REF=${CURRENT_IMAGE%%@*}
  CURRENT_DIGEST=""
  case "$CURRENT_IMAGE" in
    *@*) CURRENT_DIGEST=${CURRENT_IMAGE#*@} ;;
  esac

  # Pull the same tag and read the digest it resolves to now
  LATEST_DIGEST=$(docker pull "$IMAGE_REF" | awk '/^Digest:/ {print $2}')

  if [ -n "$LATEST_DIGEST" ] && [ "$LATEST_DIGEST" != "$CURRENT_DIGEST" ]; then
    echo "Updating $SERVICE to the latest build of $IMAGE_REF"

    # Re-resolving the same tag picks up the new digest; other settings are kept
    docker service update \
      --image "$IMAGE_REF" \
      --update-parallelism 1 \
      --update-delay 30s \
      --update-failure-action rollback \
      "$SERVICE"
  fi
done

12.3 Docker Engine Updates

Create an update plan for Docker Engine:

# Docker Engine Update Procedure

1. Update manager nodes one at a time:
   - Set the target node to drain mode
   - Wait for tasks to be rescheduled
   - Update Docker Engine packages
   - Restart the Docker daemon (and run docker swarm unlock if autolock is enabled)
   - Set node back to its previous availability (dedicated managers stay drained)
   - Verify swarm status before proceeding to next node

2. Update worker nodes in batches (max 30% at a time):
   - Drain a batch of nodes
   - Update Docker Engine  
   - Restart nodes
   - Verify nodes reconnect to swarm
   - Set nodes to active state
   - Proceed to next batch

3. Post-update verification:
   - Check all services are running correct replica count
   - Validate overlay network connectivity
   - Test service discovery
   - Verify secret access
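The post-update check that every service is running its desired replica count can be scripted. A sketch, assuming the REPLICAS column of docker service ls has the form running/desired (optionally followed by text such as "(max 1 per node)"); check_service_replicas is a hypothetical helper, and completed job-mode services would also be reported:

```shell
#!/bin/sh
# List services whose running replica count differs from the desired count.
# Returns 1 if any service has not converged. Run on a manager.
check_service_replicas() {
  docker service ls --format '{{.Name}} {{.Replicas}}' |
    awk '{
      split($2, r, "/")        # r[1] = running, r[2] = desired
      if (r[1] + 0 != r[2] + 0) {
        print "NOT CONVERGED: " $1 " " $2
        bad = 1
      }
    }
    END { exit bad + 0 }'
}

# Example: check_service_replicas && echo "all services converged"
```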

13. Security Testing

13.1 Penetration Testing

Implement regular security testing:

  • Periodic penetration tests of Swarm infrastructure
  • Attack simulations for common threat vectors:
    • Container breakout attempts
    • Unauthorized API access
    • Control plane compromise attempts

Example testing approach:

# Docker Swarm Penetration Testing Checklist

## Network Testing
- Port scan for open Docker ports (2375, 2376, 2377, 7946, 4789)
- TLS certificate validation
- Man-in-the-middle attack attempts against control plane
- Traffic sniffing on overlay networks

## API Security
- Unauthenticated API access attempts
- Authentication bypass tests
- Authorization tests for privileged operations

## Node Security
- Container breakout attempts
- Privilege escalation within containers
- Access to host resources from containers
- Docker socket mounting tests

## Secret Management
- Attempt to extract secrets from containers
- Test secret rotation procedures
- Verify proper secret isolation

## Documentation
- Document all findings
- Rate vulnerabilities by severity
- Provide remediation steps

13.2 Vulnerability Scanning

Implement container vulnerability scanning:

#!/bin/bash
# Scan the image of every Swarm service with the Trivy scanner
set -uo pipefail

LOG_DIR=/var/log/security
mkdir -p "$LOG_DIR"

SERVICES=$(docker service ls --format "{{.Name}}")

for SERVICE in $SERVICES; do
  IMAGE=$(docker service inspect "$SERVICE" --format "{{.Spec.TaskTemplate.ContainerSpec.Image}}")

  echo "Scanning $SERVICE ($IMAGE)"

  # Full vulnerability report for the log
  trivy image "$IMAGE" > "$LOG_DIR/trivy-$SERVICE.log"

  # --exit-code 1 makes Trivy fail when CRITICAL findings exist
  if ! trivy image --quiet --severity CRITICAL --exit-code 1 "$IMAGE" > /dev/null; then
    echo "CRITICAL vulnerabilities found in $SERVICE!"
    # Send alert
    /usr/local/bin/send-security-alert.sh "Critical vulnerabilities in $SERVICE: $IMAGE"
  fi
done

Integrate scanning into CI/CD pipeline:

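# Example GitLab CI configuration
stages:
  - build
  - scan
  - deploy

build:
  stage: build
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    # Push so the scan and deploy jobs (possibly on other runners) can pull the image
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

security_scan:
  stage: scan
  variables:
    # Credentials Trivy uses to pull from the GitLab registry
    TRIVY_USERNAME: $CI_REGISTRY_USER
    TRIVY_PASSWORD: $CI_REGISTRY_PASSWORD
  script:
    # Full report in the job log
    - trivy image $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    # Fail the pipeline if any CRITICAL vulnerability is found
    - trivy image --exit-code 1 --severity CRITICAL $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

deploy:
  stage: deploy
  script:
    # Requires access to a Swarm manager (for example DOCKER_HOST pointing at a TLS-protected manager)
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker service update --with-registry-auth --image $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA my-service
  only:
    - master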

13.3 Security Benchmarks

Implement Docker security benchmarks based on CIS guidelines:

#!/bin/bash
# CIS Docker Benchmark tester (run as root on each node)
set -euo pipefail

BENCH_DIR=/opt/docker-bench-security
REPORT_DIR=/var/log/docker-bench
LOG="$REPORT_DIR/docker-bench-security.log"
mkdir -p "$REPORT_DIR"

# Install (or update) Docker Bench for Security
if [ -d "$BENCH_DIR" ]; then
  git -C "$BENCH_DIR" pull --quiet
else
  git clone https://github.com/docker/docker-bench-security.git "$BENCH_DIR"
fi

# Run the benchmark from its own directory; -b disables colors, -l sets the log file
cd "$BENCH_DIR"
./docker-bench-security.sh -b -l "$LOG"

# Docker Bench reports failed checks as [WARN]
grep "\[WARN\]" "$LOG" | tee "$REPORT_DIR/security-warnings.txt" || true

# Generate remediation report
{
  echo "# Docker Security Remediation Report"
  echo "Generated: $(date)"
  echo ""
  echo "## Warning Checks"
  grep "\[WARN\]" "$LOG" || true
} > "$REPORT_DIR/remediation.md"

Automate with scheduled jobs:

# /etc/cron.d/docker-security
# Run the Docker security benchmark weekly; assumes the wrapper script above
# is installed as /usr/local/bin/docker-bench-security.sh
0 0 * * 0 root /usr/local/bin/docker-bench-security.sh > /var/log/docker-bench-results.log 2>&1

14. References and Further Reading


Security Resources

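| CVE ID | Description | Affected Versions | Remediation |
|---|---|---|---|
| CVE-2021-41091 | Insufficiently restricted permissions on /var/lib/docker subdirectories let unprivileged host users traverse them and run programs | Docker < 20.10.9 | Upgrade to Docker >= 20.10.9 |
| CVE-2021-21285 | Pulling a malformed image manifest crashes dockerd | Docker < 19.03.15; 20.10.x < 20.10.3 | Upgrade to Docker >= 20.10.3 |
| CVE-2019-14271 | docker cp code injection through nsswitch libraries loaded from the container, leading to host compromise | Docker 19.03.x < 19.03.1 | Upgrade to Docker >= 19.03.1 |
| CVE-2019-5736 | runc container breakout (overwrite of the host runc binary) | Docker < 18.09.2 | Upgrade to Docker >= 18.09.2 |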


15. Appendices

15.1 Common Misconfigurations

| Misconfiguration | Risk | Remediation |
|---|---|---|
| Exposing Docker API without TLS | Remote compromise | Enable TLS authentication |
| Running containers as root | Privilege escalation | Use USER directive in Dockerfile |
| Mounting Docker socket | Container breakout | Avoid socket mounting, use API proxy |
| Using default bridge network | No network isolation | Use custom overlay networks |
| Unrestricted resource consumption | DoS conditions | Set resource constraints |
| Deploying unscanned images | Vulnerable software | Implement image scanning |
| Not enabling content trust | Image tampering | Enable Docker Content Trust |
| Using latest tags | Unpredictable updates | Use specific version tags |

15.2 Troubleshooting Security Issues

Certificate Problems

If you encounter TLS/certificate issues:

# Check certificate expiration
openssl x509 -in /var/lib/docker/swarm/certificates/swarm-node.crt -text -noout | grep "Not After"

# Rotate the swarm root CA (every node certificate is reissued)
docker swarm ca --rotate

# Verify the CA configuration
docker system info | grep -A 5 "CA Configuration"

Node Communication Issues

For control plane communication problems:

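# Test TCP control plane ports (2377 management, 7946 gossip)
for port in 2377 7946; do
  nc -zv manager-node "$port"
done

# Test UDP ports (7946 gossip, 4789 VXLAN); UDP probes are only indicative
for port in 7946 4789; do
  nc -zvu manager-node "$port"
done

# Check each node's TLS certificate status against the current root CA
docker node ls --format '{{.Hostname}}: {{.TLSStatus}}'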

Secret Access Issues

When services can’t access secrets:

# Check if secret exists
docker secret ls | grep my-secret

# Verify service has secret attached
docker service inspect my-service --format "{{.Spec.TaskTemplate.ContainerSpec.Secrets}}"

# Check container can read secret (run on a node hosting a task of the service)
docker exec "$(docker ps -q -f name=my-service | head -n 1)" ls -la /run/secrets/

Quick reference guide for security operations:

# Swarm Encryption Commands

## Enable autolock for an existing swarm
docker swarm update --autolock=true

## Get the unlock key
docker swarm unlock-key

## Rotate the unlock key
docker swarm unlock-key --rotate

## Unlock a swarm after restart
docker swarm unlock

# Certificate Commands

## Rotate swarm certificates
docker swarm ca --rotate

## View certificate validity
docker system info

# Security Inspection Commands

## Check service security options
docker service inspect --format "{{.Spec.TaskTemplate.ContainerSpec.Privileges}}" my-service

## List all secret usage
docker service ls --format "{{.Name}}" | xargs -I{} docker service inspect {} --format "{{.Name}}: {{.Spec.TaskTemplate.ContainerSpec.Secrets}}"

## Check network encryption
docker network ls --format "{{.Name}}" | xargs -I{} docker network inspect {} --format "{{.Name}}: {{.Options}}"

# Access Control Commands

## Add a label to control placement
docker node update --label-add security=high node-1

## View node labels
docker node inspect --format "{{.Spec.Labels}}" node-1

## Deploy with security constraints
docker service create --name secure-service --constraint node.labels.security==high nginx:alpine

16. Security Checklist

Use this checklist to ensure your Docker Swarm deployment follows security best practices:

  • Pre-Deployment Security
    • Host OS is minimal and hardened
    • Docker Engine is updated to the latest stable version
    • Daemon configuration is security-optimized
    • Network segmentation is implemented
    • Firewall rules are in place for Swarm ports
  • Swarm Initialization
    • Used --autolock flag for encrypted Raft logs
    • Stored unlock key securely
    • Used specific IP addresses, not 0.0.0.0
    • Deployed an odd number of managers (3 or 5)
    • Manager nodes are dedicated (no other workloads)
  • Network Security
    • Overlay networks have encryption enabled
    • Control plane firewall rules in place
    • Ingress network is secured
    • Services isolated on dedicated overlay networks (Swarm has no network-policy objects)
  • Secret Management
    • Using Docker Secrets for sensitive data
    • No secrets in environment variables
    • Secret rotation process is documented
    • Limited secret access to required services only
  • Access Controls
    • Role-based access implemented
    • API endpoint secured with TLS
    • Using node labels for placement constraints
    • Minimal access granted to operators
  • Service Deployment
    • Images are scanned for vulnerabilities
    • Content Trust is enabled
    • Services use non-root users
    • Resource constraints applied
    • Read-only filesystem where possible
    • Health checks implemented
  • Monitoring and Logging
    • Centralized logging configured
    • Security monitoring in place
    • Alerting for security events
    • Audit logging enabled
  • Maintenance Procedures
    • Update and patching process documented
    • Backup process tested
    • Disaster recovery plan validated
    • Regular security scanning scheduled
  • Security Testing
    • CIS Benchmark implemented
    • Penetration testing completed
    • Vulnerability management process in place

This guide provides a comprehensive approach to securing Docker Swarm deployments. By implementing these recommendations, organizations can significantly reduce the attack surface and improve the overall security posture of their container orchestration platform. Remember that security is a continuous process, requiring regular assessment, updates, and monitoring.

This post is licensed under CC BY 4.0 by the author.