Docker Swarm Container Orchestration Guide
Comprehensive guide to Docker Swarm: initialize clusters, deploy and manage services, configure overlay networks, scale applications, implement security best practices, and monitor clusters.
Comprehensive Security Guide for Docker Swarm
1. Purpose and Overview
Docker Swarm is Docker’s native clustering and orchestration solution for Docker containers. While Kubernetes has gained more popularity in recent years, Docker Swarm remains relevant for organizations seeking a lighter-weight orchestration system integrated directly with the Docker engine.
This guide focuses specifically on security considerations, hardening techniques, and best practices for deploying and managing Docker Swarm in production environments. We’ll cover everything from secure initialization to ongoing operational security, with special attention to defense-in-depth approaches.
2. Table of Contents
- 1. Purpose and Overview
- 2. Table of Contents
- 3. Docker Swarm Security Architecture
- 4. Secure Swarm Initialization
- 5. Network Security
- 6. Node Hardening
- 7. Secret Management
- 8. Access Controls
- 9. Secure Service Deployment
- 10. Logging and Monitoring
- 11. Backup and Recovery
- 12. Secure Updates and Patching
- 13. Security Testing
- 14. References and Further Reading
- 15. Appendices
- 16. Security Checklist
3. Docker Swarm Security Architecture
3.1 Swarm Mode Security Features
Docker Swarm mode incorporates several built-in security features that provide a solid foundation for building secure container orchestration:
- Automated TLS: Swarm mode automatically creates a self-signed CA, generates and distributes certificates to all nodes.
- Certificate Rotation: Node TLS certificates are rotated automatically (every 90 days by default).
- Encrypted Cluster Store: The Raft logs that hold cluster state are encrypted at rest by default.
- Separate Join Tokens: Different tokens for workers and managers help maintain separation of privileges.
- Mutual TLS Authentication: All control plane communication is protected with mutual TLS, ensuring both client and server authenticate each other.
These features provide defense-in-depth but must be complemented with proper operational security practices.
3.2 Security Considerations by Component
Component | Security Considerations |
---|---|
Manager Nodes | Most sensitive components; compromise means full control of cluster |
Worker Nodes | Reduced privilege; still provide execution environment |
Control Plane | TLS securing all communications |
Data Plane | Opt-in encryption for overlay networks |
Docker Engine | Root-level access on hosts; container isolation |
API and CLI | Authentication and authorization concerns |
4. Secure Swarm Initialization
4.1 Pre-Initialization Security Checklist
Before initializing your Swarm, ensure:
- ✅ Host systems are fully patched
- ✅ Docker Engine is updated to latest stable version
- ✅ Default user accounts are secured (not using default passwords)
- ✅ Firewall rules are configured for Swarm ports only
- ✅ Docker daemon configurations are secured (see section 6.1)
- ✅ SELinux/AppArmor is properly configured
- ✅ Disk encryption is implemented for sensitive data
- ✅ Network segmentation is properly configured
- ✅ NTP is configured for time synchronization
4.2 Securing Manager Nodes
Manager nodes are the most critical components in your Swarm architecture. Compromise of manager nodes can lead to full cluster compromise.
# Initialize the swarm with explicit advertise address to control network exposure
docker swarm init --advertise-addr <MANAGER-IP> --autolock
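The --autolock flag is crucial for security. Swarm encrypts the Raft logs at rest by default, but the encryption key is stored on the manager's disk. With autolock enabled, that key (and the manager's TLS key) is itself protected by an unlock key that must be supplied after every Docker restart, which protects against data extraction from disk.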
Store the unlock key securely outside the Swarm (like in a password manager or HSM):
To unlock the swarm use the following key:
SWMKEY-1-5ZwhXs9trhfBzwL0zYJDX1Oon3jz1U2AvdASNzQ+vME
Additional manager hardening steps:
- Implement separate management network for control plane traffic
- Restrict physical and SSH access to manager nodes
- Use dedicated nodes for management (not running other workloads)
- Deploy an odd number of managers (3, 5, 7) distributed across availability zones
- Use CPU/memory resource limits to prevent DoS conditions
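The odd-number recommendation follows from Raft quorum arithmetic: a swarm with N managers needs a majority of floor(N/2)+1 managers to make decisions, so it tolerates floor((N-1)/2) manager failures. The following is a minimal sketch of that arithmetic; the function name `manager_fault_tolerance` is illustrative and not part of Docker.

```shell
#!/bin/bash
# Illustrative helper (not a Docker command): print the Raft quorum size and
# the number of manager failures an N-manager swarm can tolerate.
manager_fault_tolerance() {
  local managers=$1
  local quorum=$(( managers / 2 + 1 ))        # majority needed for consensus
  local tolerated=$(( (managers - 1) / 2 ))   # managers that may fail
  echo "managers=$managers quorum=$quorum tolerated_failures=$tolerated"
}

for n in 3 4 5; do
  manager_fault_tolerance "$n"
done
```

Four managers tolerate no more failures than three, which is why an even manager count adds cost without adding resilience.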
4.3 Joining Worker Nodes Securely
Worker nodes should be joined using the worker token, never the manager token:
# Get the worker join token from a manager node
docker swarm join-token worker
# Join as a worker with the token
docker swarm join --token SWMTKN-1-49nj1cmql0... <MANAGER-IP>:2377
Security considerations:
- Always use specific IP addresses, not 0.0.0.0
- Rotate join tokens regularly
- Clear Docker logs containing join tokens
- Implement Just-In-Time (JIT) node provisioning using automation
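Token rotation from the list above can be scripted. The sketch below assumes it runs on a manager node; the function name `rotate_join_tokens` is hypothetical, while `docker swarm join-token --rotate --quiet` is the standard CLI call. Rotation invalidates old tokens for new joins only; nodes that already joined are unaffected.

```shell
#!/bin/bash
# Rotate the worker and manager join tokens (run on a manager node).
rotate_join_tokens() {
  local role
  for role in worker manager; do
    # --quiet prints only the new token; discard it so it never lands in logs
    docker swarm join-token --rotate --quiet "$role" > /dev/null || return 1
    echo "rotated $role join token"
  done
}
```

A call such as `rotate_join_tokens` could be scheduled after each provisioning run, with the fresh token fetched only at the moment a new node is about to join.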
4.4 Verifying Swarm Integrity
After initialization and after any major changes, verify the integrity of your Swarm:
# List and verify all nodes and their roles
docker node ls
# Check the status of the swarm
docker info | grep -A 10 Swarm
# Review the swarm CA configuration (certificate expiry, forced rotation)
docker info | grep -A 3 "CA Configuration"
5. Network Security
5.1 Control Plane Security
Docker Swarm uses the following ports:
- TCP port 2377 for cluster management (control plane)
- TCP and UDP port 7946 for node-to-node communication (control plane)
- UDP port 4789 for overlay network traffic (data plane, VXLAN)
- IP protocol 50 (ESP) if encrypted overlay networks are used
Secure these with strict firewall rules:
# Example UFW rules
sudo ufw allow 2377/tcp
sudo ufw allow 7946/tcp
sudo ufw allow 7946/udp
sudo ufw allow 4789/udp
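The UFW rules above accept traffic from any source address. A tighter variant limits each Swarm port to the subnet the nodes live on. The sketch below only prints the rules so they can be reviewed first; the function name `swarm_ufw_rules` and the subnet 10.10.0.0/24 are assumed example values.

```shell
#!/bin/bash
# Print (not execute) UFW rules that limit Swarm ports to one trusted subnet.
swarm_ufw_rules() {
  local subnet=$1
  local rule
  for rule in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
    local port=${rule%/*} proto=${rule#*/}   # split "2377/tcp" into port and protocol
    echo "ufw allow from $subnet to any port $port proto $proto"
  done
}

swarm_ufw_rules 10.10.0.0/24
```

After review, the printed commands can be run with root privileges on each node.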
Iptables example:
# Allow Swarm management traffic only from trusted IPs
iptables -A INPUT -p tcp -s 10.10.0.0/24 --dport 2377 -j ACCEPT
# Default deny for swarm management port
iptables -A INPUT -p tcp --dport 2377 -j DROP
5.2 Data Plane Security
For container-to-container communication, implement these security controls:
- Network segmentation using overlay networks
- Network policies to control traffic flow
- Service isolation with dedicated overlay networks
Example of creating an isolated overlay network:
# Create an isolated overlay network
docker network create --driver overlay --attachable --opt encrypted=true isolated_network
# Deploy service on isolated network
docker service create --name secure-app --network isolated_network my-image
5.3 Overlay Network Encryption
Encrypt overlay networks to protect data in transit between containers:
# Create encrypted overlay network
docker network create --driver overlay --opt encrypted=true secure_overlay
# Verify encryption is enabled
docker network inspect secure_overlay | grep -A 3 Options
Note that encryption adds overhead, so benchmark performance impact before deploying broadly.
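Because encryption is opt-in per network, it helps to audit which overlay networks were created without it. The sketch below assumes input lines in the form `name: map[options]`, as printed by `docker network inspect --format '{{.Name}}: {{.Options}}'`; the function name `unencrypted_networks` is hypothetical.

```shell
#!/bin/bash
# Read "name: options" lines on stdin and print the names of networks whose
# options do not include the "encrypted" key.
unencrypted_networks() {
  local line name opts
  while IFS= read -r line; do
    name=${line%%:*}      # text before the first colon
    opts=${line#*: }      # text after the first ": "
    case "$opts" in
      *encrypted*) ;;             # encryption option present
      *) echo "$name" ;;          # no encryption option found
    esac
  done
}

# Sample input in the assumed format; on a manager, pipe in real data instead
printf '%s\n' 'secure_overlay: map[encrypted:true]' 'legacy_net: map[]' | unencrypted_networks
```

On a live cluster the input would come from inspecting each overlay network. The built-in ingress network will typically show up in the result unless it has been recreated with encryption.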
5.4 Ingress Network Configuration
The ingress network handles external traffic to published service ports. Secure it with:
# Remove the default ingress network (caution: disrupts running services)
docker network rm ingress
# Create a custom ingress network with encryption
docker network create \
--driver overlay \
--ingress \
--opt encrypted=true \
new-ingress
6. Node Hardening
6.1 Docker Daemon Security
Secure the Docker daemon with a proper configuration file (/etc/docker/daemon.json):
{
  "icc": false,
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "userns-remap": "default",
  "live-restore": false,
  "userland-proxy": false,
  "no-new-privileges": true,
  "seccomp-profile": "/etc/docker/seccomp-profile.json",
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  },
  "selinux-enabled": true,
  "experimental": false
}
Key security settings:
- icc: false - Disables inter-container communication
- userns-remap - Enables user namespace isolation
- no-new-privileges - Prevents privilege escalation
- seccomp-profile - Applies syscall filtering
- selinux-enabled - Enables SELinux security
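A quick way to confirm that a node's daemon.json carries these settings is a presence check. The sketch below is an assumption-laden illustration: the function name `check_daemon_hardening` is hypothetical, it uses plain grep, and it only verifies that each key appears, not that its value is correct. selinux-enabled is left out because hosts that rely on AppArmor will not set it.

```shell
#!/bin/bash
# Report which hardening keys are missing from a daemon.json file.
# Returns 0 when all keys are present and 1 otherwise.
check_daemon_hardening() {
  local file=$1 key missing=0
  for key in icc userns-remap no-new-privileges seccomp-profile; do
    if ! grep -q "\"$key\"" "$file"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_daemon_hardening /etc/docker/daemon.json && echo "all keys present"` could run as part of node provisioning.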
6.2 Host OS Security
Apply these host security hardening measures:
- Minimize installed packages (use minimal OS images)
- Implement regular patching schedule
- Enable and configure host-based firewall
- Use SELinux/AppArmor in enforcing mode
- Implement file integrity monitoring
- Configure strong SSH authentication (keys only, no passwords)
Example of configuring auditd for Docker:
# Add Docker daemon audit rules
cat << EOF > /etc/audit/rules.d/docker.rules
-w /usr/bin/docker -p wa
-w /var/lib/docker -p wa
-w /etc/docker -p wa
-w /lib/systemd/system/docker.service -p wa
-w /lib/systemd/system/docker.socket -p wa
-w /etc/default/docker -p wa
-w /etc/docker/daemon.json -p wa
-w /usr/bin/containerd -p wa
-w /usr/bin/runc -p wa
EOF
# Restart auditd
service auditd restart
6.3 Container Isolation
Enhance container isolation beyond the defaults:
- Use read-only filesystems where possible
- Apply appropriate Linux capabilities
- Implement security profiles (seccomp, AppArmor)
Example service with enhanced security:
# docker service create has no --security-opt flag; seccomp, AppArmor, and
# no-new-privileges defaults come from each node's daemon.json (section 6.1)
docker service create \
  --name secure-service \
  --read-only \
  --mount type=tmpfs,destination=/tmp \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  nginx:alpine
7. Secret Management
7.1 Using Docker Secrets
Docker Swarm provides a native secrets management system:
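# Create a secret (printf avoids storing a trailing newline in the secret)
printf '%s' "secure_password" | docker secret create db_password -
# Use the secret in a service (the official postgres image reads POSTGRES_PASSWORD_FILE)
docker service create \
  --name db \
  --secret db_password \
  --env POSTGRES_PASSWORD_FILE=/run/secrets/db_password \
  postgres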
Secrets best practices:
- Never expose secrets in service definitions or environment variables
- Limit secret access to specific services
- Implement secret rotation (see 7.3)
- Avoid third-party images that don’t handle secrets properly
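Keeping secrets out of environment variables means the application, or its entrypoint script, reads them from the file that Swarm mounts under /run/secrets. The following is a minimal sketch of such an entrypoint helper; the function name `read_secret` and the variable `API_TOKEN_FILE` are hypothetical.

```shell
#!/bin/bash
# Hypothetical entrypoint helper: print the secret stored in the file named by
# the given *_FILE variable, failing if the variable or the file is missing.
read_secret() {
  local var_name=$1
  local path=${!var_name:-}     # indirect expansion: value of e.g. API_TOKEN_FILE
  if [ -z "$path" ] || [ ! -r "$path" ]; then
    echo "secret file for $var_name not available" >&2
    return 1
  fi
  cat "$path"
}
```

An entrypoint would call it as `API_TOKEN=$(read_secret API_TOKEN_FILE)` and keep the result in an unexported shell variable, so only the file path, never the secret itself, appears in the service definition.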
7.2 External Secret Management Integration
For more robust secret management, integrate with external systems:
HashiCorp Vault Integration Example:
- Deploy Vault agent in Swarm
docker service create \
--name vault-agent \
--network control-plane-network \
--mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
vault-agent-image
- Create template for retrieving secrets:
template {
source = "/etc/vault-agent/templates/db-creds.tpl"
destination = "/run/secrets/db-credentials"
}
7.3 Secrets Rotation
Implement a secure rotation strategy for secrets:
# Create new secret version
printf '%s' "new_secure_password" | docker secret create db_password_v2 -
# Update service to use new secret, mounted under the original file name
docker service update \
  --secret-rm db_password \
  --secret-add source=db_password_v2,target=db_password \
  db
# Remove old secret after confirming service is working
docker secret rm db_password
Automation script example for secret rotation:
#!/bin/bash
# Secret rotation script
# Usage: rotate-secret.sh <secret-name> <service-name>
SECRET_NAME=${1:?secret name required}
SERVICE_NAME=${2:?service name required}
# Generate new password
NEW_PASSWORD=$(openssl rand -base64 32)
# Create new secret (printf avoids a trailing newline)
printf '%s' "$NEW_PASSWORD" | docker secret create "${SECRET_NAME}_new" - || exit 1
# Update service; the new secret is mounted under the original file name
docker service update --secret-rm "$SECRET_NAME" --secret-add "source=${SECRET_NAME}_new,target=$SECRET_NAME" "$SERVICE_NAME"
# Verify service health
sleep 30
STATE=$(docker service inspect --format '{{.UpdateStatus.State}}' "$SERVICE_NAME")
if [ "$STATE" != "completed" ]; then
  echo "Service update failed, rolling back"
  docker service update --secret-rm "${SECRET_NAME}_new" --secret-add "$SECRET_NAME" "$SERVICE_NAME"
  docker secret rm "${SECRET_NAME}_new"
  exit 1
fi
# Remove old secret
docker secret rm "$SECRET_NAME"
# Secrets cannot be renamed, so recreate the secret under the standard name
# from the value still held in memory, then switch the service over to it
printf '%s' "$NEW_PASSWORD" | docker secret create "$SECRET_NAME" -
docker service update --secret-rm "${SECRET_NAME}_new" --secret-add "$SECRET_NAME" "$SERVICE_NAME"
docker secret rm "${SECRET_NAME}_new"
8. Access Controls
8.1 Role-Based Access Control
Docker Enterprise Edition offers RBAC, but for standard Docker Swarm, implement controls with:
- Separate management accounts from service accounts
- Use team-based access via Unix groups
- Implement sudo with limited commands for operators
Example sudoers configuration:
# Allow swarm operators to run specific docker commands
%swarm-operators ALL=(root) NOPASSWD: /usr/bin/docker node ls
%swarm-operators ALL=(root) NOPASSWD: /usr/bin/docker service ls
# A trailing * is required so that a service name can be passed as an argument
%swarm-operators ALL=(root) NOPASSWD: /usr/bin/docker service logs *
%swarm-operators ALL=(root) NOPASSWD: /usr/bin/docker service inspect *
8.2 Label-Based Controls
Use node labels to control workload placement:
# Add security-level label to node
docker node update --label-add security=high node-1
# Deploy service only to high-security nodes
docker service create \
--name secure-backend \
--constraint node.labels.security==high \
backend-image
8.3 API Access Controls
Secure the Docker API:
# Configure TLS for Docker API
mkdir -p /etc/docker/ssl
# Generate certs (use proper CA process in production)
openssl req -x509 -nodes -days 365 -newkey rsa:4096 \
-keyout /etc/docker/ssl/server-key.pem \
-out /etc/docker/ssl/server-cert.pem
Update daemon.json:
{
"tls": true,
"tlsverify": true,
"tlscacert": "/etc/docker/ssl/ca.pem",
"tlscert": "/etc/docker/ssl/server-cert.pem",
"tlskey": "/etc/docker/ssl/server-key.pem",
"hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2376"]
}
9. Secure Service Deployment
9.1 Image Security
Implement a secure container image strategy:
- Use minimal base images (Alpine, distroless)
- Scan images for vulnerabilities before deployment
- Sign images with Docker Content Trust
Enable Docker Content Trust:
# Enable signing for push/pull operations
export DOCKER_CONTENT_TRUST=1
# Sign and push an image
docker tag myapp:latest myregistry.example.com/myapp:latest
docker push myregistry.example.com/myapp:latest
Configure a secure registry in daemon.json:
{
"registry-mirrors": ["https://secure-registry.example.com"],
"insecure-registries": [],
"content-trust": {
"trust-pinning": {
"official": ["docker.io"]
}
}
}
9.2 Service Configuration Security
Deploy services with security-first configurations:
docker service create \
--name web-frontend \
--read-only \
--user nobody:nogroup \
--limit-cpu 0.5 \
--limit-memory 512M \
--reserve-cpu 0.1 \
--reserve-memory 128M \
--restart-condition on-failure \
--restart-max-attempts 5 \
--update-delay 10s \
--update-parallelism 1 \
--update-failure-action rollback \
--health-cmd "curl -f http://localhost/ || exit 1" \
--health-interval 5s \
--health-retries 3 \
--health-timeout 2s \
--network secure_frontend \
nginx:alpine
9.3 Resource Constraints
Implement resource constraints to prevent DoS conditions:
docker service create \
--name resource-limited-app \
--limit-cpu 0.25 \
--limit-memory 256M \
--reserve-cpu 0.1 \
--reserve-memory 128M \
--ulimit nofile=65536:65536 \
--ulimit nproc=1024:1024 \
myapp
9.4 Health Monitoring
Implement comprehensive health checks:
docker service create \
--name monitored-app \
--health-cmd "curl -f http://localhost:8080/health || exit 1" \
--health-interval 15s \
--health-timeout 5s \
--health-retries 3 \
--health-start-period 30s \
myapp
10. Logging and Monitoring
10.1 Centralized Logging
Configure Docker logging to send to a central system:
// In daemon.json
{
"log-driver": "syslog",
"log-opts": {
"syslog-address": "udp://log-aggregator.example.com:514",
"syslog-facility": "daemon",
"tag": "{{.ImageName}}/{{.Name}}/{{.ID}}"
}
}
Service-specific logging:
docker service create \
--name app-with-logging \
--log-driver=fluentd \
--log-opt fluentd-address=fluentd-aggregator.example.com:24224 \
--log-opt tag="{{.Name}}.{{.ID}}" \
myapp
10.2 Security Monitoring
Implement active security monitoring:
- Container runtime monitoring (Falco)
- Network traffic analysis
- Host-based intrusion detection
- API call auditing
Example Falco rule for Docker Swarm:
# Users allowed to query the Swarm API (assumed example; adjust to your environment)
- list: docker_users_list
  items: [root]
- rule: Unauthorized Docker Swarm Access
  desc: Detects unauthorized access to Docker Swarm API
  condition: >
    spawned_process and
    (proc.name = "curl" or proc.name = "wget") and
    (proc.cmdline contains "docker" and proc.cmdline contains "2377") and
    not user.name in (docker_users_list)
  output: >
    Unauthorized Docker Swarm API access attempt
    (user=%user.name command=%proc.cmdline)
  priority: WARNING
  tags: [process, mitre_discovery]
10.3 Alerting
Implement security alerting for critical events:
- Manager node changes
- Certificate rotation events
- Secret access
- Unauthorized API access attempts
Example Docker API monitoring script:
#!/bin/bash
# Docker Swarm API monitoring
LOG_FILE="/var/log/docker-api.log"
ALERT_SCRIPT="/usr/local/bin/send-security-alert.sh"
# Tail Docker daemon logs and look for API access
journalctl -fu docker | while IFS= read -r line; do
  if echo "$line" | grep -q "API access"; then
    echo "$(date) - $line" >> "$LOG_FILE"
    # Check if it's an unauthorized access
    if echo "$line" | grep -qE "unauthorized|permission denied"; then
      "$ALERT_SCRIPT" "Unauthorized Docker API access: $line"
    fi
  fi
done
11. Backup and Recovery
11.1 Swarm State Backup
Implement regular backups of Swarm state:
#!/bin/bash
# Swarm backup script
BACKUP_DIR="/var/backups/swarm"
BACKUP_FILE="$BACKUP_DIR/swarm-$(date +%Y%m%d%H%M).tar.gz"
SWARM_DIR="/var/lib/docker/swarm"
# Stop Docker
systemctl stop docker
# Backup Swarm directory
tar -czf $BACKUP_FILE $SWARM_DIR
# Start Docker
systemctl start docker
# Encrypt backup (replace <RECIPIENT-KEY-ID> with your GPG key ID or email)
gpg --encrypt --recipient <RECIPIENT-KEY-ID> "$BACKUP_FILE"
# Remove unencrypted backup
rm $BACKUP_FILE
# Verify Docker Swarm is healthy
if ! docker node ls &> /dev/null; then
echo "WARNING: Swarm not functioning after backup!"
# Send alert
/usr/local/bin/send-alert.sh "Swarm backup failure, manual intervention required"
fi
11.2 Disaster Recovery Planning
Create a comprehensive DR plan:
- Document recovery procedures:
# Docker Swarm Recovery Procedure
## Prerequisites
- Backup file location: /backup/swarm-backup.tar.gz
- Manager node hostname: swarm-manager-01
- Manager IP: 10.0.1.10
## Recovery Steps
1. Install Docker on the new manager node
2. Stop Docker: `systemctl stop docker`
3. Restore the swarm directory:
    mkdir -p /var/lib/docker
    tar -xzf /backup/swarm-backup.tar.gz -C /
4. Start Docker: `systemctl start docker`
5. Verify the swarm is restored: `docker node ls`
6. If recovery fails, initialize a new swarm and restore services:
    docker swarm init --advertise-addr 10.0.1.10 --force-new-cluster
7. Apply service restore from config backup
- Regularly test recovery procedures
- Document recovery time objectives (RTOs)
- Maintain offline copies of critical configs
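Testing recovery regularly can start with a cheap check that a backup archive is readable and actually contains the Swarm state directory. The sketch below assumes an unencrypted tar.gz (a GPG-encrypted backup would need `gpg --decrypt` first); the function name `verify_swarm_backup` is hypothetical.

```shell
#!/bin/bash
# Check that a backup archive is readable and contains the Swarm Raft state.
# tar strips the leading "/" when archiving, so member paths start with "var/".
verify_swarm_backup() {
  local archive=$1
  if tar -tzf "$archive" 2>/dev/null | grep -q '^var/lib/docker/swarm/'; then
    echo "backup OK: $archive"
  else
    echo "backup INVALID: $archive" >&2
    return 1
  fi
}
```

A listing check like this does not replace a full restore drill on a scratch node, but it catches truncated or empty archives early.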
12. Secure Updates and Patching
12.1 Node Updates
Implement a node update strategy:
#!/bin/bash
# Secure Swarm node update script
NODE=$1
if [ -z "$NODE" ]; then
echo "Usage: $0 <node-hostname>"
exit 1
fi
# Step 1: Set node to drain state
docker node update --availability drain $NODE
# Step 2: Wait for containers to drain
echo "Waiting for containers to drain..."
while [ -n "$(docker node ps "$NODE" --filter desired-state=running -q)" ]; do
  sleep 5
done
# Step 3: Update packages
ssh $NODE "apt-get update && apt-get upgrade -y"
# Step 4: Check for Docker updates
ssh $NODE "apt-get install -y docker-ce"
# Step 5: Reboot if kernel was updated
if ssh $NODE "[ -f /var/run/reboot-required ]"; then
echo "Rebooting node..."
ssh $NODE "reboot"
sleep 60 # Wait for reboot
fi
# Step 6: Verify Docker is running
until ssh $NODE "docker info >/dev/null 2>&1"; do
echo "Waiting for Docker to start..."
sleep 5
done
# Step 7: Set node back to active
docker node update --availability active $NODE
# Step 8: Verify node is back
echo "Node update complete. Current status:"
docker node inspect $NODE --format '{{ .Status.State }}'
12.2 Container Image Updates
Implement a secure image update process:
# Update a service with a new image version
docker service update \
--image myapp:1.2.1 \
--update-parallelism 1 \
--update-delay 30s \
--update-failure-action rollback \
--update-order start-first \
myapp-service
Automated vulnerability patching:
#!/bin/bash
# Auto-update vulnerable images
# Get list of running services
SERVICES=$(docker service ls --format "{{.Name}}")
for SERVICE in $SERVICES; do
# Get current image
CURRENT_IMAGE=$(docker service inspect $SERVICE --format "{{.Spec.TaskTemplate.ContainerSpec.Image}}")
# Get image without digest/tag
BASE_IMAGE=$(echo $CURRENT_IMAGE | cut -d '@' -f 1 | cut -d ':' -f 1)
# Check if newer version exists
LATEST_DIGEST=$(docker pull $BASE_IMAGE:latest | grep "Digest:" | cut -d ' ' -f 2)
CURRENT_DIGEST=$(echo $CURRENT_IMAGE | grep -o '@.*' || echo "")
if [ "@$LATEST_DIGEST" != "$CURRENT_DIGEST" ]; then
echo "Updating $SERVICE to latest secure version"
# Update with security settings maintained
docker service update \
--image $BASE_IMAGE:latest \
--update-parallelism 1 \
--update-delay 30s \
--update-failure-action rollback \
$SERVICE
fi
done
12.3 Docker Engine Updates
Create an update plan for Docker Engine:
# Docker Engine Update Procedure
1. Update manager nodes one at a time:
- Set the target node to drain mode
- Wait for tasks to be rescheduled
- Update Docker Engine packages
- Restart the Docker daemon
- Set node back to active
- Verify swarm status before proceeding to next node
2. Update worker nodes in batches (max 30% at a time):
- Drain a batch of nodes
- Update Docker Engine
- Restart nodes
- Verify nodes reconnect to swarm
- Set nodes to active state
- Proceed to next batch
3. Post-update verification:
- Check all services are running correct replica count
- Validate overlay network connectivity
- Test service discovery
- Verify secret access
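The 30% batch limit for worker updates can be turned into a concrete node count. The sketch below is one possible reading of that rule: it rounds down and never returns less than one node; the function name `worker_batch_size` is illustrative.

```shell
#!/bin/bash
# Illustrative helper: largest batch that stays within 30% of the workers,
# but never less than one node so that small clusters still make progress.
worker_batch_size() {
  local workers=$1
  local batch=$(( workers * 30 / 100 ))   # integer division rounds down
  [ "$batch" -lt 1 ] && batch=1
  echo "$batch"
}

worker_batch_size 10   # prints 3
```

With three or fewer workers, a batch of one node already exceeds 30% of capacity, so services on such clusters need enough spare replicas to survive a single drained node.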
13. Security Testing
13.1 Penetration Testing
Implement regular security testing:
- Periodic penetration tests of Swarm infrastructure
- Attack simulations for common threat vectors:
- Container breakout attempts
- Unauthorized API access
- Control plane compromise attempts
Example testing approach:
# Docker Swarm Penetration Testing Checklist
## Network Testing
- Port scan for open Docker ports (2375, 2376, 2377, 7946, 4789)
- TLS certificate validation
- Man-in-the-middle attack attempts against control plane
- Traffic sniffing on overlay networks
## API Security
- Unauthenticated API access attempts
- Authentication bypass tests
- Authorization tests for privileged operations
## Node Security
- Container breakout attempts
- Privilege escalation within containers
- Access to host resources from containers
- Docker socket mounting tests
## Secret Management
- Attempt to extract secrets from containers
- Test secret rotation procedures
- Verify proper secret isolation
## Documentation
- Document all findings
- Rate vulnerabilities by severity
- Provide remediation steps
13.2 Vulnerability Scanning
Implement container vulnerability scanning:
#!/bin/bash
# Example using Trivy scanner
SERVICES=$(docker service ls --format "{{.Name}}")
for SERVICE in $SERVICES; do
IMAGE=$(docker service inspect $SERVICE --format "{{.Spec.TaskTemplate.ContainerSpec.Image}}")
echo "Scanning $SERVICE ($IMAGE)"
# Run vulnerability scan
trivy image $IMAGE > /var/log/security/trivy-$SERVICE.log
# Check for critical vulnerabilities
if grep -q "CRITICAL: [1-9]" /var/log/security/trivy-$SERVICE.log; then
echo "CRITICAL vulnerabilities found in $SERVICE!"
# Send alert
./send-security-alert.sh "Critical vulnerabilities in $SERVICE: $IMAGE"
fi
done
Integrate scanning into CI/CD pipeline:
# Example GitLab CI configuration
stages:
  - build
  - scan
  - deploy
build:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
security_scan:
  stage: scan
  script:
    - trivy image $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - |
      if trivy image --exit-code 1 --severity CRITICAL $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA; then
        echo "No critical vulnerabilities found"
      else
        echo "Critical vulnerabilities found - failing build"
        exit 1
      fi
deploy:
  stage: deploy
  script:
    - docker service update --image $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA my-service
  only:
    - master
13.3 Security Benchmarks
Implement Docker security benchmarks based on CIS guidelines:
#!/bin/bash
# CIS Docker Benchmark tester
# Install Docker Bench for Security
git clone https://github.com/docker/docker-bench-security.git
cd docker-bench-security
# Run the benchmark
./docker-bench-security.sh
# Check for failed tests
grep "\[WARN\]" docker-bench-security.log | tee security-warnings.txt
grep "\[FAIL\]" docker-bench-security.log | tee security-failures.txt
# Generate remediation report
echo "# Docker Security Remediation Report" > remediation.md
echo "Generated: $(date)" >> remediation.md
echo "" >> remediation.md
echo "## Failed Checks" >> remediation.md
grep "\[FAIL\]" docker-bench-security.log >> remediation.md
echo "" >> remediation.md
echo "## Warning Checks" >> remediation.md
grep "\[WARN\]" docker-bench-security.log >> remediation.md
Automate with scheduled jobs:
# /etc/cron.d/docker-security
# Run Docker security benchmark weekly
0 0 * * 0 root /usr/local/bin/docker-bench-security.sh > /var/log/docker-bench-results.log 2>&1
14. References and Further Reading
Official Documentation
Security Resources
Related CVEs
CVE ID | Description | Affected Versions | Remediation |
---|---|---|---|
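CVE-2021-41091 | Overly permissive data directory permissions let unprivileged host users traverse and execute container files | Docker < 20.10.9 | Upgrade to Docker >= 20.10.9 |
CVE-2021-21285 | Pulling a malformed image manifest crashes the Docker daemon (denial of service) | Docker < 20.10.3 | Upgrade to Docker >= 20.10.3 |
CVE-2019-14271 | Code injection in docker cp through library loading inside the container chroot | Docker 19.03.x < 19.03.1 | Upgrade to Docker >= 19.03.1 |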
CVE-2019-5736 | runc container breakout | Docker < 18.09.2 | Upgrade to Docker >= 18.09.2 |
Blogs and Articles
- “Docker Swarm: Hardening Best Practices” - SecureWorks
- “The Path Less Traveled: Abusing Docker Swarm” - BlackHat 2020
- “Container Security in Swarm Mode” - Docker Blog
15. Appendices
15.1 Common Misconfigurations
Misconfiguration | Risk | Remediation |
---|---|---|
Exposing Docker API without TLS | Remote compromise | Enable TLS authentication |
Running containers as root | Privilege escalation | Use USER directive in Dockerfile |
Mounting Docker socket | Container breakout | Avoid socket mounting, use API proxy |
Using default bridge network | No network isolation | Use custom overlay networks |
Unrestricted resource consumption | DoS conditions | Set resource constraints |
Deploying unscanned images | Vulnerable software | Implement image scanning |
Not enabling content trust | Image tampering | Enable Docker Content Trust |
Using latest tags | Unpredictable updates | Use specific version tags |
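The last row of the table can be checked mechanically. The sketch below flags image references that float, meaning they use the latest tag or carry no tag or digest at all; the function name `uses_floating_tag` is hypothetical, and the parsing is a simplification that does not cover every valid image reference.

```shell
#!/bin/bash
# Return success (0) if an image reference is unpinned: it uses the "latest"
# tag or has neither a tag nor a digest. Digest-pinned references return 1.
uses_floating_tag() {
  local ref=$1
  case "$ref" in
    *@sha256:*) return 1 ;;            # pinned by digest
  esac
  local last=${ref##*/}                # final path component, e.g. "nginx:1.25"
  case "$last" in
    *:latest) return 0 ;;
    *:*)      return 1 ;;              # explicit non-latest tag
    *)        return 0 ;;              # no tag means "latest" is implied
  esac
}
```

Only the final path component is inspected for a tag, so a registry port such as `myregistry.example.com:5000/myapp` is not mistaken for a version tag.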
15.2 Troubleshooting Security Issues
Certificate Problems
If you encounter TLS/certificate issues:
# Check certificate expiration
openssl x509 -in /var/lib/docker/swarm/certificates/swarm-node.crt -text -noout | grep "Not After"
# Force certificate rotation
docker swarm ca --rotate
# Verify certificate rotation
docker system info | grep -A 5 "CA Configuration"
Node Communication Issues
For control plane communication problems:
# Test control plane connectivity (2377 is TCP only; 7946 uses TCP and UDP)
nc -zv manager-node 2377
nc -zv manager-node 7946
nc -zuv manager-node 7946
# Check swarm CA and certificate configuration
docker system info | grep -A 3 "CA Configuration"
Secret Access Issues
When services can’t access secrets:
# Check if secret exists
docker secret ls | grep my-secret
# Verify service has secret attached
docker service inspect my-service --format "{{.Spec.TaskTemplate.ContainerSpec.Secrets}}"
# Check container can read secret
docker exec $(docker ps -q -f name=my-service) ls -la /run/secrets/
15.3 Security-Related Docker Commands
Quick reference guide for security operations:
# Swarm Encryption Commands
## Enable autolock for an existing swarm
docker swarm update --autolock=true
## Get the unlock key
docker swarm unlock-key
## Rotate the unlock key
docker swarm unlock-key --rotate
## Unlock a swarm after restart
docker swarm unlock
# Certificate Commands
## Rotate swarm certificates
docker swarm ca --rotate
## View certificate validity
docker system info
# Security Inspection Commands
## Check service security options
docker service inspect --format "{{.Spec.TaskTemplate.ContainerSpec.Privileges}}" my-service
## List all secret usage
docker service ls --format "{{.Name}}" | xargs -I{} docker service inspect {} --format "{{.Name}}: {{.Spec.TaskTemplate.ContainerSpec.Secrets}}"
## Check network encryption
docker network ls --format "{{.Name}}" | xargs -I{} docker network inspect {} --format "{{.Name}}: {{.Options}}"
# Access Control Commands
## Add a label to control placement
docker node update --label-add security=high node-1
## View node labels
docker node inspect --format "{{.Spec.Labels}}" node-1
## Deploy with security constraints
docker service create --name secure-service --constraint node.labels.security==high nginx:alpine
16. Security Checklist
Use this checklist to ensure your Docker Swarm deployment follows security best practices:
- Pre-Deployment Security
  - Host OS is minimal and hardened
  - Docker Engine is updated to latest stable version
  - Daemon configuration is security-optimized
  - Network segmentation is implemented
  - Firewall rules are in place for Swarm ports
- Swarm Initialization
  - Used --autolock flag for encrypted Raft logs
  - Stored unlock key securely
  - Used specific IP addresses, not 0.0.0.0
  - Deployed odd number of managers (3 or 5)
  - Manager nodes are dedicated (no other workloads)
- Network Security
  - Overlay networks have encryption enabled
  - Control plane firewall rules in place
  - Ingress network is secured
  - Network policies implemented for service isolation
- Secret Management
  - Using Docker Secrets for sensitive data
  - No secrets in environment variables
  - Secret rotation process is documented
  - Limited secret access to required services only
- Access Controls
  - Role-based access implemented
  - API endpoint secured with TLS
  - Using node labels for placement constraints
  - Minimal access granted to operators
- Service Deployment
  - Images are scanned for vulnerabilities
  - Content Trust is enabled
  - Services use non-root users
  - Resource constraints applied
  - Read-only filesystem where possible
  - Health checks implemented
- Monitoring and Logging
  - Centralized logging configured
  - Security monitoring in place
  - Alerting for security events
  - Audit logging enabled
- Maintenance Procedures
  - Update and patching process documented
  - Backup process tested
  - Disaster recovery plan validated
  - Regular security scanning scheduled
- Security Testing
  - CIS Benchmark implemented
  - Penetration testing completed
  - Vulnerability management process in place
This guide provides a comprehensive approach to securing Docker Swarm deployments. By implementing these recommendations, organizations can significantly reduce the attack surface and improve the overall security posture of their container orchestration platform. Remember that security is a continuous process, requiring regular assessment, updates, and monitoring.