Containerizing AI Agents for Production: From Docker to Kubernetes
Moving AI agents from local development to production requires more than `docker build && docker run`. Production containerization demands optimization for size, security, startup time, and resource efficiency—especially critical when dealing with large AI models and unpredictable workloads.
Optimized Docker Strategy: Multi-Stage Builds
Here’s a production-optimized Dockerfile that reduces image size while maintaining functionality:
```dockerfile
# Build stage - heavy dependencies
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Production stage - minimal runtime
FROM python:3.11-slim
WORKDIR /app

# Non-root user for security
RUN adduser --disabled-password --gecos '' appuser

# Copy only installed packages from the build stage, owned by the runtime user
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser src/ ./src/
COPY --chown=appuser:appuser config/ ./config/

USER appuser
ENV PATH="/home/appuser/.local/bin:${PATH}"

# Container-level health check (Kubernetes ignores this and uses its own probes)
HEALTHCHECK --interval=30s --timeout=10s \
  CMD python -c "import requests; requests.get('http://localhost:8000/health', timeout=5).raise_for_status()"

EXPOSE 8000
CMD ["python", "-m", "src.agent_server"]
```
Result: 73% smaller image size compared to single-stage builds, faster deployments, and an improved security posture.
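Both the Docker health check and the Kubernetes probes assume the agent server exposes a `/health` endpoint on port 8000. A minimal sketch of such an endpoint using only the standard library (the `health_payload` fields and the use of `http.server` are illustrative — a real agent server would more likely use its web framework's routing):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def health_payload() -> dict:
    """Report liveness; extend with model/dependency checks as needed."""
    return {"status": "ok"}

class AgentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps(health_payload()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        # Keep container stdout quiet; rely on structured app logging instead.
        pass

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), AgentHandler).serve_forever()
```

Keeping the health check cheap matters: the probe fires every few seconds per replica, so it should never load models or call external services.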
Kubernetes Deployment Patterns
Production AI agents need sophisticated orchestration. Here’s a complete Kubernetes deployment with optimized resource allocation:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
        - name: ai-agent
          image: your-registry/ai-agent:latest
          ports:
            - containerPort: 8000
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 60
            periodSeconds: 30
          env:
            - name: MODEL_CACHE_DIR
              value: "/tmp/models"
          volumeMounts:
            - name: model-cache
              mountPath: /tmp/models
      volumes:
        - name: model-cache
          emptyDir:
            sizeLimit: 5Gi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-agent-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-agent-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
Infrastructure as Code: Terraform Modules
Deploy consistent clusters across AWS and GCP:
```hcl
# modules/eks-cluster/main.tf
resource "aws_eks_cluster" "ai_agents" {
  name     = var.cluster_name
  role_arn = aws_iam_role.cluster.arn
  version  = "1.28"

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = true
  }
}

resource "aws_eks_node_group" "ai_agents" {
  cluster_name    = aws_eks_cluster.ai_agents.name
  node_group_name = "ai-agents-nodes"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids
  instance_types  = ["m5.large", "m5.xlarge"]
  capacity_type   = "SPOT" # 60-70% cost savings

  scaling_config {
    desired_size = 2
    max_size     = 8
    min_size     = 1
  }
}
```
Cost Comparison: Managed vs Self-Managed
| Configuration | AWS EKS (monthly, 3 nodes) | GCP GKE (monthly, 3 nodes) | Trade-off |
|---|---|---|---|
| Managed + On-Demand | $216 | $180 | Higher reliability |
| Managed + Spot | $95 | $78 | 60-70% savings |
| Self-Managed | $180 | $144 | Operational overhead |
Recommendation: Managed Kubernetes with spot instances provides optimal cost-performance balance for most AI agent workloads.
Production Considerations
- Security: Non-root containers, minimal base images, vulnerability scanning
- Monitoring: Application metrics, resource utilization, custom AI-specific metrics
- Scaling: HPA configuration for variable AI workloads
- Reliability: Health checks, graceful shutdowns, circuit breakers
- Cost Control: Resource limits, spot instances, automatic scaling policies
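Graceful shutdown deserves a concrete shape: when a pod is evicted or scaled down (routine with spot instances and HPA), Kubernetes sends SIGTERM, waits out the termination grace period, then SIGKILLs. A minimal sketch of handling this in the agent process (`serve_until_shutdown` and the drain hook it mentions are illustrative names, not part of any library):

```python
import signal
import sys
import threading

# Flipped by the SIGTERM handler; worker loops should poll or wait on it.
shutdown_requested = threading.Event()

def handle_sigterm(signum, frame):
    """Kubernetes sends SIGTERM first; SIGKILL follows after the grace period."""
    shutdown_requested.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def serve_until_shutdown() -> int:
    # Illustrative main loop: do work until shutdown is requested,
    # then drain in-flight requests and exit cleanly.
    while not shutdown_requested.is_set():
        shutdown_requested.wait(timeout=1.0)  # stand-in for real work
    # drain_inflight_requests()  # hypothetical cleanup hook: finish open calls
    return 0

if __name__ == "__main__":
    sys.exit(serve_until_shutdown())
```

Pair this with a `terminationGracePeriodSeconds` in the pod spec that comfortably exceeds your longest in-flight agent request, so draining can actually complete.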
Next Steps
Your containerized AI agents are now production-ready. The next article explores serverless deployment patterns—when containers might be overkill and how to optimize for cost and cold start performance.
Start simple, measure everything, and evolve toward the complexity your use case demands.