Docker Deployment

Deploy the Text Classification API using Docker containers.

Prerequisites

  • Docker 20.10+
  • Docker Compose (optional)

Quick Start with Docker

1. Build the Image

cd api
docker build -t text-classifier-api .

2. Run the Container

docker run -p 8000:8000 \
  -v $(pwd)/../final_best_model.pkl:/app/final_best_model.pkl \
  -v $(pwd)/../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl \
  text-classifier-api

3. Test the API

curl http://localhost:8000/health

Docker Compose

Create docker-compose.yml for easier deployment:

version: '3.8'
services:
  text-classifier-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ../final_best_model.pkl:/app/final_best_model.pkl
      - ../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl
    environment:
      - MODEL_PATH=/app/final_best_model.pkl
      - VECTORIZER_PATH=/app/tfidf_vectorizer.pkl
      - MAX_BATCH_SIZE=25
      - ENABLE_METRICS=false
    restart: unless-stopped

Run with Docker Compose:

docker-compose up -d

Environment Variables

Configure the container with environment variables:

docker run -p 8000:8000 \
  -e MODEL_PATH=/app/final_best_model.pkl \
  -e VECTORIZER_PATH=/app/tfidf_vectorizer.pkl \
  -e MAX_BATCH_SIZE=25 \
  -e ENABLE_METRICS=false \
  -v $(pwd)/../final_best_model.pkl:/app/final_best_model.pkl \
  -v $(pwd)/../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl \
  text-classifier-api

Multi-Stage Build

The Dockerfile uses multi-stage builds for optimization:

# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# Runtime stage
FROM python:3.11-slim AS runtime
WORKDIR /app
COPY --from=builder /root/.local /root/.local
# Make the user-installed packages and console scripts visible
ENV PATH=/root/.local/bin:$PATH
COPY . .
EXPOSE 8000
CMD ["python", "main.py"]

Benefits:

  • Smaller final image size (~150MB vs ~500MB)
  • Faster deployments
  • Better security (no build tools in the final image)

Volume Mounting

Mount model files as volumes for development:

docker run -p 8000:8000 \
  -v /path/to/model.pkl:/app/final_best_model.pkl \
  -v /path/to/vectorizer.pkl:/app/tfidf_vectorizer.pkl \
  text-classifier-api

Docker Best Practices

Security

  • Run as non-root user
  • Use specific image tags
  • Scan images for vulnerabilities
  • Keep base images updated
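
Running as a non-root user is a small Dockerfile change; a minimal sketch (the appuser name and UID 1001 are illustrative, not part of the project's Dockerfile):

```dockerfile
# Create an unprivileged system user and drop root privileges
# before the CMD runs
RUN adduser --system --no-create-home --uid 1001 appuser
USER appuser
```

Any files the app must write (logs, caches) then need to be owned by or writable for that user.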

Performance

  • Use multi-stage builds
  • Optimize layer caching
  • Minimize image layers
  • Use appropriate base images

Development

  • Mount source code as volumes
  • Use development-specific Dockerfiles
  • Enable hot reload when possible
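
The points above can be sketched as a Compose override file that mounts the source tree into the container (paths assume the compose file lives in the api/ directory, as in the earlier example; adjust to your layout):

```yaml
# docker-compose.override.yml — merged automatically by `docker-compose up`
services:
  text-classifier-api:
    volumes:
      # Mount the source tree so code changes appear in the running container
      - .:/app
      # Keep the model artifacts mounted as in the base file
      - ../final_best_model.pkl:/app/final_best_model.pkl
      - ../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl
```

Whether changes take effect without a restart depends on the app server supporting hot reload.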

Troubleshooting Docker Issues

Container Won't Start

Check logs:

docker logs <container_id>

Common issues:

  • Model files not found
  • Port already in use
  • Insufficient memory

Performance Issues

Monitor resource usage:

docker stats <container_id>

Optimize:

  • Increase memory limits
  • Adjust CPU shares
  • Use faster storage

Networking Issues

Check port mapping:

docker port <container_id>

Test connectivity:

curl http://localhost:8000/health

Production Deployment

Docker Swarm

docker swarm init
docker stack deploy -c docker-compose.yml text-classifier

Kubernetes

Create a deployment YAML (note: ConfigMaps are capped at 1 MiB, so model files larger than that should come from a PersistentVolumeClaim instead):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: text-classifier-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: text-classifier-api
  template:
    metadata:
      labels:
        app: text-classifier-api
    spec:
      containers:
      - name: text-classifier-api
        image: text-classifier-api:latest
        ports:
        - containerPort: 8000
        env:
        - name: MODEL_PATH
          value: "/app/final_best_model.pkl"
        volumeMounts:
        - name: model-volume
          mountPath: /app/final_best_model.pkl
          subPath: final_best_model.pkl
      volumes:
      - name: model-volume
        configMap:
          name: model-config

Health Checks

Docker health checks can be added in the Dockerfile (note that python:3.11-slim does not ship with curl, so install it in the image or use a Python-based check):

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1

Monitoring

Container Logs

docker logs -f <container_id>

Resource Monitoring

docker stats <container_id>

Application Metrics

Access metrics endpoint:

curl http://localhost:8000/metrics

Scaling

Horizontal Scaling

docker-compose up --scale text-classifier-api=3

Note: scaling beyond one replica requires dropping the fixed "8000:8000" host-port mapping (or putting a load balancer in front), since only one container can bind a given host port.

Load Balancing

Use nginx or traefik for load balancing:

version: '3.8'
services:
  text-classifier-api:
    # ... existing config
    deploy:
      replicas: 3

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - text-classifier-api
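
A minimal nginx.conf for the setup above might look like the following sketch; Compose's internal DNS resolves the service name to all replicas, so round-robin balancing falls out of the upstream definition:

```nginx
events {}

http {
  upstream classifier {
    # Compose DNS resolves this name to every replica of the service
    server text-classifier-api:8000;
  }

  server {
    listen 80;

    location / {
      proxy_pass http://classifier;
      proxy_set_header Host $host;
    }
  }
}
```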

Backup and Recovery

Data Persistence

Use named volumes for persistent data:

volumes:
  model-data:
    driver: local

services:
  text-classifier-api:
    volumes:
      - model-data:/app/models

Backup

docker run --rm -v model-data:/data -v $(pwd):/backup alpine tar czf /backup/models.tar.gz -C /data .
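
Before relying on a backup, it is worth checking that the archive reads back cleanly; a self-contained sketch using a throwaway directory (paths are illustrative):

```shell
# Create a scratch backup and list it without extracting;
# a non-zero exit status from `tar tzf` indicates a corrupt archive
dir=$(mktemp -d)
echo "model bytes" > "$dir/final_best_model.pkl"
tar czf "$dir/models.tar.gz" -C "$dir" final_best_model.pkl
tar tzf "$dir/models.tar.gz"
```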

Recovery

docker run --rm -v model-data:/data -v $(pwd):/backup alpine tar xzf /backup/models.tar.gz -C /data

CI/CD Integration

GitHub Actions

name: Build and Push Docker Image

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3

    - name: Build Docker image
      run: docker build -t text-classifier-api ./api

    - name: Push to registry
      run: |
        echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
        docker tag text-classifier-api ${{ secrets.DOCKER_USERNAME }}/text-classifier-api:latest
        docker push ${{ secrets.DOCKER_USERNAME }}/text-classifier-api:latest

This provides a complete Docker deployment solution with best practices for development, production, scaling, and monitoring.