Docker Deployment

Deploy the Text Classification API using Docker containers.

Prerequisites

  • Docker 20.10+
  • Docker Compose (optional)

Quick Start with Docker

1. Build the Image

cd api
docker build -t text-classifier-api .

2. Run the Container

docker run -p 8000:8000 \
  -v $(pwd)/../final_best_model.pkl:/app/final_best_model.pkl \
  -v $(pwd)/../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl \
  text-classifier-api

3. Test the API

curl http://localhost:8000/health

Docker Compose

Create docker-compose.yml for easier deployment:

version: '3.8'
services:
  text-classifier-api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - ../final_best_model.pkl:/app/final_best_model.pkl
      - ../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl
    environment:
      - MODEL_PATH=/app/final_best_model.pkl
      - VECTORIZER_PATH=/app/tfidf_vectorizer.pkl
      - MAX_BATCH_SIZE=25
      - ENABLE_METRICS=false
    restart: unless-stopped

Run with Docker Compose:

docker-compose up -d

Environment Variables

Configure the container with environment variables:

docker run -p 8000:8000 \
  -e MODEL_PATH=/app/final_best_model.pkl \
  -e VECTORIZER_PATH=/app/tfidf_vectorizer.pkl \
  -e MAX_BATCH_SIZE=25 \
  -e ENABLE_METRICS=false \
  -v $(pwd)/../final_best_model.pkl:/app/final_best_model.pkl \
  -v $(pwd)/../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl \
  text-classifier-api

Multi-Stage Build

The Dockerfile uses multi-stage builds for optimization:

# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt

# Runtime stage
FROM python:3.11-slim AS runtime
WORKDIR /app
COPY --from=builder /root/.local /root/.local
# Make the user-installed packages and console scripts visible
ENV PATH=/root/.local/bin:$PATH
COPY . .
EXPOSE 8000
CMD ["python", "main.py"]

Benefits:

  • Smaller final image size (~150MB vs ~500MB)
  • Faster deployments
  • Better security (no build tools in the final image)

Volume Mounting

Mount model files as volumes for development:

docker run -p 8000:8000 \
  -v /path/to/model.pkl:/app/final_best_model.pkl \
  -v /path/to/vectorizer.pkl:/app/tfidf_vectorizer.pkl \
  text-classifier-api

Docker Best Practices

Security

  • Run as non-root user
  • Use specific image tags
  • Scan images for vulnerabilities
  • Keep base images updated
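
Running as a non-root user is a small Dockerfile change; a minimal sketch (the appuser name and UID 1001 are illustrative, not part of the project's Dockerfile):

```dockerfile
# Create an unprivileged system user and drop root privileges
# before the CMD runs
RUN adduser --system --no-create-home --uid 1001 appuser
USER appuser
```

Any files the app must write (logs, caches) then need to be owned by or writable for that user.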

Performance

  • Use multi-stage builds
  • Optimize layer caching
  • Minimize image layers
  • Use appropriate base images

Development

  • Mount source code as volumes
  • Use development-specific Dockerfiles
  • Enable hot reload when possible
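
The points above can be sketched as a Compose override file that mounts the source tree into the container (paths assume the compose file lives in the api/ directory, as in the earlier example; adjust to your layout):

```yaml
# docker-compose.override.yml — merged automatically by `docker-compose up`
services:
  text-classifier-api:
    volumes:
      # Mount the source tree so code changes appear in the running container
      - .:/app
      # Keep the model artifacts mounted as in the base file
      - ../final_best_model.pkl:/app/final_best_model.pkl
      - ../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl
```

Whether changes take effect without a restart depends on the app server supporting hot reload.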

Troubleshooting Docker Issues

Container Won't Start

Check logs:

docker logs <container_id>

Common issues:

  • Model files not found
  • Port already in use
  • Insufficient memory

Performance Issues

Monitor resource usage:

docker stats <container_id>

Optimize:

  • Increase memory limits
  • Adjust CPU shares
  • Use faster storage

Networking Issues

Check port mapping:

docker port <container_id>

Test connectivity:

curl http://localhost:8000/health

Production Deployment

Docker Swarm

docker swarm init
docker stack deploy -c docker-compose.yml text-classifier

Kubernetes

Create a deployment YAML (note: ConfigMaps are capped at 1 MiB, so model files larger than that should come from a PersistentVolumeClaim instead):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: text-classifier-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: text-classifier-api
  template:
    metadata:
      labels:
        app: text-classifier-api
    spec:
      containers:
      - name: text-classifier-api
        image: text-classifier-api:latest
        ports:
        - containerPort: 8000
        env:
        - name: MODEL_PATH
          value: "/app/final_best_model.pkl"
        volumeMounts:
        - name: model-volume
          mountPath: /app/final_best_model.pkl
          subPath: final_best_model.pkl
      volumes:
      - name: model-volume
        configMap:
          name: model-config

Health Checks

Docker health checks can be added in the Dockerfile (note that python:3.11-slim does not ship with curl, so install it in the image or use a Python-based check):

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1

Monitoring

Container Logs

docker logs -f <container_id>

Resource Monitoring

docker stats <container_id>

Application Metrics

Access metrics endpoint:

curl http://localhost:8000/metrics

Scaling

Horizontal Scaling

docker-compose up --scale text-classifier-api=3

Note: scaling beyond one replica requires dropping the fixed "8000:8000" host-port mapping (or putting a load balancer in front), since only one container can bind a given host port.

Load Balancing

Use nginx or traefik for load balancing:

version: '3.8'
services:
  text-classifier-api:
    # ... existing config
    deploy:
      replicas: 3

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - text-classifier-api
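
A minimal nginx.conf for the setup above might look like the following sketch; Compose's internal DNS resolves the service name to all replicas, so round-robin balancing falls out of the upstream definition:

```nginx
events {}

http {
  upstream classifier {
    # Compose DNS resolves this name to every replica of the service
    server text-classifier-api:8000;
  }

  server {
    listen 80;

    location / {
      proxy_pass http://classifier;
      proxy_set_header Host $host;
    }
  }
}
```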

Backup and Recovery

Data Persistence

Use named volumes for persistent data:

volumes:
  model-data:
    driver: local

services:
  text-classifier-api:
    volumes:
      - model-data:/app/models

Backup

docker run --rm -v model-data:/data -v $(pwd):/backup alpine tar czf /backup/models.tar.gz -C /data .
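
Before relying on a backup, it is worth checking that the archive reads back cleanly; a self-contained sketch using a throwaway directory (paths are illustrative):

```shell
# Create a scratch backup and list it without extracting;
# a non-zero exit status from `tar tzf` indicates a corrupt archive
dir=$(mktemp -d)
echo "model bytes" > "$dir/final_best_model.pkl"
tar czf "$dir/models.tar.gz" -C "$dir" final_best_model.pkl
tar tzf "$dir/models.tar.gz"
```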

Recovery

docker run --rm -v model-data:/data -v $(pwd):/backup alpine tar xzf /backup/models.tar.gz -C /data

CI/CD Integration

GitHub Actions

name: Build and Push Docker Image

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v3

    - name: Build Docker image
      run: docker build -t text-classifier-api ./api

    - name: Push to registry
      run: |
        echo ${{ secrets.DOCKER_PASSWORD }} | docker login -u ${{ secrets.DOCKER_USERNAME }} --password-stdin
        docker tag text-classifier-api ${{ secrets.DOCKER_USERNAME }}/text-classifier-api:latest
        docker push ${{ secrets.DOCKER_USERNAME }}/text-classifier-api:latest

This provides a complete Docker deployment solution with best practices for development, production, scaling, and monitoring.