Troubleshooting

Common issues and solutions for the Text Classification API.

Model Loading Issues

Error: "Model file not found"

Symptoms:

FileNotFoundError: [Errno 2] No such file or directory: '../final_best_model.pkl'

Solutions:

  1. Verify file paths:

    ls -la ../final_best_model.pkl
    ls -la ../tfidf_vectorizer.pkl
    

  2. Check environment variables:

    echo $MODEL_PATH
    echo $VECTORIZER_PATH
    

  3. Update paths in .env file:

    MODEL_PATH=../final_best_model.pkl
    VECTORIZER_PATH=../tfidf_vectorizer.pkl
    
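
A small sketch like the following can confirm which paths the application will actually use. The MODEL_PATH and VECTORIZER_PATH names match the .env entries above; the resolution logic itself is illustrative, not the API's own startup code:

```python
import os

def resolve_model_paths():
    """Read model paths from the environment, falling back to the
    defaults shown above, and report whether each file exists."""
    paths = {
        "MODEL_PATH": os.getenv("MODEL_PATH", "../final_best_model.pkl"),
        "VECTORIZER_PATH": os.getenv("VECTORIZER_PATH", "../tfidf_vectorizer.pkl"),
    }
    return {name: (path, os.path.isfile(path)) for name, path in paths.items()}

for name, (path, exists) in resolve_model_paths().items():
    print(f"{name}={path} exists={exists}")
```

If either line prints `exists=False`, fix the path before restarting the service.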

Error: "Model loading failed"

Symptoms:

ValueError: Unsupported pickle protocol

Solutions:

  1. Check Python version compatibility:

    python --version
    

  2. Re-save the model with a compatible protocol. Run this in the
     environment that originally created the model, since the current
     environment cannot load it:

    import joblib
    model = joblib.load('final_best_model.pkl')
    joblib.dump(model, 'final_best_model.pkl', protocol=2)
    
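
To confirm which protocol a pickle file was written with before attempting a reload, you can inspect its header bytes: files written with protocol 2 or later begin with the PROTO opcode (0x80) followed by the protocol number. This is a sketch and does not cover protocol 0/1 files, which have no such header:

```python
import pickle

def pickle_protocol(path):
    """Return the pickle protocol of a file written with protocol >= 2,
    or None if the header does not start with the PROTO opcode."""
    with open(path, "rb") as f:
        header = f.read(2)
    if len(header) == 2 and header[0] == 0x80:
        return header[1]
    return None

# Demonstrate on a freshly written file rather than the real model:
with open("/tmp/example.pkl", "wb") as f:
    pickle.dump({"weights": [1, 2, 3]}, f, protocol=2)
print(pickle_protocol("/tmp/example.pkl"))  # 2
```

If the reported protocol is higher than your Python version supports, re-save the model as shown above.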

Performance Issues

High Memory Usage

Symptoms:

  - Application crashes with out-of-memory errors
  - Slow response times
  - Health check shows high memory usage

Solutions:

  1. Reduce batch size:

    MAX_BATCH_SIZE=25
    

  2. Limit text length:

    MAX_TEXT_LENGTH=5000
    

  3. Monitor memory usage:

    curl http://localhost:8000/health
    
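
A small check can watch that endpoint over time. The response fields used below (status, memory_usage_percent) are assumptions about this API's health payload, so adjust them to whatever /health actually returns:

```python
import json

def memory_alert(health_json, threshold_pct=80.0):
    """Return True when the parsed health payload reports memory usage
    above the given threshold (field names are assumed)."""
    payload = json.loads(health_json)
    return payload.get("memory_usage_percent", 0.0) > threshold_pct

# Fabricated example response for illustration:
sample = '{"status": "ok", "memory_usage_percent": 87.5}'
print(memory_alert(sample))  # True
```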

Slow Predictions

Symptoms:

  - Response times > 5 seconds
  - High CPU usage

Solutions:

  1. Check batch size vs. performance:

     - Smaller batches may be faster for single predictions
     - Larger batches are better for bulk processing

  2. Optimize the thread pool:

     - Default settings work for most cases
     - Adjust based on the CPU cores available

  3. Enable caching:

     - Ensure models are cached in memory
     - Check /metrics for cache hit rates
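
One way to compare batch sizes empirically is a small timing sweep. The sketch below uses a stub in place of the real model (predict_stub is hypothetical, simulating per-call overhead), but the same harness works pointed at the actual predict call:

```python
import time

def predict_stub(texts):
    """Hypothetical stand-in for the real predict call; simulates a
    fixed per-call overhead plus a small per-text cost."""
    time.sleep(0.001 + 0.0001 * len(texts))
    return ["label"] * len(texts)

def time_batch_sizes(texts, batch_sizes):
    """Return total seconds taken to classify all texts at each batch size."""
    results = {}
    for size in batch_sizes:
        start = time.perf_counter()
        for i in range(0, len(texts), size):
            predict_stub(texts[i:i + size])
        results[size] = time.perf_counter() - start
    return results

timings = time_batch_sizes(["sample text"] * 100, [1, 10, 50])
for size, seconds in sorted(timings.items()):
    print(f"batch_size={size}: {seconds:.3f}s")
```

With per-call overhead dominating, larger batches finish the same workload faster, which is the trade-off described above.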

Docker Issues

Container Won't Start

Symptoms:

docker: Error response from daemon: OCI runtime create failed

Solutions:

  1. Check Docker resources:

    docker system info
    

  2. Verify model files are accessible:

    docker run --rm -v $(pwd):/app text-classifier-api ls -la /app
    

  3. Check environment variables:

    docker run --rm text-classifier-api env | grep MODEL_PATH
    

Port Already in Use

Symptoms:

docker: Error response from daemon: driver failed programming external connectivity on endpoint

Solutions:

  1. Change host port:

    docker run -p 8001:8000 text-classifier-api
    

  2. Stop conflicting service:

    sudo lsof -i :8000
    sudo kill -9 <PID>
    

API Request Issues

Error: "Request too large"

Symptoms:

HTTP 413: Payload Too Large

Solutions:

  1. Reduce batch size in request
  2. Split large requests into smaller batches
  3. Check MAX_BATCH_SIZE setting
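
Splitting an oversized request can be done client-side. This sketch assumes the request shape shown elsewhere in this guide (a "texts" list) and the MAX_BATCH_SIZE of 25 used in the examples above:

```python
def split_into_batches(texts, max_batch_size=25):
    """Split a list of texts into request-sized payloads that each
    respect the server's MAX_BATCH_SIZE limit."""
    return [
        {"texts": texts[i:i + max_batch_size]}
        for i in range(0, len(texts), max_batch_size)
    ]

payloads = split_into_batches([f"document {n}" for n in range(60)])
print(len(payloads))               # 3 payloads: 25 + 25 + 10 texts
print(len(payloads[-1]["texts"]))  # 10
```

Send each payload as a separate request and concatenate the results.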

Error: "Invalid input format"

Symptoms:

HTTP 422: Unprocessable Entity

Solutions:

  1. Verify JSON format:

    {
      "texts": ["text1", "text2"],
      "batch_size": 10
    }
    

  2. Check text encoding:

     - Ensure texts are UTF-8 encoded
     - Remove invalid characters

  3. Validate text length:

     - Check MAX_TEXT_LENGTH setting
     - Truncate or split long texts
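
Cleaning and length-limiting texts before sending them can be sketched as follows. The 5000-character limit mirrors the MAX_TEXT_LENGTH example above; exactly which characters the server rejects is an assumption, so tune the filter to the errors you actually see:

```python
def sanitize_text(text, max_length=5000):
    """Drop characters that cannot round-trip through UTF-8, strip
    control characters (except newlines and tabs), and truncate."""
    cleaned = text.encode("utf-8", errors="ignore").decode("utf-8")
    cleaned = "".join(
        ch for ch in cleaned if ch.isprintable() or ch in "\n\t"
    )
    return cleaned[:max_length]

print(sanitize_text("hello\x00world"))  # 'helloworld'
```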

Health Check Failures

Error: "Health check failed"

Symptoms:

  - Health endpoint returns 503
  - Application appears unresponsive

Solutions:

  1. Check model loading:

    curl http://localhost:8000/health
    

  2. Verify dependencies:

    python -c "import joblib, sklearn, fastapi"
    

  3. Check memory usage:

     - Monitor system resources
     - Reduce batch sizes if needed
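
The one-line dependency check above can be made more informative by reporting exactly which packages are missing; importlib.util.find_spec does this without importing anything:

```python
import importlib.util

def missing_packages(names):
    """Return the subset of top-level package names that cannot be
    found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["joblib", "sklearn", "fastapi", "uvicorn"]
missing = missing_packages(required)
print("Missing:", missing or "none")
```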

Deployment Issues

Free Tier Memory Limits

Symptoms:

  - Application crashes on free tier platforms
  - Out-of-memory errors

Solutions:

  1. Optimize for free tier:

    MAX_BATCH_SIZE=25
    MAX_TEXT_LENGTH=5000
    ENABLE_METRICS=false
    

  2. Use a multi-stage Docker build:

     - Ensures minimal image size
     - Reduces memory footprint

  3. Monitor resource usage:

    curl http://your-app-url/health
    

Database Connection Issues

Symptoms:

  - Metrics endpoint fails
  - Application starts but metrics unavailable

Solutions:

  1. Disable metrics for free tier:

    ENABLE_METRICS=false
    

  2. Check database connectivity:

     - Verify connection strings
     - Ensure the database is accessible

Logging and Debugging

Enable Debug Logging

DEBUG=true
LOG_LEVEL=DEBUG

View Application Logs

Docker:

docker logs <container_id>

Local:

python main.py 2>&1 | tee app.log

Common Log Messages

Message                    Meaning              Action
Model loaded successfully  Normal startup       None
Failed to load model       Model file issue     Check file paths
Memory usage high          Performance warning  Reduce batch size
Batch processing timeout   Slow predictions     Optimize model or reduce batch size

Performance Monitoring

Key Metrics to Monitor

  1. Response Time: Should be < 2 seconds for single predictions
  2. Memory Usage: Should stay below 80% of available RAM
  3. CPU Usage: Should not exceed 70% during normal operation
  4. Error Rate: Should be < 1% for healthy operation

Using Metrics Endpoint

curl http://localhost:8000/metrics

Look for:

  - prediction_duration_seconds - Prediction timing
  - prediction_count_total - Number of predictions
  - memory_usage_bytes - Current memory usage
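
If the /metrics endpoint serves the usual Prometheus text format, a minimal parser can pull out the values named above. The sample input here is fabricated; real output also carries labels and HELP/TYPE lines, which this sketch skips:

```python
def parse_metrics(text):
    """Parse simple 'name value' lines from Prometheus-style text
    output, skipping comments and lines with labels."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) == 2 and "{" not in parts[0]:
            try:
                values[parts[0]] = float(parts[1])
            except ValueError:
                continue
    return values

sample = """\
# HELP prediction_count_total Number of predictions
prediction_count_total 1523
prediction_duration_seconds 0.041
memory_usage_bytes 268435456
"""
metrics = parse_metrics(sample)
print(metrics["prediction_count_total"])  # 1523.0
```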

Getting Help

If you continue to experience issues:

  1. Check the documentation: Review all sections of this guide
  2. Verify your setup: Ensure all prerequisites are met
  3. Test locally first: Run the API locally before deploying
  4. Check platform logs: Review deployment platform logs
  5. Open an issue: Report bugs with full error messages and configuration

Quick Diagnostic Script

Run this script to diagnose common issues:

#!/bin/bash
echo "=== Text Classification API Diagnostics ==="

# Check Python version
echo "Python version:"
python --version

# Check dependencies
echo "Checking dependencies..."
python -c "import joblib, sklearn, fastapi, uvicorn; print('Dependencies OK')"

# Check model files
echo "Checking model files..."
if [ -f "../final_best_model.pkl" ]; then
    echo "Model file exists"
else
    echo "ERROR: Model file not found"
fi

if [ -f "../tfidf_vectorizer.pkl" ]; then
    echo "Vectorizer file exists"
else
    echo "ERROR: Vectorizer file not found"
fi

# Check port availability
echo "Checking port 8000..."
if lsof -i :8000 > /dev/null; then
    echo "WARNING: Port 8000 is in use"
else
    echo "Port 8000 is available"
fi

echo "=== Diagnostics Complete ==="