# Installation
Step-by-step guide to install and set up the Text Classification API.
## Prerequisites
- Python 3.11 or higher
- pip package manager
- Git (for cloning the repository)
- 2GB+ available RAM
- 1GB+ available disk space
## Local Installation
### 1. Clone Repository
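The repository URL is not given in this guide; substitute your project's actual Git URL for the placeholders below:

```bash
# Clone the repository and move into it
# (<repository-url> and <repository-name> are placeholders)
git clone <repository-url>
cd <repository-name>
```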
### 2. Install Dependencies
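Dependencies are presumably installed from the requirements file referenced later in this guide:

```bash
# Install the API's Python dependencies
pip install -r api/api_requirements.txt
```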
### 3. Download NLTK Data
```bash
# Run Python to download required NLTK data
python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
```
### 4. Prepare Model Files
```bash
# Copy model files to the root directory (from your training run)
cp ../final_best_model.pkl .
cp ../tfidf_vectorizer.pkl .
```
## Docker Installation
### Using Docker Compose
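A minimal `docker-compose.yml` sketch; the service name, build context, and volume mounts are assumptions mirroring the `docker` commands shown below:

```yaml
# docker-compose.yml (sketch)
services:
  api:
    build: ./api
    ports:
      - "8000:8000"
    volumes:
      - ./final_best_model.pkl:/app/final_best_model.pkl
      - ./tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl
```

Then start the service with `docker compose up --build`.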
### Using Docker Directly
```bash
# Build image
docker build -t text-classifier-api ./api

# Run container
docker run -p 8000:8000 \
  -v $(pwd)/final_best_model.pkl:/app/final_best_model.pkl \
  -v $(pwd)/tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl \
  text-classifier-api
```
## Virtual Environment (Recommended)

### Create Virtual Environment
```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
### Install in Virtual Environment
```bash
# Install dependencies
pip install -r api/api_requirements.txt

# Verify installation
python -c "import fastapi, joblib, sklearn; print('All dependencies installed successfully')"
```
## Configuration

### Environment Variables
Create a `.env` file in the `api` directory:

```bash
# Model paths
MODEL_PATH=../final_best_model.pkl
VECTORIZER_PATH=../tfidf_vectorizer.pkl

# Performance settings
MAX_BATCH_SIZE=50
MAX_TEXT_LENGTH=10000

# Monitoring
ENABLE_METRICS=true

# Server settings
PORT=8000
```
### Configuration Options
| Variable | Default | Description |
|---|---|---|
| `MODEL_PATH` | `../final_best_model.pkl` | Path to trained model file |
| `VECTORIZER_PATH` | `../tfidf_vectorizer.pkl` | Path to TF-IDF vectorizer file |
| `MAX_BATCH_SIZE` | `50` | Maximum texts per batch request |
| `MAX_TEXT_LENGTH` | `10000` | Maximum characters per text |
| `ENABLE_METRICS` | `true` | Enable performance metrics endpoint |
| `PORT` | `8000` | Server port |
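Inside the API, these settings can be read with standard environment lookups, using the table's defaults as fallbacks; the exact variable handling in `main.py` is an assumption, but a sketch looks like this:

```python
import os

# Read the documented settings, falling back to the defaults above
MODEL_PATH = os.getenv("MODEL_PATH", "../final_best_model.pkl")
VECTORIZER_PATH = os.getenv("VECTORIZER_PATH", "../tfidf_vectorizer.pkl")
MAX_BATCH_SIZE = int(os.getenv("MAX_BATCH_SIZE", "50"))
MAX_TEXT_LENGTH = int(os.getenv("MAX_TEXT_LENGTH", "10000"))
ENABLE_METRICS = os.getenv("ENABLE_METRICS", "true").lower() == "true"
PORT = int(os.getenv("PORT", "8000"))

print(f"Serving on port {PORT}, batch limit {MAX_BATCH_SIZE}")
```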
## Verification

### Test Installation
```bash
# Navigate to API directory
cd api

# Run the API
python main.py

# In another terminal, test the API
curl http://localhost:8000/health
```
Expected response:
```json
{
  "status": "healthy",
  "models_loaded": true,
  "vectorizer_loaded": true,
  "memory_usage_mb": 45.2,
  "uptime_seconds": 5.5
}
```
### Test Prediction
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a great product!"}'
```
Expected response:
```json
{
  "sentiment": "positive",
  "confidence": 0.89,
  "model_used": "Ensemble Best Model",
  "processing_time": 0.023,
  "request_id": "req-1"
}
```
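The same request can be made from Python using only the standard library. The `build_predict_request` helper below is hypothetical (the guide only documents the curl call above), and actually sending the request requires the API to be running:

```python
import json
import urllib.request

def build_predict_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /predict endpoint (hypothetical helper)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request("This is a great product!")
print(req.full_url)

# To actually send it (with the API running):
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read())
#     print(result["sentiment"], result["confidence"])
```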
## Troubleshooting

### Common Issues
- **Import Errors**: make sure dependencies are installed in the active environment (`pip install -r api/api_requirements.txt`).
- **Model Loading Errors**: confirm `final_best_model.pkl` and `tfidf_vectorizer.pkl` exist at the paths configured in `MODEL_PATH` and `VECTORIZER_PATH`.
- **Memory Errors**: ensure at least 2GB of RAM is free; lowering `MAX_BATCH_SIZE` reduces per-request memory.
- **Port Already in Use**: stop the process occupying port 8000, or set `PORT` to a free port.
### System Requirements Check
```python
# Run this script to check system compatibility
import sys
import psutil  # third-party: pip install psutil

print(f"Python version: {sys.version}")
print(f"Available RAM: {psutil.virtual_memory().available / 1024 / 1024:.0f} MB")
print(f"CPU cores: {psutil.cpu_count()}")

# Check required packages
required_packages = ['fastapi', 'joblib', 'sklearn', 'tensorflow']
for package in required_packages:
    try:
        __import__(package)
        print(f"✓ {package} available")
    except ImportError:
        print(f"✗ {package} missing")
```
## Next Steps
After successful installation: