Installation

Step-by-step guide to install and set up the Text Classification API.

Prerequisites

  • Python 3.11 or higher
  • pip package manager
  • Git (for cloning the repository)
  • 2GB+ available RAM
  • 1GB+ available disk space

Local Installation

1. Clone Repository

git clone https://github.com/yourusername/text-classifier-api.git
cd text-classifier-api

2. Install Dependencies

# Navigate to API directory
cd api

# Install Python packages
pip install -r api_requirements.txt

3. Download NLTK Data

# Run Python to download required NLTK data
python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
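If you prefer a script over the one-liner, the same download can be done with basic error handling. This is a small sketch, not part of the project: the `download_nltk_data` helper and `CORPORA` name are hypothetical, and it assumes the three corpora listed above are all the API needs.

```python
# Sketch: download the NLTK corpora used by the API, failing gracefully
# if nltk itself has not been installed yet.
CORPORA = ["stopwords", "punkt", "wordnet"]


def download_nltk_data():
    """Download required NLTK corpora; return True on success."""
    try:
        import nltk
    except ImportError:
        print("nltk is not installed; run the dependency install step first.")
        return False
    for corpus in CORPORA:
        nltk.download(corpus, quiet=True)
    return True


if __name__ == "__main__":
    download_nltk_data()
```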

4. Prepare Model Files

# Copy model files to root directory (from your training)
cp ../final_best_model.pkl .
cp ../tfidf_vectorizer.pkl .
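A quick way to confirm the copy worked is to check that both artifacts are present before starting the server. This is a minimal sketch (the `missing_artifacts` helper is hypothetical, not part of the repository), assuming the two filenames shown above:

```python
from pathlib import Path

# The two artifacts the API expects, per the copy step above.
REQUIRED_ARTIFACTS = ["final_best_model.pkl", "tfidf_vectorizer.pkl"]


def missing_artifacts(root="."):
    """Return the artifact filenames that are not present under root."""
    return [name for name in REQUIRED_ARTIFACTS if not (Path(root) / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print(f"Missing artifacts: {', '.join(missing)}")
    else:
        print("All model artifacts found.")
```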

Docker Installation

Using Docker Compose

# Build and run
docker-compose -f api/docker-compose.yml up --build

Using Docker Directly

# Build image
docker build -t text-classifier-api ./api

# Run container
docker run -p 8000:8000 \
  -v $(pwd)/final_best_model.pkl:/app/final_best_model.pkl \
  -v $(pwd)/tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl \
  text-classifier-api

Create Virtual Environment

# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

Install in Virtual Environment

# Install dependencies
pip install -r api/api_requirements.txt

# Verify installation
python -c "import fastapi, joblib, sklearn; print('All dependencies installed successfully')"

Configuration

Environment Variables

Create a .env file in the api directory:

# Model paths
MODEL_PATH=../final_best_model.pkl
VECTORIZER_PATH=../tfidf_vectorizer.pkl

# Performance settings
MAX_BATCH_SIZE=50
MAX_TEXT_LENGTH=10000

# Monitoring
ENABLE_METRICS=true

# Server settings
PORT=8000

Configuration Options

| Variable        | Default                  | Description                          |
|-----------------|--------------------------|--------------------------------------|
| MODEL_PATH      | ../final_best_model.pkl  | Path to the trained model file       |
| VECTORIZER_PATH | ../tfidf_vectorizer.pkl  | Path to the TF-IDF vectorizer file   |
| MAX_BATCH_SIZE  | 50                       | Maximum texts per batch request      |
| MAX_TEXT_LENGTH | 10000                    | Maximum characters per text          |
| ENABLE_METRICS  | true                     | Enable performance metrics endpoint  |
| PORT            | 8000                     | Server port                          |
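As a sketch of how these settings could be consumed, the snippet below reads each variable with its documented default. The `load_settings` function and key names are illustrative assumptions, not the API's actual configuration code:

```python
import os


def load_settings():
    """Read configuration from the environment, using documented defaults."""
    return {
        "model_path": os.getenv("MODEL_PATH", "../final_best_model.pkl"),
        "vectorizer_path": os.getenv("VECTORIZER_PATH", "../tfidf_vectorizer.pkl"),
        "max_batch_size": int(os.getenv("MAX_BATCH_SIZE", "50")),
        "max_text_length": int(os.getenv("MAX_TEXT_LENGTH", "10000")),
        "enable_metrics": os.getenv("ENABLE_METRICS", "true").lower() == "true",
        "port": int(os.getenv("PORT", "8000")),
    }


if __name__ == "__main__":
    print(load_settings())
```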

Verification

Test Installation

# Navigate to API directory
cd api

# Run the API
python main.py

# In another terminal, test the API
curl http://localhost:8000/health

Expected response:

{
  "status": "healthy",
  "models_loaded": true,
  "vectorizer_loaded": true,
  "memory_usage_mb": 45.2,
  "uptime_seconds": 5.5
}
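The same health check can be scripted in Python using only the standard library. The `check_health` helper is a hypothetical convenience wrapper around the `/health` endpoint shown above:

```python
import json
import urllib.request


def check_health(base_url="http://localhost:8000"):
    """Fetch /health and return the parsed JSON, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return json.loads(resp.read())
    except OSError:
        return None


if __name__ == "__main__":
    health = check_health()
    if health is None:
        print("API is not reachable -- is the server running?")
    else:
        print(f"Status: {health['status']}, models loaded: {health['models_loaded']}")
```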

Test Prediction

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a great product!"}'

Expected response:

{
  "sentiment": "positive",
  "confidence": 0.89,
  "model_used": "Ensemble Best Model",
  "processing_time": 0.023,
  "request_id": "req-1"
}
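The equivalent request in Python, again standard library only. The `predict` helper is a hypothetical wrapper; the endpoint, header, and request body match the curl example above:

```python
import json
import urllib.request


def predict(text, base_url="http://localhost:8000"):
    """POST a single text to /predict and return the parsed response."""
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    try:
        result = predict("This is a great product!")
        print(f"{result['sentiment']} (confidence {result['confidence']:.2f})")
    except OSError:
        print("Could not reach the API -- is the server running?")
```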

Troubleshooting

Common Issues

  1. Import Errors

    Error: No module named 'fastapi'
    Solution: Run pip install -r api_requirements.txt from the api directory.

  2. Model Loading Errors

    Error: Model file not found
    Solution: Ensure final_best_model.pkl and tfidf_vectorizer.pkl exist at the paths set by MODEL_PATH and VECTORIZER_PATH.

  3. Memory Errors

    Error: Out of memory
    Solution: Reduce MAX_BATCH_SIZE, or run on a machine with more RAM.

  4. Port Already in Use

    Error: Port 8000 already in use
    Solution: Change PORT in the .env file, or stop the process that holds the port.
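For the port conflict case, Python's standard library can check whether 8000 is busy and ask the OS for a free alternative. Both helpers below (`port_in_use`, `find_free_port`) are illustrative, not part of the project:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0


def find_free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    if port_in_use(8000):
        print(f"Port 8000 is busy; try PORT={find_free_port()} in your .env file")
    else:
        print("Port 8000 is free.")
```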

System Requirements Check

# Run this script to check system compatibility
# (requires psutil: pip install psutil)
import sys
import psutil

print(f"Python version: {sys.version}")
print(f"Available RAM: {psutil.virtual_memory().available / 1024 / 1024:.0f} MB")
print(f"CPU cores: {psutil.cpu_count()}")

# Check required packages
required_packages = ['fastapi', 'joblib', 'sklearn', 'tensorflow']
for package in required_packages:
    try:
        __import__(package)
        print(f"✓ {package} available")
    except ImportError:
        print(f"✗ {package} missing")

Next Steps

After successful installation:

  1. Run the API
  2. Configure for production
  3. Deploy to cloud