# Installation
Step-by-step guide to install and set up the Text Classification API.
## Prerequisites
- Python 3.11 or higher
- pip package manager
- Git (for cloning the repository)
- 2GB+ available RAM
- 1GB+ available disk space
## Local Installation
### 1. Clone Repository
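The repository URL is not given in this guide; substitute your project's actual Git URL for the placeholders below:

```bash
# Clone the repository and move into it
# (<repository-url> and <repository-name> are placeholders)
git clone <repository-url>
cd <repository-name>
```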
### 2. Install Dependencies
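Dependencies are presumably installed from the requirements file referenced later in this guide:

```bash
# Install the API's Python dependencies
pip install -r api/api_requirements.txt
```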
### 3. Download NLTK Data
```bash
# Run Python to download required NLTK data
python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt'); nltk.download('wordnet')"
```
### 4. Prepare Model Files
```bash
# Copy model files to the root directory (from your training run)
cp ../final_best_model.pkl .
cp ../tfidf_vectorizer.pkl .
```
## Docker Installation
### Using Docker Compose
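A minimal `docker-compose.yml` sketch; the service name, build context, and volume mounts are assumptions mirroring the `docker` commands shown below:

```yaml
# docker-compose.yml (sketch)
services:
  api:
    build: ./api
    ports:
      - "8000:8000"
    volumes:
      - ./final_best_model.pkl:/app/final_best_model.pkl
      - ./tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl
```

Then start the service with `docker compose up --build`.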
### Using Docker Directly
```bash
# Build image
docker build -t text-classifier-api ./api

# Run container
docker run -p 8000:8000 \
  -v $(pwd)/final_best_model.pkl:/app/final_best_model.pkl \
  -v $(pwd)/tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl \
  text-classifier-api
```
## Virtual Environment (Recommended)

### Create Virtual Environment
```bash
# Create virtual environment
python -m venv venv

# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
### Install in Virtual Environment
```bash
# Install dependencies
pip install -r api/api_requirements.txt

# Verify installation
python -c "import fastapi, joblib, sklearn; print('All dependencies installed successfully')"
```
## Configuration

### Environment Variables
Create a `.env` file in the `api` directory:

```bash
# Model paths
MODEL_PATH=../final_best_model.pkl
VECTORIZER_PATH=../tfidf_vectorizer.pkl

# Performance settings
MAX_BATCH_SIZE=50
MAX_TEXT_LENGTH=10000

# Monitoring
ENABLE_METRICS=true

# Server settings
PORT=8000
```
### Configuration Options
| Variable | Default | Description |
|---|---|---|
| `MODEL_PATH` | `../final_best_model.pkl` | Path to trained model file |
| `VECTORIZER_PATH` | `../tfidf_vectorizer.pkl` | Path to TF-IDF vectorizer file |
| `MAX_BATCH_SIZE` | `50` | Maximum texts per batch request |
| `MAX_TEXT_LENGTH` | `10000` | Maximum characters per text |
| `ENABLE_METRICS` | `true` | Enable performance metrics endpoint |
| `PORT` | `8000` | Server port |
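Inside the API, these settings can be read with standard environment lookups, using the table's defaults as fallbacks; the exact variable handling in `main.py` is an assumption, but a sketch looks like this:

```python
import os

# Read the documented settings, falling back to the defaults above
MODEL_PATH = os.getenv("MODEL_PATH", "../final_best_model.pkl")
VECTORIZER_PATH = os.getenv("VECTORIZER_PATH", "../tfidf_vectorizer.pkl")
MAX_BATCH_SIZE = int(os.getenv("MAX_BATCH_SIZE", "50"))
MAX_TEXT_LENGTH = int(os.getenv("MAX_TEXT_LENGTH", "10000"))
ENABLE_METRICS = os.getenv("ENABLE_METRICS", "true").lower() == "true"
PORT = int(os.getenv("PORT", "8000"))

print(f"Serving on port {PORT}, batch limit {MAX_BATCH_SIZE}")
```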
## Verification

### Test Installation
```bash
# Navigate to API directory
cd api

# Run the API
python main.py

# In another terminal, test the API
curl http://localhost:8000/health
```
Expected response:
```json
{
  "status": "healthy",
  "models_loaded": true,
  "vectorizer_loaded": true,
  "memory_usage_mb": 45.2,
  "uptime_seconds": 5.5
}
```
### Test Prediction
```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "This is a great product!"}'
```
Expected response:
```json
{
  "sentiment": "positive",
  "confidence": 0.89,
  "model_used": "Ensemble Best Model",
  "processing_time": 0.023,
  "request_id": "req-1"
}
```
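The same request can be made from Python using only the standard library. The `build_predict_request` helper below is hypothetical (the guide only documents the curl call above), and actually sending the request requires the API to be running:

```python
import json
import urllib.request

def build_predict_request(text, base_url="http://localhost:8000"):
    """Build a POST request for the /predict endpoint (hypothetical helper)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request("This is a great product!")
print(req.full_url)

# To actually send it (with the API running):
# with urllib.request.urlopen(req) as resp:
#     result = json.loads(resp.read())
#     print(result["sentiment"], result["confidence"])
```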
## Troubleshooting

### Common Issues
- **Import Errors**: make sure dependencies are installed in the active environment (`pip install -r api/api_requirements.txt`).
- **Model Loading Errors**: confirm `final_best_model.pkl` and `tfidf_vectorizer.pkl` exist at the paths configured in `MODEL_PATH` and `VECTORIZER_PATH`.
- **Memory Errors**: ensure at least 2GB of RAM is free; lowering `MAX_BATCH_SIZE` reduces per-request memory.
- **Port Already in Use**: stop the process occupying port 8000, or set `PORT` to a free port.
### System Requirements Check
```python
# Run this script to check system compatibility
import sys
import psutil  # third-party: pip install psutil

print(f"Python version: {sys.version}")
print(f"Available RAM: {psutil.virtual_memory().available / 1024 / 1024:.0f} MB")
print(f"CPU cores: {psutil.cpu_count()}")

# Check required packages
required_packages = ['fastapi', 'joblib', 'sklearn', 'tensorflow']
for package in required_packages:
    try:
        __import__(package)
        print(f"✓ {package} available")
    except ImportError:
        print(f"✗ {package} missing")
```
## Next Steps
After successful installation: