Contributing¶

Guidelines for contributing to the Text Classification API project.

Development Setup¶

Prerequisites¶

Python 3.11+
Docker 20.10+
Git

Local Development¶

Clone the repository:

git clone <repository-url>
cd text-classifier-api

Set up a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r api/api_requirements.txt

Set up environment variables:

cp api/.env.example api/.env
# Edit api/.env with your configuration

Run the API:

```
cd api
python main.py
```

Test the API from a second terminal while the server is running:

```
curl http://localhost:8000/health
```
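The same check can be scripted, for example as a smoke test after startup. The sketch below uses only the standard library and assumes that `/health` answers with HTTP 200 when the service is up. The function name `check_health` and the timeout default are illustrative and not part of the project.

```python
import urllib.request


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200.

    Assumption: a 200 status means the API is healthy. The response body
    is not inspected because its format is not specified in this guide.
    """
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError and
        # HTTPError (raised for 4xx/5xx responses), which subclass OSError.
        return False
```

Calling `check_health()` with no arguments targets the local development server started above.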

Docker Development¶

Build the image:

cd api
docker build -t text-classifier-api .

Run the container:

docker run -p 8000:8000 \
  -v "$(pwd)/../final_best_model.pkl:/app/final_best_model.pkl" \
  -v "$(pwd)/../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl" \
  text-classifier-api
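The two volume mounts imply that the API expects `final_best_model.pkl` and `tfidf_vectorizer.pkl` under `/app`. A missing host file is easy to overlook, because `docker run -v` creates an empty directory at a host path that does not exist instead of failing. The following sketch checks for both files before the container is started. The helper name `missing_artifacts` is hypothetical.

```python
from pathlib import Path

# File names taken from the `docker run` command above.
REQUIRED_ARTIFACTS = ("final_best_model.pkl", "tfidf_vectorizer.pkl")


def missing_artifacts(base_dir: Path) -> list[str]:
    """Return the required model files that are not regular files in base_dir."""
    return [name for name in REQUIRED_ARTIFACTS if not (base_dir / name).is_file()]
```

When run from `api/`, `missing_artifacts(Path(".."))` should return an empty list before `docker run` is invoked.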

Development Workflow¶

1. Choose an Issue¶

Check existing issues for tasks
Create a new issue for bugs or features
Discuss large changes before starting work

2. Create a Branch¶

git checkout -b feature/your-feature-name
# or
git checkout -b fix/issue-number-description

3. Make Changes¶

Follow the coding standards
Write tests for new features
Update documentation
Test your changes

4. Commit Changes¶

git add .
git commit -m "feat: add new feature

- Description of changes
- Related issue: #123"

5. Push and Create PR¶

git push origin feature/your-feature-name

Create a pull request with:

- Clear description of changes
- Screenshots for UI changes
- Test results
- Documentation updates

Coding Standards¶

Python Style¶

Follow PEP 8 style guide
Use Black for code formatting
Use isort for import sorting
Maximum line length: 88 characters

Code Quality¶

Write descriptive variable and function names
Add docstrings to all functions and classes
Use type hints for function parameters and return values
Keep functions small and focused

Example Code Style¶

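from typing import List, Optional

from pydantic import BaseModel


class PredictionRequest(BaseModel):
    """Request model for text classification."""

    texts: List[str]
    batch_size: Optional[int] = 10


class PredictionResponse(BaseModel):
    """Response model (illustrative placeholder; the field type is an assumption)."""

    predictions: List[str]


class ModelCache:
    """Placeholder for the project's cached model wrapper (interface not shown here)."""


async def predict_texts(
    request: PredictionRequest,
    model_cache: ModelCache,
) -> PredictionResponse:
    """
    Classify multiple texts using the trained model.

    Args:
        request: Prediction request with texts and batch size
        model_cache: Cached model instance

    Returns:
        Prediction response with classifications
    """
    # Implementation goes here.
    raise NotImplementedError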

Testing¶

Running Tests¶

# Run all tests
pytest

# Run with coverage
pytest --cov=api --cov-report=html

# Run specific test file
pytest tests/test_api.py

# Run with verbose output
pytest -v

Writing Tests¶

Use pytest framework
Place tests in tests/ directory
Name test files test_*.py
Use descriptive test names

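import pytest
from fastapi.testclient import TestClient

# Assumption: the FastAPI instance is named `app` and lives in api/main.py.
# Adjust the import to match the project layout.
from main import app


@pytest.fixture
def client() -> TestClient:
    """Provide a test client bound to the application."""
    return TestClient(app)


def test_predict_endpoint(client: TestClient):
    """Test the predict endpoint with valid input."""
    response = client.post(
        "/predict",
        json={"texts": ["Great product!"]},
    )
    assert response.status_code == 200
    data = response.json()
    assert "predictions" in data
    assert len(data["predictions"]) == 1


def test_predict_invalid_input(client: TestClient):
    """Test the predict endpoint with invalid input."""
    response = client.post(
        "/predict",
        json={"invalid": "data"},
    )
    # FastAPI returns 422 when the request body fails validation.
    assert response.status_code == 422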

Test Coverage¶

Aim for >80% code coverage
Cover happy path and error cases
Test edge cases and boundary conditions

Documentation¶

API Documentation¶

Update API reference for new endpoints
Add examples for new features
Document breaking changes

Code Documentation¶

Add docstrings to new functions
Update README for new features
Update configuration documentation

Building Docs¶

cd api
python build_docs.py

Performance Guidelines¶

Memory Usage¶

Keep memory usage under 512 MB for free-tier compatibility
Use streaming for large responses
Implement proper cleanup for resources
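One way to keep peak memory bounded is to process inputs in fixed-size batches, so that only one batch is held in memory at a time. The generator below is a minimal sketch of that idea. The name `iter_batches` is hypothetical, and the default of 10 simply mirrors the `batch_size` default in the example request model.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def iter_batches(texts: Iterable[str], batch_size: int = 10) -> Iterator[list[str]]:
    """Yield successive lists of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(texts)
    # islice pulls at most batch_size items; an empty list ends the loop.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Because the function is a generator, a caller can classify and emit each batch before the next one is read, which also suits streamed responses.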

Response Times¶

API responses should complete in under 2 seconds
Implement timeouts for long operations
Use async processing for I/O operations
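The timeout and async guidelines can be combined with the standard library alone. In the sketch below, `asyncio.to_thread` moves a blocking call off the event loop and `asyncio.wait_for` enforces a deadline. `classify_blocking` is a stand-in with a toy keyword rule, not the project's model, and the 2-second default mirrors the response-time target above.

```python
import asyncio
import time


def classify_blocking(text: str) -> str:
    """Stand-in for a blocking model call (hypothetical toy rule)."""
    time.sleep(0.05)  # Simulate inference latency.
    return "positive" if "great" in text.lower() else "negative"


async def classify_with_timeout(text: str, timeout_s: float = 2.0) -> str:
    """Run the blocking call in a worker thread and give up after timeout_s."""
    return await asyncio.wait_for(
        asyncio.to_thread(classify_blocking, text),
        timeout=timeout_s,
    )
```

Note that a timeout only stops the caller from waiting. The worker thread itself runs to completion, so truly long operations still need their own cancellation mechanism.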

Scalability¶

Design for horizontal scaling
Use stateless operations
Implement proper caching strategies
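A simple caching strategy that stays compatible with stateless request handling is to load read-only model artifacts once per process. The sketch below assumes the `.pkl` files are standard pickle files; if they were written with another tool such as joblib, the loading call would differ. The function name `load_artifact` is hypothetical.

```python
import pickle
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_artifact(path: str):
    """Unpickle an artifact once per process and reuse the object afterwards.

    Only unpickle files from trusted sources: loading a pickle can
    execute arbitrary code.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

Each worker process keeps its own copy, so no state is shared between instances and horizontal scaling is unaffected.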

Security Considerations¶

Input Validation¶

Validate all user inputs
Sanitize text inputs
Rate limiting (planned for a future release)
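As a rough illustration of validating and sanitizing text input, the sketch below normalizes Unicode, removes control characters, collapses whitespace, and rejects empty or oversized strings. The function name `sanitize_text` and the 10,000-character limit are assumptions made for the example, not project settings.

```python
import re
import unicodedata

MAX_TEXT_LENGTH = 10_000  # Hypothetical limit; choose one that fits the model.

_WHITESPACE = re.compile(r"\s+")


def sanitize_text(text: str, max_length: int = MAX_TEXT_LENGTH) -> str:
    """Return a cleaned copy of text or raise ValueError if it is unusable."""
    normalized = unicodedata.normalize("NFC", text)
    # Drop control characters (Unicode category Cc) but keep whitespace,
    # which is collapsed to single spaces in the next step.
    visible = "".join(
        ch for ch in normalized
        if ch.isspace() or unicodedata.category(ch) != "Cc"
    )
    cleaned = _WHITESPACE.sub(" ", visible).strip()
    if not cleaned:
        raise ValueError("text must contain at least one visible character")
    if len(cleaned) > max_length:
        raise ValueError(f"text exceeds {max_length} characters")
    return cleaned
```

A request handler could call this on every item in `texts` and translate `ValueError` into a 422 response.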

Dependencies¶

Keep dependencies updated
Use trusted packages only
Scan for vulnerabilities regularly

Secrets Management¶

Never commit secrets to code
Use environment variables for configuration
Document required environment variables
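Reading configuration in one place and failing fast on missing values keeps secrets out of the code and makes the required variables explicit. The sketch below shows one possible shape. The variable names `MODEL_PATH` and `PORT` are hypothetical; the actual names are listed in `api/.env.example`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

REQUIRED_VARS = ("MODEL_PATH",)  # Hypothetical name used for illustration.


@dataclass(frozen=True)
class Settings:
    model_path: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return Settings(model_path=env["MODEL_PATH"], port=int(env.get("PORT", "8000")))
```

Passing the environment as a parameter makes the function easy to test with a plain dictionary.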

Git Guidelines¶

Commit Messages¶

Follow conventional commit format:

type(scope): description

[optional body]

[optional footer]

Types:

- feat: New feature
- fix: Bug fix
- docs: Documentation
- style: Code style changes
- refactor: Code refactoring
- test: Testing
- chore: Maintenance

Examples:

feat: add batch prediction endpoint

- Support processing multiple texts in single request
- Improve throughput by 3x
- Add comprehensive input validation

Closes #123

fix: handle empty text input

- Return appropriate error for empty strings
- Add test case for edge case

Fixes #456
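The header format above is regular enough to check automatically, for instance in a commit hook. The sketch below validates only the first line against the types listed in this guide. The function name `is_valid_commit_header` and the restriction on scope characters are assumptions made for the example.

```python
import re

# Types listed in this guide.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

HEADER_PATTERN = re.compile(
    rf"^(?P<type>{'|'.join(COMMIT_TYPES)})"
    r"(\((?P<scope>[^()\s]+)\))?"  # Optional scope in parentheses.
    r": (?P<description>\S.*)$"    # Colon, one space, non-empty description.
)


def is_valid_commit_header(message: str) -> bool:
    """Check that the first line of message matches type(scope): description."""
    first_line = message.splitlines()[0] if message else ""
    return HEADER_PATTERN.match(first_line) is not None
```

For example, `is_valid_commit_header("fix(api): handle empty text input")` returns `True`, while `is_valid_commit_header("added stuff")` returns `False`.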

Branch Naming¶

feature/description-of-feature
fix/issue-number-description
docs/update-documentation
refactor/component-name

Pull Request Process¶

Before Submitting¶

[ ] Code follows style guidelines
[ ] Tests pass and coverage >80%
[ ] Documentation updated
[ ] No breaking changes without discussion
[ ] Commit messages follow conventions

PR Description¶

Include:

- What changes were made
- Why changes were needed
- How to test the changes
- Screenshots/videos for UI changes
- Breaking changes and migration notes

Review Process¶

Automated checks run (tests, linting)
Code review by maintainer
Address review comments
Merge when approved

Issue Reporting¶

Bug Reports¶

Include:

- Steps to reproduce
- Expected behavior
- Actual behavior
- Environment details
- Error messages/logs

Feature Requests¶

Include:

- Use case description
- Proposed solution
- Alternative solutions considered
- Additional context

Community¶

Code of Conduct¶

Be respectful and inclusive
Focus on constructive feedback
Help newcomers learn
Report issues appropriately

Getting Help¶

Check existing issues and documentation
Ask questions in discussions
Be patient with responses

License¶

By contributing, you agree that your contributions will be licensed under the same license as the project.

Recognition¶

Contributors will be acknowledged in:

- CHANGELOG.md for significant contributions
- GitHub contributors list
- Release notes

Thank you for contributing to the Text Classification API! 🚀