
Contributing

Guidelines for contributing to the Text Classification API project.

Development Setup

Prerequisites

  • Python 3.11+
  • Docker 20.10+
  • Git

Local Development

  1. Clone the repository:

    git clone <repository-url>
    cd text-classifier-api
    

  2. Set up a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    

  3. Install dependencies:

    pip install -r api/api_requirements.txt
    

  4. Set up environment variables:

    cp api/.env.example api/.env
    # Edit api/.env with your configuration
    

  5. Run the API:

    cd api
    python main.py
    

  6. Test the API:

    curl http://localhost:8000/health
    

Docker Development

  1. Build the image:

    cd api
    docker build -t text-classifier-api .
    

  2. Run the container:

    docker run -p 8000:8000 \
      -v $(pwd)/../final_best_model.pkl:/app/final_best_model.pkl \
      -v $(pwd)/../tfidf_vectorizer.pkl:/app/tfidf_vectorizer.pkl \
      text-classifier-api
    

Development Workflow

1. Choose an Issue

  • Check existing issues for tasks
  • Create a new issue for bugs or features
  • Discuss large changes before starting work

2. Create a Branch

git checkout -b feature/your-feature-name
# or
git checkout -b fix/issue-number-description

3. Make Changes

  • Follow the coding standards
  • Write tests for new features
  • Update documentation
  • Test your changes

4. Commit Changes

git add .
git commit -m "feat: add new feature

- Description of changes
- Related issue: #123"

5. Push and Create PR

git push origin feature/your-feature-name

Create a pull request with:

  • A clear description of the changes
  • Screenshots for UI changes
  • Test results
  • Documentation updates

Coding Standards

Python Style

  • Follow PEP 8 style guide
  • Use Black for code formatting
  • Use isort for import sorting
  • Maximum line length: 88 characters

Code Quality

  • Write descriptive variable and function names
  • Add docstrings to all functions and classes
  • Use type hints for function parameters and return values
  • Keep functions small and focused

Example Code Style

from typing import List, Optional

from pydantic import BaseModel

# ModelCache and PredictionResponse are project classes imported from
# elsewhere in the codebase; their imports are omitted to keep this short.


class PredictionRequest(BaseModel):
    """Request model for text classification."""
    texts: List[str]
    batch_size: Optional[int] = 10


async def predict_texts(
    request: PredictionRequest,
    model_cache: ModelCache,
) -> PredictionResponse:
    """
    Classify multiple texts using the trained model.

    Args:
        request: Prediction request with texts and batch size
        model_cache: Cached model instance

    Returns:
        Prediction response with classifications
    """
    # Implementation here
    raise NotImplementedError

Testing

Running Tests

# Run all tests
pytest

# Run with coverage (requires the pytest-cov plugin)
pytest --cov=api --cov-report=html

# Run specific test file
pytest tests/test_api.py

# Run with verbose output
pytest -v

Writing Tests

  • Use pytest framework
  • Place tests in tests/ directory
  • Name test files test_*.py
  • Use descriptive test names

from fastapi.testclient import TestClient

# `client` is a pytest fixture that returns a TestClient for the app; it is
# assumed to be defined in tests/conftest.py so all test modules can share it.


def test_predict_endpoint(client: TestClient):
    """Test the predict endpoint with valid input."""
    response = client.post("/predict", json={"texts": ["Great product!"]})
    assert response.status_code == 200
    data = response.json()
    assert "predictions" in data
    assert len(data["predictions"]) == 1


def test_predict_invalid_input(client: TestClient):
    """Test the predict endpoint with invalid input."""
    response = client.post("/predict", json={"invalid": "data"})
    assert response.status_code == 422

Test Coverage

  • Aim for >80% code coverage
  • Cover happy path and error cases
  • Test edge cases and boundary conditions

Documentation

API Documentation

  • Update API reference for new endpoints
  • Add examples for new features
  • Document breaking changes

Code Documentation

  • Add docstrings to new functions
  • Update README for new features
  • Update configuration documentation

Building Docs

cd api
python build_docs.py

Performance Guidelines

Memory Usage

  • Keep memory usage under 512MB for free tier compatibility
  • Use streaming for large responses
  • Implement proper cleanup for resources
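Processing inputs in fixed-size chunks is one way to bound peak memory when a request carries many texts. The sketch below assumes the model exposes a callable that labels one list of texts at a time; `iter_batches` and `classify_in_batches` are hypothetical helpers, not functions from this project.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_batches(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `batch_size` items."""
    # Checked when iteration starts, because this function is a generator.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice takes at most batch_size items per pass, so at most one
    # batch is copied out at a time.
    while batch := list(islice(iterator, batch_size)):
        yield batch


def classify_in_batches(
    texts: Iterable[str],
    classify_batch: Callable[[List[str]], List[str]],
    batch_size: int = 10,
) -> Iterator[str]:
    """Yield one label per text, classifying `batch_size` texts at a time."""
    for batch in iter_batches(texts, batch_size):
        yield from classify_batch(batch)
```

Because `classify_in_batches` is a generator, labels can be consumed (or streamed in a response) as each chunk finishes instead of being collected into one large list. On Python 3.12+, `itertools.batched` provides the same chunking but yields tuples.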

Response Times

  • API responses should complete in under 2 seconds
  • Implement timeouts for long operations
  • Use async processing for I/O operations
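A timeout can be enforced around any awaitable with `asyncio.wait_for`. This is a minimal sketch; `with_timeout` and `REQUEST_TIMEOUT_SECONDS` are hypothetical names, and the 2-second default simply mirrors the response-time target above.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")

# Hypothetical default matching the "under 2 seconds" response-time target.
REQUEST_TIMEOUT_SECONDS = 2.0


async def with_timeout(
    operation: Awaitable[T], timeout: float = REQUEST_TIMEOUT_SECONDS
) -> T:
    """Await `operation`, raising TimeoutError if it takes longer than `timeout`."""
    try:
        # wait_for cancels the awaited task once the deadline passes.
        return await asyncio.wait_for(operation, timeout=timeout)
    except asyncio.TimeoutError:
        raise TimeoutError(f"operation exceeded {timeout:.1f} s") from None
```

A route handler could catch `TimeoutError` and return an error status (for example 504). Declaring a function `async` does not make blocking work non-blocking: a synchronous model `predict` call would still block the event loop unless it is moved off it, e.g. with `asyncio.to_thread`, and in that case a timeout stops the wait but not the worker thread.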

Scalability

  • Design for horizontal scaling
  • Use stateless operations
  • Implement proper caching strategies
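For the model and vectorizer pickles, a simple caching strategy is to load each file once per process and reuse it for every request. The sketch below uses `functools.lru_cache`; `load_artifact` is a hypothetical helper and is not the project's `ModelCache` class.

```python
import pickle
from functools import lru_cache
from pathlib import Path
from typing import Any


@lru_cache(maxsize=None)
def load_artifact(path: str) -> Any:
    """Unpickle the artifact at `path` once per process, then reuse it.

    Results are keyed by the exact path string, so "model.pkl" and
    "./model.pkl" are cached separately. Only unpickle trusted files:
    pickle.load can execute arbitrary code.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

For example, `load_artifact("/app/final_best_model.pkl")` would read the model mounted by the Docker command only on first use. Because the cached objects are read-only, request handling stays stateless, but each worker process loads its own copy, which counts against the memory budget when scaling out.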

Security Considerations

Input Validation

  • Validate all user inputs
  • Sanitize text inputs
  • Implement rate limiting (future)
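Input validation and sanitization can be centralized in one function that every endpoint calls. This is a minimal sketch using only the standard library; `sanitize_text` and the 5,000-character `MAX_TEXT_LENGTH` limit are hypothetical, and NFKC normalization is one reasonable choice, not a project requirement.

```python
import unicodedata

# Hypothetical cap; pick a value that fits the memory and latency budgets.
MAX_TEXT_LENGTH = 5_000


def sanitize_text(text: str, max_length: int = MAX_TEXT_LENGTH) -> str:
    """Normalize one input text, rejecting empty or oversized values."""
    # NFKC folds compatibility forms (e.g. full-width letters) into standard ones.
    normalized = unicodedata.normalize("NFKC", text)
    # Replace control characters (Unicode category "Cc") with spaces.
    without_controls = "".join(
        " " if unicodedata.category(ch) == "Cc" else ch for ch in normalized
    )
    # Collapse runs of whitespace and trim both ends.
    cleaned = " ".join(without_controls.split())
    if not cleaned:
        raise ValueError("text must not be empty or whitespace-only")
    if len(cleaned) > max_length:
        raise ValueError(f"text is longer than {max_length} characters")
    return cleaned
```

A Pydantic validator on `PredictionRequest.texts` could call this so malformed input is rejected before it reaches the model. Whatever normalization is chosen should match what the vectorizer saw during training; otherwise the same word may map to different features.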

Dependencies

  • Keep dependencies updated
  • Use trusted packages only
  • Scan for vulnerabilities regularly

Secrets Management

  • Never commit secrets to code
  • Use environment variables for configuration
  • Document required environment variables
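Reading configuration from environment variables and failing fast when one is missing keeps secrets out of the code and makes the required variables explicit. The names `MODEL_PATH`, `VECTORIZER_PATH`, and `PORT` below are hypothetical placeholders; the project's actual variables are the ones listed in `api/.env.example`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; use the ones from api/.env.example.
REQUIRED_VARS = ("MODEL_PATH", "VECTORIZER_PATH")


@dataclass(frozen=True)
class Settings:
    """Runtime configuration read from environment variables."""

    model_path: str
    vectorizer_path: str
    port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from `env` (default: os.environ), failing fast if incomplete."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Report every missing variable at once rather than one per restart.
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        model_path=env["MODEL_PATH"],
        vectorizer_path=env["VECTORIZER_PATH"],
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dict as `env` keeps the function easy to test. `os.environ` does not read `.env` files by itself; the values must be exported by the shell or Docker (`docker run --env-file` does this), or loaded by a library such as python-dotenv. If the project already uses Pydantic, the separate pydantic-settings package offers `BaseSettings` for the same pattern.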

Git Guidelines

Commit Messages

Follow the Conventional Commits format:

type(scope): description

[optional body]

[optional footer]

Types:

  • feat: New feature
  • fix: Bug fix
  • docs: Documentation
  • style: Code style changes
  • refactor: Code refactoring
  • test: Testing
  • chore: Maintenance

Examples:

feat: add batch prediction endpoint

- Support processing multiple texts in single request
- Improve throughput by 3x
- Add comprehensive input validation

Closes #123

fix: handle empty text input

- Return appropriate error for empty strings
- Add test case for edge case

Fixes #456
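The header rule can be checked mechanically, for example in CI or a local hook. This sketch validates only the first line against the `type(scope): description` shape and the types listed above; `check_commit_header` is a hypothetical helper, and it does not cover other Conventional Commits features such as `!` for breaking changes.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
TYPE_ALTERNATION = "|".join(ALLOWED_TYPES)

# type, optional "(scope)", ": ", then a description starting with a non-space.
HEADER_PATTERN = re.compile(
    rf"(?P<type>{TYPE_ALTERNATION})(?:\((?P<scope>[\w-]+)\))?: (?P<description>\S.*)"
)


def check_commit_header(message: str) -> bool:
    """Return True if the message's first line follows type(scope): description."""
    lines = message.splitlines()
    return bool(lines) and HEADER_PATTERN.fullmatch(lines[0]) is not None
```

Git runs a `commit-msg` hook with the path of the file holding the proposed message as its only argument, and a non-zero exit aborts the commit, so a small hook script could read that file and exit with status 1 when `check_commit_header` returns False.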

Branch Naming

  • feature/description-of-feature
  • fix/issue-number-description
  • docs/update-documentation
  • refactor/component-name
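Branch names can be validated with a regular expression. The prefixes come from the list above; requiring a lowercase, hyphen-separated description is an assumption drawn from the examples, and `is_valid_branch_name` is a hypothetical helper.

```python
import re

# Prefixes from the branch naming list; the lowercase, hyphen-separated slug
# rule is an assumption based on the examples, not a stated requirement.
BRANCH_PATTERN = re.compile(r"(feature|fix|docs|refactor)/[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if `name` looks like prefix/description-in-kebab-case."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

In CI, the current branch name is available from `git rev-parse --abbrev-ref HEAD` and could be passed to this function.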

Pull Request Process

Before Submitting

  • [ ] Code follows style guidelines
  • [ ] Tests pass and coverage >80%
  • [ ] Documentation updated
  • [ ] No breaking changes without discussion
  • [ ] Commit messages follow conventions

PR Description

Include:

  • What changes were made
  • Why the changes were needed
  • How to test the changes
  • Screenshots/videos for UI changes
  • Breaking changes and migration notes

Review Process

  1. Automated checks run (tests, linting)
  2. Code review by maintainer
  3. Address review comments
  4. Merge when approved

Issue Reporting

Bug Reports

Include:

  • Steps to reproduce
  • Expected behavior
  • Actual behavior
  • Environment details
  • Error messages/logs

Feature Requests

Include:

  • Use case description
  • Proposed solution
  • Alternative solutions considered
  • Additional context

Community

Code of Conduct

  • Be respectful and inclusive
  • Focus on constructive feedback
  • Help newcomers learn
  • Report issues appropriately

Getting Help

  • Check existing issues and documentation
  • Ask questions in discussions
  • Be patient with responses

License

By contributing, you agree that your contributions will be licensed under the same license as the project.

Recognition

Contributors will be acknowledged in:

  • CHANGELOG.md (for significant contributions)
  • The GitHub contributors list
  • Release notes

Thank you for contributing to the Text Classification API! 🚀