Contributing Guide

We welcome contributions to IDEAL-GENOM-QC! This guide will help you get started with contributing to the project, whether you’re fixing bugs, adding features, improving documentation, or helping with testing.

Getting Started

Development Setup

  1. Fork and clone the repository:

# Fork on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/IDEAL-GENOM-QC.git
cd IDEAL-GENOM-QC

  2. Set up the development environment:

# Install Poetry (if not already installed)
curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies
poetry install

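# Activate virtual environment
# (in Poetry 2.x, `poetry shell` is no longer built in and requires the
# poetry-plugin-shell plugin)
poetry shell

  3. Install development dependencies: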

# Install additional development tools
poetry install --with dev

# Install pre-commit hooks
pre-commit install

  4. Verify the installation:

# Run tests
pytest

# Check code style
black --check .
flake8 .

Project Structure

Understanding the codebase structure:

IDEAL-GENOM-QC/
├── ideal_genom_qc/          # Main package
│   ├── __init__.py
│   ├── SampleQC.py          # Sample quality control
│   ├── AncestryQC.py        # Ancestry analysis
│   ├── VariantQC.py         # Variant quality control
│   ├── PopStructure.py      # Population structure analysis
│   ├── UMAPplot.py          # UMAP visualization
│   ├── Helpers.py           # Utility functions
│   └── get_references.py    # Reference data handling
├── tests/                   # Test suite
├── docs/                    # Documentation
├── notebooks/               # Example notebooks
├── data/                    # Reference data
└── pyproject.toml           # Project configuration

Types of Contributions

We welcome several types of contributions:

Bug Reports

Before submitting a bug report:

  • Check existing issues to avoid duplicates

  • Test with the latest version

  • Gather system information and error logs

Bug report template:

**Bug Description**
A clear description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Configuration used
2. Command executed
3. Error encountered

**Expected Behavior**
What you expected to happen.

**Environment**
- OS: [e.g., Ubuntu 20.04]
- Python version: [e.g., 3.9.7]
- IDEAL-GENOM-QC version: [e.g., 0.1.0]
- PLINK versions: [e.g., 1.9, 2.0]

**Additional Context**
- Configuration files
- Log files
- Sample data characteristics

Feature Requests

Feature request template:

**Feature Description**
A clear description of what you want to achieve.

**Use Case**
Why is this feature needed? What problem does it solve?

**Proposed Solution**
How would you like this implemented?

**Alternatives Considered**
What other solutions have you considered?

**Additional Context**
Any other context or screenshots about the feature request.

Code Contributions

Development workflow:

  1. Create a feature branch:

git checkout -b feature/new-qc-method
# or
git checkout -b bugfix/fix-memory-leak

  2. Make your changes:

  • Follow the existing code style

  • Add tests for new functionality

  • Update documentation as needed

  • Keep commits atomic and well-described

  3. Test your changes:

# Run all tests
pytest

# Test specific modules
pytest tests/test_sample_qc.py

# Run with coverage
pytest --cov=ideal_genom_qc

  4. Check code quality:

# Format code
black .

# Check style
flake8 .

# Type checking
mypy ideal_genom_qc/

  5. Commit and push:

git add .
git commit -m "feat(sample_qc): add QC method for contamination detection"
git push origin feature/new-qc-method

  6. Create a pull request:

  • Use the PR template

  • Reference any related issues

  • Include screenshots for UI changes

  • Wait for review and address feedback

Documentation Contributions

Types of documentation improvements:

  • API documentation improvements

  • Tutorial enhancements

  • Example additions

  • Typo fixes

  • Translation (future)

Documentation workflow:

# Install documentation dependencies
poetry install --with docs

# Build documentation locally
cd docs/
make html

# Open in browser (macOS; on Linux, use xdg-open instead of open)
open build/html/index.html

Testing Contributions

Help improve test coverage:

# Check current coverage
pytest --cov=ideal_genom_qc --cov-report=html
# Open the report (macOS; on Linux, use xdg-open instead of open)
open htmlcov/index.html

Types of tests needed:

  • Unit tests for individual functions

  • Integration tests for complete workflows

  • Performance tests for large datasets

  • Cross-platform compatibility tests
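
As a hedged illustration of the first category, a unit test might look like the sketch below. `is_valid_threshold` is a hypothetical helper, not part of the IDEAL-GENOM-QC API; it is defined inline only so the example is self-contained. pytest collects any function whose name starts with `test_`.

```python
# Hypothetical helper, defined here only so the example runs on its own.
def is_valid_threshold(value: float) -> bool:
    """Return True if value is a rate in the closed interval [0, 1]."""
    return 0.0 <= value <= 1.0


def test_accepts_boundaries_and_interior() -> None:
    # Both endpoints and a typical interior value should pass.
    assert is_valid_threshold(0.0)
    assert is_valid_threshold(0.01)
    assert is_valid_threshold(1.0)


def test_rejects_out_of_range() -> None:
    # Values below 0 or above 1 are not valid rates.
    assert not is_valid_threshold(-0.1)
    assert not is_valid_threshold(1.5)
```

Saved as, say, `tests/test_thresholds.py` (an illustrative file name), this would be picked up by a plain `pytest` run.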

Code Style Guidelines

Python Style

We follow PEP 8 with some modifications:

  • Line length: 88 characters (Black default)

  • Imports: Use isort for import sorting

  • Docstrings: Use Google-style docstrings

  • Type hints: Use type hints for public APIs

Example function:

from pathlib import Path

import pandas as pd


def calculate_kinship_matrix(
    input_path: Path,
    output_path: Path,
    maf_threshold: float = 0.01,
    missing_threshold: float = 0.1
) -> pd.DataFrame:
    """Calculate kinship matrix for sample relatedness analysis.

    Args:
        input_path: Path to input PLINK files
        output_path: Path for output files
        maf_threshold: Minor allele frequency threshold
        missing_threshold: Maximum missing data rate

    Returns:
        DataFrame containing kinship coefficients

    Raises:
        FileNotFoundError: If input files don't exist
        ValueError: If thresholds are out of valid range
    """
    # Implementation here
    pass
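
The `Raises` section above documents a `ValueError` for out-of-range thresholds. A minimal sketch of such a check follows. The valid ranges are assumptions (the guide does not state them), and `validate_thresholds` is a hypothetical helper rather than an existing project function.

```python
def validate_thresholds(maf_threshold: float, missing_threshold: float) -> None:
    """Raise ValueError if either threshold is outside its assumed range."""
    # Assumption: a minor allele frequency cannot exceed 0.5.
    if not 0.0 <= maf_threshold <= 0.5:
        raise ValueError(f"maf_threshold must be in [0, 0.5], got {maf_threshold}")
    # Assumption: missingness is a rate between 0 and 1.
    if not 0.0 <= missing_threshold <= 1.0:
        raise ValueError(
            f"missing_threshold must be in [0, 1], got {missing_threshold}"
        )
```

Calling a check like this at the top of the function keeps the failure close to the bad input, before any files are read.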

Documentation Style

  • reStructuredText: Use the .rst format for documentation

  • Clear examples: Include working code examples

  • Cross-references: Link between related sections

  • Screenshots: Include for UI elements

Example documentation:

Sample Quality Control
======================

The :class:`SampleQC` class performs comprehensive quality control
on individual samples in your genomic dataset.

Basic Usage
-----------

.. code-block:: python

    from ideal_genom_qc import SampleQC

    qc = SampleQC(
        input_path="data/input",
        input_name="mydata",
        output_path="data/output",
        output_name="clean_data"
    )

    qc.run_sample_qc()

Git Workflow

Branch Naming

Use descriptive branch names:

  • feature/add-contamination-detection

  • bugfix/fix-memory-leak-in-pca

  • docs/improve-api-documentation

  • test/add-integration-tests

Commit Messages

Follow conventional commit format:

type(scope): description

[optional body]

[optional footer]

Examples:

feat(ancestry): add support for custom reference populations

fix(sample_qc): resolve memory leak in kinship calculation

docs(api): add examples to SampleQC class documentation

test(variant_qc): add unit tests for HWE calculation

Types:

  • feat: New feature

  • fix: Bug fix

  • docs: Documentation

  • test: Tests

  • refactor: Code refactoring

  • perf: Performance improvement

  • style: Code style changes
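
The header format can be checked mechanically. The sketch below is one possible check, not a project tool: `is_valid_header` is a hypothetical name, and treating the scope as optional and restricting it to lowercase letters, digits, underscores, and hyphens are assumptions.

```python
import re

# Commit types listed in this guide.
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "perf", "style")

# Matches "type(scope): description". The scope is optional here (an assumption).
HEADER_RE = re.compile(
    rf"^(?P<type>{'|'.join(COMMIT_TYPES)})"
    r"(\((?P<scope>[a-z0-9_\-]+)\))?"
    r": (?P<description>\S.*)$"
)


def is_valid_header(header: str) -> bool:
    """Return True if the first line of a commit message follows the format."""
    return HEADER_RE.match(header) is not None
```

For instance, `is_valid_header("fix(sample_qc): resolve memory leak in kinship calculation")` is accepted, while a header without a type prefix, such as `"Add new QC method"`, is rejected.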

Pull Request Process

PR Template

Pull request template:

## Description
Brief description of what this PR does.

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] Performance improvement
- [ ] Refactoring

## Testing
- [ ] Tests pass locally
- [ ] Added new tests for changes
- [ ] Tested on sample datasets

## Documentation
- [ ] Updated API documentation
- [ ] Updated user documentation
- [ ] Added/updated examples

## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Commented hard-to-understand areas
- [ ] No merge conflicts

## Related Issues
Fixes #123
Related to #456

Review Process

What reviewers look for:

  1. Correctness: Does the code do what it’s supposed to do?

  2. Testing: Are there adequate tests?

  3. Documentation: Is the code well-documented?

  4. Style: Does it follow project conventions?

  5. Performance: Will it negatively impact performance?

  6. Compatibility: Will it break existing functionality?

Responding to feedback:

  • Address all comments

  • Ask for clarification if needed

  • Update tests and documentation

  • Force-push updates to your branch

Release Process

Versioning

We use semantic versioning (semver):

  • MAJOR: Incompatible API changes

  • MINOR: New functionality (backward compatible)

  • PATCH: Bug fixes (backward compatible)

Examples:

  • 0.1.0 → 0.1.1 (bug fix)

  • 0.1.1 → 0.2.0 (new feature)

  • 0.2.0 → 1.0.0 (major API change)
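
The bump rules can be written out as a small function. This is a sketch for illustration only: `bump_version` is a hypothetical helper, and it assumes plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return version with the given part ("major", "minor", "patch") bumped."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump_version("0.1.0", "patch"))  # 0.1.1
print(bump_version("0.1.1", "minor"))  # 0.2.0
print(bump_version("0.2.0", "major"))  # 1.0.0
```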

Changelog

We maintain a changelog following Keep a Changelog:

# Changelog

## [Unreleased]
### Added
- New contamination detection method

### Fixed
- Memory leak in PCA calculation

## [0.1.0] - 2025-01-15
### Added
- Initial release
- Sample QC functionality
- Ancestry analysis
- Variant QC
- UMAP visualization

Community Guidelines

Code of Conduct

We are committed to providing a welcoming and inclusive environment. Please:

  • Be respectful and constructive

  • Welcome newcomers and help them learn

  • Focus on what’s best for the community

  • Use inclusive language

  • Be patient with questions and mistakes

Communication

Preferred channels:

  • GitHub Issues: Bug reports, feature requests

  • GitHub Discussions: General questions, ideas

  • Pull Request comments: Code-specific discussions

  • Email: Security issues, private matters

Communication guidelines:

  • Be clear and concise

  • Provide context and examples

  • Use searchable, descriptive titles

  • Follow up on conversations

  • Tag relevant maintainers when needed

Recognition

Contributors will be recognized in:

  • Authors file: Major contributors

  • Release notes: Feature contributors

  • Documentation: Example providers

  • GitHub: All contributors via GitHub’s contributor graph

Types of recognition:

  • Code contributions

  • Documentation improvements

  • Bug reports and testing

  • Community support

  • Translations (future)

Getting Help

If you need help contributing:

  • Read existing issues and PRs for examples

  • Start with “good first issue” labels

  • Ask questions in GitHub discussions

  • Join our community calls (when available)

  • Reach out to maintainers directly

Thank you for contributing to IDEAL-GENOM-QC! 🎉