IDEAL-GENOM Documentation


IDEAL-GENOM is a comprehensive Python package for automated, reproducible analysis of human genotype data. It provides end-to-end pipelines for genomic quality control (QC), post-imputation VCF processing, and genome-wide association studies (GWAS). The package builds on years of research experience at CGE Tübingen and integrates PLINK 1.9/2.0, GCTA, and BCFtools with rich reporting and visualizations.

Version: 1.1.0

🎯 Key Features

Comprehensive Pipelines
  • Genomic QC: Sample QC, Ancestry QC, and Variant QC for case-control studies

  • GWAS Analysis: Generalized Linear Models (GLM) and Mixed Models (GLMM)

  • VCF Processing: Post-imputation filtering, normalization, and conversion to PLINK

  • Population Structure: FST statistics, PCA, UMAP visualization, and ancestry projection

Advanced Analytics
  • Sample Quality Control: Missingness, sex verification, heterozygosity, relatedness (kinship/IBD)

  • Ancestry Analysis: Population stratification detection with 1000 Genomes reference

  • Variant Filtering: Hardy-Weinberg equilibrium, MAF, genotype rate, differential missingness

  • GWAS Tools: Association testing, top-hits extraction, gene annotation (Ensembl/RefSeq)

  • Dimensionality Reduction: PCA and UMAP for population structure visualization
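
To make two of the variant filters above concrete, here is a minimal sketch that computes the minor allele frequency (MAF) and a Hardy-Weinberg chi-square statistic from genotype counts. It is an illustration only, not the package's implementation: the function name is hypothetical, and the chi-square approximation stands in for the exact test that PLINK applies by default.

```python
def maf_and_hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Return (MAF, HWE chi-square statistic) for one biallelic variant.

    n_aa, n_ab, n_bb are the counts of called genotypes A/A, A/B, and B/B.
    Hypothetical helper for illustration; not part of IDEAL-GENOM.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")

    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    maf = min(p, q)

    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Pearson chi-square; monomorphic variants (expected count 0) contribute nothing.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return maf, chi2
```

A variant with counts 25/50/25 sits exactly at equilibrium (statistic 0), while a complete absence of heterozygotes, as in 90/0/10, yields a large statistic and would typically be filtered out.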

Modern Design
  • YAML Configuration: Single configuration file with clear, hierarchical structure

  • Flexible Pipeline System: Enable/disable steps, customize parameters per analysis

  • Multiple Interfaces: Command-line tool, Python API, Jupyter notebooks

  • Docker Support: Pre-configured container with all genomic tools installed

  • Automated Workflows: Pipeline executor handles dependencies and data flow

  • Rich Reporting: Publication-ready plots and comprehensive QC metrics
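
The idea of enabling or disabling steps while an executor keeps them in dependency order can be sketched with the standard library alone. Everything below is an assumption made for illustration: the step names, the dependency map, and the `execution_order` helper are hypothetical and do not reflect how IDEAL-GENOM's own executor is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical steps mapped to the steps they depend on (illustration only).
STEP_DEPENDENCIES = {
    "sample_qc": set(),
    "ancestry_qc": {"sample_qc"},
    "variant_qc": {"ancestry_qc"},
    "gwas_glm": {"variant_qc"},
}


def execution_order(enabled: dict[str, bool]) -> list[str]:
    """Return the enabled steps in an order that respects the dependencies."""
    ordered = TopologicalSorter(STEP_DEPENDENCIES).static_order()
    # Steps missing from the mapping are treated as disabled.
    return [step for step in ordered if enabled.get(step, False)]
```

This sketch simply skips disabled steps. A real executor also has to decide what happens when a later step needs the output of a disabled one, for example by reusing files from an earlier run or by refusing to start.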

Developer Friendly
  • Reproducible: All steps, parameters, and outputs logged

  • Extensible: Modular architecture for adding custom analysis steps

  • Well Documented: Comprehensive guides, API reference, and examples

  • Type Hints: Full type annotations for better IDE support

Quick Start

Installation

pip install ideal-genom

Basic Usage

# Generate a configuration template
ideal-genom template --output my_pipeline.yaml

# Edit the configuration file to match your data
nano my_pipeline.yaml

# Validate your configuration
ideal-genom validate --config my_pipeline.yaml

# Run the pipeline
ideal-genom run --config my_pipeline.yaml

Python API

from ideal_genom.core.config import load_config
from ideal_genom.core.pipeline import PipelineExecutor

# Load configuration
config = load_config("my_pipeline.yaml")

# Create and execute pipeline
executor = PipelineExecutor(config)
executor.execute()
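
For scripted runs, the validate and run commands from Basic Usage can also be chained from Python. The sketch below is one possible wrapper, not part of the package: the helper names are hypothetical, and it assumes that `ideal-genom validate` exits with a non-zero status when the configuration is invalid.

```python
import subprocess


def build_commands(config_path: str) -> list[list[str]]:
    """Build the validate and run invocations for one configuration file."""
    return [
        ["ideal-genom", "validate", "--config", config_path],
        ["ideal-genom", "run", "--config", config_path],
    ]


def validate_then_run(config_path: str) -> None:
    """Validate first, then run.

    check=True raises CalledProcessError on a non-zero exit status, so the
    pipeline is not started if validation fails (assuming the CLI reports
    failure through its exit status).
    """
    for command in build_commands(config_path):
        subprocess.run(command, check=True)
```

Calling `validate_then_run("my_pipeline.yaml")` requires the `ideal-genom` command to be on the PATH.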

Available Pipelines

QC Pipeline - Quality control for case-control studies
  1. Sample QC: Individual-level quality control

  2. Ancestry QC: Population structure and outlier detection

  3. Variant QC: SNP-level quality control

  4. Population Visualization: UMAP/t-SNE plots

GWAS Pipeline - Genome-wide association analysis
  1. Preparatory: LD pruning and PCA decomposition

  2. GLM Analysis: Fixed effects association testing

  3. GLMM Analysis: Mixed model with genetic relationship matrix

  4. Annotation: Gene mapping and functional annotation
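
As an illustration of what happens between association testing and annotation, the following sketch extracts top hits from a table of summary statistics. It is not the package's code: the column names `ID` and `P` and the helper `top_hits` are assumptions, and 5e-8 is the conventional genome-wide significance threshold rather than a documented package default.

```python
import csv
import io

# Conventional genome-wide significance level (not necessarily the package default).
GENOME_WIDE_SIGNIFICANCE = 5e-8


def top_hits(rows, p_column="P", threshold=GENOME_WIDE_SIGNIFICANCE):
    """Return the rows with a p-value below threshold, most significant first."""
    hits = []
    for row in rows:
        try:
            p_value = float(row[p_column])
        except (KeyError, ValueError):
            continue  # skip rows with a missing or non-numeric p-value such as "NA"
        if p_value < threshold:
            hits.append(row)
    return sorted(hits, key=lambda row: float(row[p_column]))


# Tiny made-up table standing in for a tab-separated results file.
example = io.StringIO("ID\tP\nrs1\t3e-9\nrs2\t0.04\nrs3\tNA\nrs4\t1e-12\n")
hits = top_hits(csv.DictReader(example, delimiter="\t"))
```

Here `hits` holds the rows for `rs4` and `rs1`, in that order. The rows for `rs2` (not significant) and `rs3` (no p-value) are dropped.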

VCF Pipeline - Post-imputation processing
  1. VCF Processing: Filter, normalize, annotate, concatenate

  2. PLINK Conversion: Convert to PLINK binary format

  3. Quality Filtering: R² threshold, multiallelic handling
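
The logic of an R² threshold can be sketched in a few lines of Python. In the pipeline this filtering is delegated to BCFtools. The code below only illustrates the rule and rests on assumptions: that imputation quality is stored under an `R2` key in the INFO column (as Minimac-style output does), and that 0.3 is an illustrative cutoff rather than the package default. Both helper names are hypothetical.

```python
def info_r2(vcf_line: str, key: str = "R2"):
    """Return the imputation R² from the INFO column of a VCF record, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:  # INFO is the eighth tab-separated column
        return None
    for entry in fields[7].split(";"):
        name, _, value = entry.partition("=")
        if name == key and value:
            try:
                return float(value)
            except ValueError:
                return None
    return None


def passes_r2(vcf_line: str, threshold: float = 0.3, key: str = "R2") -> bool:
    """Keep a record only if it carries an R² value at or above the threshold."""
    r2 = info_r2(vcf_line, key)
    return r2 is not None and r2 >= threshold


record = "1\t100\trs1\tA\tG\t.\tPASS\tAF=0.2;MAF=0.2;R2=0.85;IMPUTED"
keep = passes_r2(record)
```

Records without an R² annotation are rejected in this sketch. Whether typed (non-imputed) variants should instead be kept is a design decision that depends on the imputation software.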

Supported Tools

IDEAL-GENOM integrates the following genomic analysis tools:

  • PLINK 1.9: Classic PLINK for QC and association analysis

  • PLINK 2.0: Modern version with improved performance (AVX2 optimized)

  • GCTA: Genetic relationship matrix and mixed model analysis

  • BCFtools: VCF manipulation and quality filtering

The pipeline calls these tools automatically, but it does not install them: either install them separately or use the provided Docker image, which includes all of them.
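
Before running outside Docker, it can help to check which tools are reachable on the PATH. The sketch below uses only the standard library. The executable names (`plink`, `plink2`, `gcta64`, `bcftools`) are the ones these tools are commonly installed under and are an assumption here, since IDEAL-GENOM may locate its tools differently. The helper names are hypothetical.

```python
import shutil

# Commonly used executable names; adjust them to match your installation.
TOOL_EXECUTABLES = {
    "PLINK 1.9": "plink",
    "PLINK 2.0": "plink2",
    "GCTA": "gcta64",
    "BCFtools": "bcftools",
}


def find_tools(executables=TOOL_EXECUTABLES):
    """Map each tool to the full path of its executable, or None if not found."""
    return {tool: shutil.which(exe) for tool, exe in executables.items()}


def missing_tools(executables=TOOL_EXECUTABLES):
    """Return the names of the tools that are not available on the PATH."""
    return [tool for tool, path in find_tools(executables).items() if path is None]
```

An empty list from `missing_tools()` means all four executables were found. It does not verify their versions.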

Citation

If you use IDEAL-GENOM in your research, please cite:

@software{ideal_genom_2026,
  title = {IDEAL-GENOM: Comprehensive Genomic Analysis Pipeline},
  author = {Giraldo González, Luis and Tenghe, Amabel},
  year = {2026},
  version = {1.1.0},
  url = {https://github.com/cge-tubingens/ideal-genom-qc}
}

License

IDEAL-GENOM is released under the MIT License. See the LICENSE file in the repository for details.
