Visualization Modules

The ideal_genom.visualizations package provides functions for creating publication-ready plots for GWAS and genomic analysis.

Module Overview

manhattan_type

Generate Manhattan and Miami plots for genome-wide association studies (GWAS).

Module: ideal_genom.visualizations.manhattan_type

Features:

  • Data processing and visualization of GWAS summary statistics

  • Annotation of SNPs with gene information from various sources

  • Highlighting and labeling of specific SNPs of interest

  • Support for both Manhattan (single study) and Miami (two studies) plots

Key Functions:

compute_relative_pos(data, chr_col='CHR', pos_col='POS', p_col='p')

Compute the relative position of probes/SNPs across chromosomes and add a -log10(p-value) column.

Parameters:
  • data (pandas.DataFrame) – Input DataFrame containing genomic data

  • chr_col (str) – Column name for chromosome identifiers

  • pos_col (str) – Column name for base pair positions

  • p_col (str) – Column name for p-values

Returns:

DataFrame with added columns for relative positions and -log10(p-values)

Return type:

pandas.DataFrame
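
The two derived columns can be illustrated with a minimal sketch. It is not the library's implementation: it works on plain dictionaries instead of a DataFrame, approximates each chromosome's length by the largest position observed on it, and the output keys rel_pos and log10p are hypothetical names chosen for this example.

```python
import math

def relative_positions(records):
    """records: list of dicts with 'CHR' (int), 'POS' (int) and 'p' (0 < p <= 1)."""
    # The largest position seen on each chromosome stands in for its length.
    chr_max = {}
    for r in records:
        chr_max[r["CHR"]] = max(chr_max.get(r["CHR"], 0), r["POS"])

    # Offset of a chromosome = summed lengths of all chromosomes before it.
    offsets, running = {}, 0
    for chrom in sorted(chr_max):
        offsets[chrom] = running
        running += chr_max[chrom]

    # Add the genome-wide x coordinate and the -log10(p) y coordinate.
    return [
        {**r, "rel_pos": r["POS"] + offsets[r["CHR"]], "log10p": -math.log10(r["p"])}
        for r in records
    ]

rows = [
    {"CHR": 1, "POS": 100, "p": 0.01},
    {"CHR": 1, "POS": 250, "p": 0.5},
    {"CHR": 2, "POS": 40, "p": 1e-8},
]
result = relative_positions(rows)
```

Placing chromosomes end to end this way is what lets a Manhattan plot use a single continuous x-axis.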

manhattan(df_gwas, plots_dir, pval_col='P', chr_col='CHR', pos_col='POS', snp_col='SNP', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, highlight_snps=None, alpha=0.7, save_name='manhattan.jpeg', colors=None, chr_text_shift=None, fig_size=(18, 6), dpi=500)

Generate a Manhattan plot from GWAS summary statistics.

Parameters:
  • df_gwas (pandas.DataFrame) – DataFrame containing GWAS results

  • plots_dir (str) – Directory path where the plot will be saved

  • pval_col (str) – Column name for p-values

  • chr_col (str) – Column name for chromosome

  • pos_col (str) – Column name for base pair position

  • snp_col (str) – Column name for SNP identifiers

  • p_threshold (float) – Genome-wide significance threshold

  • annotate (Optional[list]) – List of SNP IDs to annotate with gene names

  • annotation_type (str) – Source for gene annotation ('ensembl', 'refseq', or 'both')

  • genome_build (str) – Genome build version ('37' or '38')

  • api_request (bool) – Whether to use API for annotation (if False, uses local GTF)

  • highlight_snps (Optional[list]) – List of SNP IDs to highlight in different color

  • alpha (float) – Transparency level for points

  • save_name (str) – Filename for saving the plot

  • colors (Optional[list]) – Custom colors for alternating chromosomes

  • chr_text_shift (Optional[float]) – Shift amount for chromosome labels

  • fig_size (tuple) – Figure size (width, height) in inches

  • dpi (int) – Resolution for saved figure

Returns:

True if successful

Return type:

bool

miami(df_gwas1, df_gwas2, plots_dir, pval_col='P', chr_col='CHR', pos_col='POS', snp_col='SNP', p_threshold=5e-8, annotate1=None, annotate2=None, annotation_type='ensembl', genome_build='38', api_request=True, highlight_snps1=None, highlight_snps2=None, alpha=0.7, save_name='miami.jpeg', colors=None, chr_text_shift=None, fig_size=(18, 12), dpi=500, plot1_label='Study 1', plot2_label='Study 2')

Generate a Miami plot (back-to-back Manhattan plots) comparing two GWAS studies.

Parameters:
  • df_gwas1 (pandas.DataFrame) – DataFrame containing GWAS results for first study

  • df_gwas2 (pandas.DataFrame) – DataFrame containing GWAS results for second study

  • plots_dir (str) – Directory path where the plot will be saved

  • plot1_label (str) – Label for the top plot

  • plot2_label (str) – Label for the bottom plot

Returns:

True if successful

Return type:

bool

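The remaining parameters behave as in manhattan(). The suffixed pairs annotate1/annotate2 and highlight_snps1/highlight_snps2 apply to the first and second study, respectively.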

Usage Example:

import pandas as pd
from ideal_genom.visualizations.manhattan_type import manhattan, miami

# Load GWAS summary statistics
gwas_df = pd.read_csv("gwas_results.txt", sep="\t")

# Generate Manhattan plot
manhattan(
    df_gwas=gwas_df,
    plots_dir="./plots",
    pval_col='P',
    chr_col='CHR',
    pos_col='BP',
    snp_col='SNP',
    p_threshold=5e-8,
    annotate=['rs12345', 'rs67890'],  # Annotate specific SNPs
    annotation_type='ensembl',
    genome_build='38',
    save_name='my_manhattan.jpeg'
)

# Generate Miami plot comparing two studies
gwas_df2 = pd.read_csv("gwas_results2.txt", sep="\t")
miami(
    df_gwas1=gwas_df,
    df_gwas2=gwas_df2,
    plots_dir="./plots",
    pos_col='BP',  # same position column as in the manhattan() call above
    plot1_label='Discovery cohort',
    plot2_label='Replication cohort',
    save_name='my_miami.jpeg'
)
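
Because the column-name arguments default to 'P', 'CHR', 'POS' and 'SNP', a file whose position column is named 'BP' needs pos_col set explicitly. A small pre-flight check along these lines can catch such mismatches before plotting. The helper missing_columns is hypothetical and not part of ideal_genom.

```python
def missing_columns(columns, pval_col="P", chr_col="CHR", pos_col="POS", snp_col="SNP"):
    """Return the requested column names that are absent from `columns`."""
    available = set(columns)
    return [c for c in (pval_col, chr_col, pos_col, snp_col) if c not in available]

# With a DataFrame this would be called as
# missing_columns(gwas_df.columns, pos_col="BP").
header = ["SNP", "CHR", "BP", "P", "BETA", "SE"]
print(missing_columns(header))                # ['POS']
print(missing_columns(header, pos_col="BP"))  # []
```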

plots

Functions for generating various plots for GWAS data analysis.

Module: ideal_genom.visualizations.plots

Features:

  • QQ plots for visualizing the distribution of p-values

  • Beta-beta scatter plots for comparing effect sizes between studies

  • Trumpet plots for visualizing power and effect sizes

  • Support for both binary and quantitative traits

Key Functions:

qqplot_draw(df_gwas, plots_dir, lambda_val=None, pval_col='P', conf_color='lightgray', save_name='qq_plot.jpeg', fig_size=(10, 10), dpi=500)

Create a Q-Q (Quantile-Quantile) plot from GWAS results.

This function generates a Q-Q plot comparing observed vs expected -log10(p-values) from GWAS results, including confidence intervals and genomic inflation factor (λ).

Parameters:
  • df_gwas (pandas.DataFrame) – DataFrame containing GWAS results with p-values

  • plots_dir (str) – Directory path where the plot will be saved

  • lambda_val (Optional[float]) – Genomic inflation factor (calculated if None)

  • pval_col (str) – Column name for p-values

  • conf_color (str) – Color for confidence interval bands

  • save_name (str) – Filename for saving the plot

  • fig_size (tuple) – Figure size (width, height) in inches

  • dpi (int) – Resolution for saved figure

Returns:

True if successful

Return type:

bool
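
For orientation, the quantities behind a Q-Q plot can be sketched with the standard library alone. This is the textbook definition of λ (median observed 1-df chi-square statistic divided by its null median) and one common choice of plotting positions, (i - 0.5) / n. Whether qqplot_draw uses exactly these conventions is an assumption, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, median

_ND = NormalDist()

def genomic_inflation(pvalues):
    """Median observed 1-df chi-square divided by the null median (about 0.455)."""
    # For a 1-df test, the chi-square statistic with upper-tail probability p
    # is the square of the normal quantile at p / 2. Requires 0 < p <= 1.
    chi2 = [_ND.inv_cdf(p / 2) ** 2 for p in pvalues]
    null_median = _ND.inv_cdf(0.75) ** 2
    return median(chi2) / null_median

def qq_points(pvalues):
    """Pairs of (expected, observed) -log10(p), smallest p-value first."""
    n = len(pvalues)
    observed = sorted(pvalues)
    # Expected quantiles of Uniform(0, 1) at plotting positions (i - 0.5) / n.
    return [
        (-math.log10((i - 0.5) / n), -math.log10(p))
        for i, p in enumerate(observed, start=1)
    ]

# Perfectly uniform p-values: lambda is 1 and all points lie on the diagonal.
uniform = [(i - 0.5) / 1001 for i in range(1, 1002)]
lam = genomic_inflation(uniform)
points = qq_points(uniform)
```

A λ noticeably above 1 suggests inflated test statistics, for example from population stratification.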

beta_beta_plot(df1, df2, plots_dir, beta_col='BETA', se_col='SE', snp_col='SNP', pval_col='P', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, save_name='beta_beta.jpeg', fig_size=(10, 10), dpi=500, x_label='Study 1', y_label='Study 2')

Create a beta-beta scatter plot comparing effect sizes between two GWAS studies.

Parameters:
  • df1 (pandas.DataFrame) – DataFrame containing GWAS results for first study

  • df2 (pandas.DataFrame) – DataFrame containing GWAS results for second study

  • plots_dir (str) – Directory path where the plot will be saved

  • beta_col (str) – Column name for effect sizes (beta)

  • se_col (str) – Column name for standard errors

  • snp_col (str) – Column name for SNP identifiers

  • pval_col (str) – Column name for p-values

  • p_threshold (float) – Significance threshold for highlighting SNPs

  • annotate (Optional[list]) – List of SNP IDs to annotate

  • annotation_type (str) – Source for gene annotation

  • genome_build (str) – Genome build version

  • api_request (bool) – Whether to use API for annotation

  • save_name (str) – Filename for saving the plot

  • fig_size (tuple) – Figure size (width, height) in inches

  • dpi (int) – Resolution for saved figure

  • x_label (str) – Label for x-axis

  • y_label (str) – Label for y-axis

Returns:

True if successful

Return type:

bool
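
The data preparation behind a beta-beta plot can be sketched as a join on SNP ID. The sketch assumes that effect alleles are already aligned between the two studies and that a SNP counts as significant when it passes the threshold in either study. Neither assumption is confirmed for beta_beta_plot, and the function and key names are hypothetical.

```python
def paired_effects(study1, study2, p_threshold=5e-8):
    """Match SNPs present in both studies.

    Each study maps SNP ID -> (beta, p_value).
    """
    pairs = []
    for snp in sorted(study1.keys() & study2.keys()):
        beta1, p1 = study1[snp]
        beta2, p2 = study2[snp]
        pairs.append({
            "SNP": snp,
            "beta1": beta1,                            # x-axis value
            "beta2": beta2,                            # y-axis value
            "significant": min(p1, p2) < p_threshold,  # passes in either study
            "concordant": beta1 * beta2 > 0,           # same direction of effect
        })
    return pairs

discovery = {"rs1": (0.20, 1e-9), "rs2": (-0.05, 0.3), "rs3": (0.10, 1e-3)}
replication = {"rs1": (0.15, 1e-4), "rs2": (0.02, 0.6), "rs4": (0.30, 1e-10)}
pairs = paired_effects(discovery, replication)
```

SNPs found in only one study (rs3 and rs4 here) drop out, since they have no second coordinate to plot.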

trumpet_plot_binary(df_gwas, plots_dir, beta_col='BETA', se_col='SE', snp_col='SNP', pval_col='P', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, maf_col='MAF', n_cases=None, n_controls=None, prevalence=0.5, alpha_val=0.05, save_name='trumpet_binary.jpeg', fig_size=(10, 10), dpi=500)

Create a trumpet plot for binary traits, showing power curves and effect sizes.

Parameters:
  • df_gwas (pandas.DataFrame) – DataFrame containing GWAS results

  • plots_dir (str) – Directory path where the plot will be saved

  • beta_col (str) – Column name for effect sizes

  • se_col (str) – Column name for standard errors

  • snp_col (str) – Column name for SNP identifiers

  • pval_col (str) – Column name for p-values

  • p_threshold (float) – Significance threshold

  • annotate (Optional[list]) – List of SNP IDs to annotate

  • maf_col (str) – Column name for minor allele frequency

  • n_cases (Optional[int]) – Number of cases in the study

  • n_controls (Optional[int]) – Number of controls in the study

  • prevalence (float) – Disease prevalence

  • alpha_val (float) – Significance level for power calculation

  • save_name (str) – Filename for saving the plot

Returns:

True if successful

Return type:

bool

trumpet_plot_quantitative(df_gwas, plots_dir, beta_col='BETA', se_col='SE', snp_col='SNP', pval_col='P', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, maf_col='MAF', n_samples=None, alpha_val=0.05, save_name='trumpet_quantitative.jpeg', fig_size=(10, 10), dpi=500)

Create a trumpet plot for quantitative traits, showing power curves and effect sizes.

Parameters:
  • df_gwas (pandas.DataFrame) – DataFrame containing GWAS results

  • plots_dir (str) – Directory path where the plot will be saved

  • n_samples (Optional[int]) – Total number of samples in the study

Returns:

True if successful

Return type:

bool

The remaining parameters behave as in trumpet_plot_binary(), except that n_cases, n_controls, and prevalence do not apply; n_samples takes their place.
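
The power curves in a trumpet plot relate effect size, allele frequency, and sample size. The sketch below uses a standard approximation for a quantitative trait with unit variance: the SNP explains 2·MAF·(1 − MAF)·β² of the variance, and the test is a two-sided 1-df z-test. It is an illustration under those assumptions, not the calculation trumpet_plot_quantitative performs; the function name and its alpha argument are hypothetical.

```python
from statistics import NormalDist

_ND = NormalDist()

def power_quantitative(beta, maf, n_samples, alpha=5e-8):
    """Approximate power of a two-sided 1-df test for an additive SNP effect."""
    # Variance explained by the SNP for a phenotype with unit variance.
    var_explained = 2.0 * maf * (1.0 - maf) * beta ** 2
    # Mean of the z statistic under the alternative (square root of the NCP).
    shift = (n_samples * var_explained) ** 0.5
    # Two-sided critical value for significance level alpha.
    crit = -_ND.inv_cdf(alpha / 2.0)
    # Probability that |Z + shift| exceeds the critical value.
    return _ND.cdf(shift - crit) + _ND.cdf(-shift - crit)

for beta in (0.05, 0.1, 0.2):
    print(beta, round(power_quantitative(beta, maf=0.3, n_samples=2000), 4))
```

Tracing the (MAF, β) pairs at which this function reaches a fixed power, such as 0.8, gives the flared outline that the plot is named after.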

Usage Example:

import pandas as pd
from ideal_genom.visualizations.plots import (
    qqplot_draw, beta_beta_plot,
    trumpet_plot_binary, trumpet_plot_quantitative
)

# Load GWAS results
gwas_df = pd.read_csv("gwas_results.txt", sep="\t")

# Generate QQ plot
qqplot_draw(
    df_gwas=gwas_df,
    plots_dir="./plots",
    pval_col='P',
    save_name='my_qq_plot.jpeg'
)

# Beta-beta plot comparing two studies
gwas_df2 = pd.read_csv("gwas_results2.txt", sep="\t")
beta_beta_plot(
    df1=gwas_df,
    df2=gwas_df2,
    plots_dir="./plots",
    beta_col='BETA',
    se_col='SE',
    x_label='Discovery',
    y_label='Replication',
    save_name='my_beta_beta.jpeg'
)

# Trumpet plot for binary trait
trumpet_plot_binary(
    df_gwas=gwas_df,
    plots_dir="./plots",
    n_cases=1000,
    n_controls=1000,
    prevalence=0.1,
    save_name='my_trumpet_binary.jpeg'
)

# Trumpet plot for quantitative trait
trumpet_plot_quantitative(
    df_gwas=gwas_df,
    plots_dir="./plots",
    n_samples=2000,
    save_name='my_trumpet_quant.jpeg'
)

zoom_heatmap

Create zoomed heatmap visualizations of SNP associations, gene annotations, and linkage disequilibrium (LD) patterns.

Module: ideal_genom.visualizations.zoom_heatmap

Features:

  • Filter and annotate SNP data in a genomic region

  • Calculate LD matrices using PLINK

  • Generate three-panel plots with:

    1. Association plot with SNPs colored by functional consequences

    2. Gene track showing gene locations and orientations

    3. LD heatmap showing correlation patterns between SNPs

Key Functions:

filter_sumstats(data_df, lead_snp, snp_col, p_col, pos_col, chr_col, pval_threshold=5e-8, radius=10e6)

Filter GWAS summary statistics based on a lead SNP, p-value threshold and genomic region.

Parameters:
  • data_df (pandas.DataFrame) – DataFrame containing GWAS summary statistics

  • lead_snp (str) – Lead SNP identifier to center the region around

  • snp_col (str) – Column name for SNP identifiers

  • p_col (str) – Column name for p-values

  • pos_col (str) – Column name for base pair positions

  • chr_col (str) – Column name for chromosome

  • pval_threshold (float) – P-value threshold for filtering

  • radius (Union[float, int]) – Genomic radius around lead SNP (in base pairs)

Returns:

Filtered DataFrame

Return type:

pandas.DataFrame
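
The filtering logic described above can be sketched as three conditions: same chromosome as the lead SNP, within radius base pairs of it, and at or below the p-value threshold. The sketch uses plain dictionaries with fixed keys instead of a DataFrame with configurable column names. Whether filter_sumstats uses an inclusive threshold, or always retains the lead SNP, is an assumption.

```python
def filter_region(rows, lead_snp, pval_threshold=5e-8, radius=10e6):
    """rows: list of dicts with 'SNP', 'CHR', 'POS' and 'P' keys."""
    # Locate the lead SNP; raises StopIteration if it is not in the data.
    lead = next(r for r in rows if r["SNP"] == lead_snp)
    return [
        r for r in rows
        if r["CHR"] == lead["CHR"]                     # same chromosome
        and abs(r["POS"] - lead["POS"]) <= radius      # within the radius
        and r["P"] <= pval_threshold                   # significant enough
    ]

rows = [
    {"SNP": "rs1", "CHR": 1, "POS": 1_000_000, "P": 1e-12},
    {"SNP": "rs2", "CHR": 1, "POS": 1_400_000, "P": 1e-9},
    {"SNP": "rs3", "CHR": 1, "POS": 1_200_000, "P": 0.2},
    {"SNP": "rs4", "CHR": 1, "POS": 2_000_000, "P": 1e-10},
    {"SNP": "rs5", "CHR": 2, "POS": 1_000_000, "P": 1e-10},
]
kept = filter_region(rows, "rs1", radius=500_000)
```

Note that the documented default radius of 10e6 is 10 Mb on each side of the lead SNP, which is far wider than the 500 kb used in this example.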

compute_ld(plink_file, snp_list, output_dir, lead_snp=None, ld_window_kb=10000, ld_window_snps=10000, threads=1)

Compute linkage disequilibrium matrix for a list of SNPs using PLINK.

Parameters:
  • plink_file (Union[str, Path]) – Path to PLINK binary file prefix (without .bed/.bim/.fam)

  • snp_list (list) – List of SNP IDs for LD calculation

  • output_dir (Union[str, Path]) – Directory to save output files

  • lead_snp (Optional[str]) – Lead SNP used for coloring

  • ld_window_kb (int) – LD window size in kilobases

  • ld_window_snps (int) – LD window size in number of SNPs

  • threads (int) – Number of threads for PLINK

Returns:

LD matrix as DataFrame

Return type:

pandas.DataFrame
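
To show how the parameters above map onto PLINK options, the following sketch builds a PLINK 1.9 argument list for a pairwise r² report. The command that compute_ld actually runs is not shown in this documentation, so the flag selection, the SNP list file, and the ld_region output prefix are assumptions made for illustration.

```python
from pathlib import Path

def plink_ld_command(plink_file, snp_file, output_dir,
                     ld_window_kb=10000, ld_window_snps=10000, threads=1):
    """Build (but do not run) a PLINK 1.9 argument list for pairwise r^2."""
    out_prefix = Path(output_dir) / "ld_region"  # hypothetical output prefix
    return [
        "plink",
        "--bfile", str(plink_file),          # prefix of the .bed/.bim/.fam files
        "--extract", str(snp_file),          # text file with one SNP ID per line
        "--r2",
        "--ld-window-kb", str(ld_window_kb),
        "--ld-window", str(ld_window_snps),
        "--ld-window-r2", "0",               # report all pairs, not only r^2 >= 0.2
        "--threads", str(threads),
        "--out", str(out_prefix),
    ]

cmd = plink_ld_command("data/genotypes", "plots/snps.txt", "plots")
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the plink executable is on the PATH.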

create_zoom_heatmap(sumstats_df, plink_file, lead_snp, output_dir, snp_col='SNP', chr_col='CHR', pos_col='BP', p_col='P', beta_col='BETA', pval_threshold=5e-8, radius=500000, ld_window_kb=1000, genome_build='38', annotation_type='ensembl', api_request=True, fig_size=(14, 12), dpi=300, threads=1, save_name='zoom_heatmap.png')

Create a comprehensive zoom heatmap plot with association, gene track, and LD panels.

Parameters:
  • sumstats_df (pandas.DataFrame) – DataFrame containing GWAS summary statistics

  • plink_file (Union[str, Path]) – Path to PLINK binary file prefix

  • lead_snp (str) – Lead SNP identifier to center the plot

  • output_dir (Union[str, Path]) – Directory to save output files

  • snp_col (str) – Column name for SNP identifiers

  • chr_col (str) – Column name for chromosome

  • pos_col (str) – Column name for base pair position

  • p_col (str) – Column name for p-values

  • beta_col (str) – Column name for effect sizes

  • pval_threshold (float) – P-value threshold for filtering

  • radius (Union[float, int]) – Genomic radius around lead SNP (in base pairs)

  • ld_window_kb (int) – LD window size in kilobases

  • genome_build (str) – Genome build version ('37' or '38')

  • annotation_type (str) – Source for gene annotation ('ensembl', 'refseq', or 'both')

  • api_request (bool) – Whether to use API for functional annotation

  • fig_size (tuple) – Figure size (width, height) in inches

  • dpi (int) – Resolution for saved figure

  • threads (int) – Number of threads for PLINK

  • save_name (str) – Filename for saving the plot

Returns:

Path to saved figure

Return type:

Path

Usage Example:

import pandas as pd
from pathlib import Path
from ideal_genom.visualizations.zoom_heatmap import create_zoom_heatmap

# Load GWAS summary statistics
sumstats = pd.read_csv("gwas_results.txt", sep="\t")

# Create zoom heatmap around a lead SNP
create_zoom_heatmap(
    sumstats_df=sumstats,
    plink_file=Path("data/genotypes"),  # Without .bed/.bim/.fam extension
    lead_snp='rs12345',
    output_dir=Path("./plots"),
    snp_col='SNP',
    chr_col='CHR',
    pos_col='BP',
    p_col='P',
    beta_col='BETA',
    pval_threshold=5e-8,
    radius=500000,  # 500 kb on each side of the lead SNP
    genome_build='38',
    annotation_type='ensembl',
    api_request=True,
    save_name='rs12345_zoom.png'
)

Notes

Dependencies:
  • matplotlib

  • seaborn

  • pandas

  • numpy

  • textalloc (for label positioning)

  • pyensembl (for gene annotations)

  • PLINK 1.9 or 2.0 (for LD calculations)
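
A quick way to confirm that these dependencies are available is sketched below. It assumes each Python package is imported under the name listed above and that PLINK is installed as plink or plink2; adjust the names if your installation differs.

```python
import importlib.util
import shutil

PYTHON_PACKAGES = ["matplotlib", "seaborn", "pandas", "numpy", "textalloc", "pyensembl"]
PLINK_EXECUTABLES = ["plink", "plink2"]  # assumed executable names

def check_environment():
    """Report importable Python packages and PLINK binaries found on PATH."""
    packages = {
        name: importlib.util.find_spec(name) is not None
        for name in PYTHON_PACKAGES
    }
    # shutil.which returns the full path, or None when the binary is absent.
    binaries = {name: shutil.which(name) for name in PLINK_EXECUTABLES}
    return packages, binaries

packages, binaries = check_environment()
for name, found in packages.items():
    print(f"{name}: {'ok' if found else 'missing'}")
for name, path in binaries.items():
    print(f"{name}: {path or 'not on PATH'}")
```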

Annotation Sources:

Plotting functions that accept an annotation_type argument support gene annotation from:

  • Ensembl: Via REST API or local GTF files

  • RefSeq: Via local GTF files

  • Both: Combined annotations from both sources

Genome Builds:

Supported genome builds are GRCh37/hg19 ('37') and GRCh38/hg38 ('38').
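
The build matters when querying Ensembl, because its public REST service exposes GRCh38 at rest.ensembl.org and GRCh37 on a separate host, grch37.rest.ensembl.org. The sketch below only builds a gene-overlap query URL for a region. How ideal_genom issues its own requests is not documented here, so treat the helper name and the choice of endpoint as assumptions.

```python
ENSEMBL_REST = {
    "38": "https://rest.ensembl.org",
    "37": "https://grch37.rest.ensembl.org",
}

def gene_overlap_url(chrom, start, end, genome_build="38"):
    """URL listing genes that overlap chrom:start-end on the given build."""
    if genome_build not in ENSEMBL_REST:
        raise ValueError(f"unsupported genome build: {genome_build!r}")
    base = ENSEMBL_REST[genome_build]
    return f"{base}/overlap/region/human/{chrom}:{start}-{end}?feature=gene"

print(gene_overlap_url("1", 1_000_000, 1_500_000))
print(gene_overlap_url("1", 1_000_000, 1_500_000, genome_build="37"))
```

Coordinates differ between builds, so the build passed here must match the one used for the summary statistics.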

Output Formats:
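  • Manhattan, Miami, QQ, beta-beta, and trumpet plots are saved as JPEG by default (see each function's save_name default)

  • Zoom heatmaps are saved as PNG by default, which is recommended for better quality with complex graphics

  • Resolution is set with the dpi parameter (default 500, or 300 for create_zoom_heatmap)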

See Also