Visualization Modules

The ideal_genom.visualizations package provides functions for creating publication-ready plots for GWAS and genomic analysis.

Module Overview

manhattan_type

Generate Manhattan and Miami plots for genome-wide association studies (GWAS).

Module: ideal_genom.visualizations.manhattan_type

Features:

Data processing and visualization of GWAS summary statistics
Annotation of SNPs with gene information from various sources
Highlighting and labeling of specific SNPs of interest
Support for both Manhattan (single study) and Miami (two studies) plots

Key Functions:

compute_relative_pos(data, chr_col='CHR', pos_col='POS', p_col='p')

Compute the relative position of probes/SNPs across chromosomes and add a -log10(p-value) column.

Parameters:

data (pandas.DataFrame) – Input DataFrame containing genomic data
chr_col (str) – Column name for chromosome identifiers
pos_col (str) – Column name for base pair positions
p_col (str) – Column name for p-values

Returns:

DataFrame with added columns for relative positions and -log10(p-values)

Return type:

pandas.DataFrame
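
The idea behind relative positions can be illustrated with a small pandas sketch. This is not the package's implementation: the helper name relative_positions and the output column names rel_pos and log10p are hypothetical, chosen only for this example. It assumes each chromosome is shifted by the summed maximum positions of the chromosomes before it.

```python
import numpy as np
import pandas as pd


def relative_positions(data, chr_col="CHR", pos_col="POS", p_col="p"):
    """Illustrative only: add cumulative x-positions and -log10(p)."""
    out = data.sort_values([chr_col, pos_col]).reset_index(drop=True)
    # Offset of a chromosome = sum of the largest positions of all earlier ones
    chr_max = out.groupby(chr_col)[pos_col].max()
    offsets = chr_max.cumsum().shift(fill_value=0)
    out["rel_pos"] = out[pos_col] + out[chr_col].map(offsets)  # hypothetical name
    out["log10p"] = -np.log10(out[p_col])  # hypothetical name
    return out


toy = pd.DataFrame({
    "CHR": [2, 1, 1, 2],
    "POS": [50, 100, 300, 150],
    "p": [1e-8, 0.5, 1e-3, 0.01],
})
result = relative_positions(toy)
```

With these offsets, every SNP on chromosome 2 is drawn to the right of every SNP on chromosome 1, which is what gives a Manhattan plot its continuous x-axis.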

manhattan(df_gwas, plots_dir, pval_col='P', chr_col='CHR', pos_col='POS', snp_col='SNP', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, highlight_snps=None, alpha=0.7, save_name='manhattan.jpeg', colors=None, chr_text_shift=None, fig_size=(18, 6), dpi=500)

Generate a Manhattan plot from GWAS summary statistics.

Parameters:

df_gwas (pandas.DataFrame) – DataFrame containing GWAS results
plots_dir (str) – Directory path where the plot will be saved
pval_col (str) – Column name for p-values
chr_col (str) – Column name for chromosome
pos_col (str) – Column name for base pair position
snp_col (str) – Column name for SNP identifiers
p_threshold (float) – Genome-wide significance threshold
annotate (Optional[list]) – List of SNP IDs to annotate with gene names
annotation_type (str) – Source for gene annotation (‘ensembl’, ‘refseq’, or ‘both’)
genome_build (str) – Genome build version (‘37’ or ‘38’)
api_request (bool) – Whether to use API for annotation (if False, uses local GTF)
highlight_snps (Optional[list]) – List of SNP IDs to highlight in different color
alpha (float) – Transparency level for points
save_name (str) – Filename for saving the plot
colors (Optional[list]) – Custom colors for alternating chromosomes
chr_text_shift (Optional[float]) – Shift amount for chromosome labels
fig_size (tuple) – Figure size (width, height) in inches
dpi (int) – Resolution for saved figure

Returns:

True if successful

Return type:

bool

miami(df_gwas1, df_gwas2, plots_dir, pval_col='P', chr_col='CHR', pos_col='POS', snp_col='SNP', p_threshold=5e-8, annotate1=None, annotate2=None, annotation_type='ensembl', genome_build='38', api_request=True, highlight_snps1=None, highlight_snps2=None, alpha=0.7, save_name='miami.jpeg', colors=None, chr_text_shift=None, fig_size=(18, 12), dpi=500, plot1_label='Study 1', plot2_label='Study 2')

Generate a Miami plot (back-to-back Manhattan plots) comparing two GWAS studies.

Parameters:

df_gwas1 (pandas.DataFrame) – DataFrame containing GWAS results for first study
df_gwas2 (pandas.DataFrame) – DataFrame containing GWAS results for second study
plots_dir (str) – Directory path where the plot will be saved
plot1_label (str) – Label for the top plot
plot2_label (str) – Label for the bottom plot

Returns:

True if successful

Return type:

bool

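All other parameters have the same meaning as in manhattan(). The numbered variants (annotate1, annotate2, highlight_snps1, highlight_snps2) correspond to annotate and highlight_snps for the first and second study.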

Usage Example:

import pandas as pd
from ideal_genom.visualizations.manhattan_type import manhattan, miami

# Load GWAS summary statistics
gwas_df = pd.read_csv("gwas_results.txt", sep="\t")

# Generate Manhattan plot
manhattan(
    df_gwas=gwas_df,
    plots_dir="./plots",
    pval_col='P',
    chr_col='CHR',
    pos_col='BP',
    snp_col='SNP',
    p_threshold=5e-8,
    annotate=['rs12345', 'rs67890'],  # Annotate specific SNPs
    annotation_type='ensembl',
    genome_build='38',
    save_name='my_manhattan.jpeg'
)

# Generate Miami plot comparing two studies
gwas_df2 = pd.read_csv("gwas_results2.txt", sep="\t")
miami(
    df_gwas1=gwas_df,
    df_gwas2=gwas_df2,
    plots_dir="./plots",
    pos_col='BP',  # same position column as in the manhattan() call above
    plot1_label='Discovery cohort',
    plot2_label='Replication cohort',
    save_name='my_miami.jpeg'
)
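
The annotate argument expects a list of SNP IDs. One possible way to build that list is to take the most significant SNPs that pass the threshold. The helper top_hits and its n_max parameter are hypothetical and not part of ideal_genom; the sketch uses only pandas.

```python
import pandas as pd


def top_hits(df, pval_col="P", snp_col="SNP", p_threshold=5e-8, n_max=10):
    """Return up to n_max SNP IDs below the threshold, most significant first."""
    hits = df.loc[df[pval_col] < p_threshold].nsmallest(n_max, pval_col)
    return hits[snp_col].tolist()


toy = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3", "rs4"],
    "P": [3e-9, 0.2, 1e-12, 4e-8],
})
snps_to_annotate = top_hits(toy, n_max=2)
```

The resulting list could then be passed as annotate=snps_to_annotate. Capping the number of labels keeps the plot readable when many SNPs are significant.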

plots

Functions for generating various plots for GWAS data analysis.

Module: ideal_genom.visualizations.plots

Features:

QQ plots for visualizing the distribution of p-values
Beta-beta scatter plots for comparing effect sizes between studies
Trumpet plots for visualizing power and effect sizes
Support for both binary and quantitative traits

Key Functions:

qqplot_draw(df_gwas, plots_dir, lambda_val=None, pval_col='P', conf_color='lightgray', save_name='qq_plot.jpeg', fig_size=(10, 10), dpi=500)

Create a Q-Q (Quantile-Quantile) plot from GWAS results.

This function generates a Q-Q plot comparing observed vs expected -log10(p-values) from GWAS results, including confidence intervals and genomic inflation factor (λ).

Parameters:

df_gwas (pandas.DataFrame) – DataFrame containing GWAS results with p-values
plots_dir (str) – Directory path where the plot will be saved
lambda_val (Optional[float]) – Genomic inflation factor (calculated if None)
pval_col (str) – Column name for p-values
conf_color (str) – Color for confidence interval bands
save_name (str) – Filename for saving the plot
fig_size (tuple) – Figure size (width, height) in inches
dpi (int) – Resolution for saved figure

Returns:

True if successful

Return type:

bool
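
When lambda_val is None, the function calculates the genomic inflation factor itself. The sketch below shows the usual definition of λ (median observed chi-square statistic divided by the median of a 1-degree-of-freedom chi-square distribution) using only the standard library. It is an assumption that qqplot_draw uses this same definition; the helper genomic_inflation is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549)
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(pvalues):
    """lambda = median(observed chi-square) / median(expected chi-square)."""
    norm = NormalDist()
    # A two-sided p-value p corresponds to chi-square = z**2, z = inv_cdf(p / 2)
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in pvalues if 0 < p <= 1]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic the null hypothesis, so lambda is close to 1
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(null_p)
```

Values of λ well above 1 usually point to inflation of the test statistics, for example from population stratification.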

beta_beta_plot(df1, df2, plots_dir, beta_col='BETA', se_col='SE', snp_col='SNP', pval_col='P', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, save_name='beta_beta.jpeg', fig_size=(10, 10), dpi=500, x_label='Study 1', y_label='Study 2')

Create a beta-beta scatter plot comparing effect sizes between two GWAS studies.

Parameters:

df1 (pandas.DataFrame) – DataFrame containing GWAS results for first study
df2 (pandas.DataFrame) – DataFrame containing GWAS results for second study
plots_dir (str) – Directory path where the plot will be saved
beta_col (str) – Column name for effect sizes (beta)
se_col (str) – Column name for standard errors
snp_col (str) – Column name for SNP identifiers
pval_col (str) – Column name for p-values
p_threshold (float) – Significance threshold for highlighting SNPs
annotate (Optional[list]) – List of SNP IDs to annotate
annotation_type (str) – Source for gene annotation
genome_build (str) – Genome build version
api_request (bool) – Whether to use API for annotation
save_name (str) – Filename for saving the plot
fig_size (tuple) – Figure size (width, height) in inches
dpi (int) – Resolution for saved figure
x_label (str) – Label for x-axis
y_label (str) – Label for y-axis

Returns:

True if successful

Return type:

bool
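
A beta-beta plot needs effect sizes for the same SNPs from both studies. The sketch below shows one way to pair them with a pandas inner join. It is only an illustration of the data alignment, not the package's internal code, and the toy values are made up for the example.

```python
import pandas as pd

study1 = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3"],
    "BETA": [0.10, -0.20, 0.05],
    "SE": [0.02, 0.03, 0.01],
})
study2 = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3", "rs4"],
    "BETA": [0.12, -0.18, 0.07, 0.30],
    "SE": [0.03, 0.04, 0.02, 0.05],
})

# The inner join keeps only SNPs reported in both studies
paired = study1.merge(study2, on="SNP", how="inner", suffixes=("_1", "_2"))

# Pearson correlation of the paired effect sizes
r = paired["BETA_1"].corr(paired["BETA_2"])
```

SNPs present in only one study (rs4 here) drop out of the comparison, so it is worth checking how many rows survive the join before interpreting the plot.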

trumpet_plot_binary(df_gwas, plots_dir, beta_col='BETA', se_col='SE', snp_col='SNP', pval_col='P', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, maf_col='MAF', n_cases=None, n_controls=None, prevalence=0.5, alpha_val=0.05, save_name='trumpet_binary.jpeg', fig_size=(10, 10), dpi=500)

Create a trumpet plot for binary traits, showing power curves and effect sizes.

Parameters:

df_gwas (pandas.DataFrame) – DataFrame containing GWAS results
plots_dir (str) – Directory path where the plot will be saved
beta_col (str) – Column name for effect sizes
se_col (str) – Column name for standard errors
snp_col (str) – Column name for SNP identifiers
pval_col (str) – Column name for p-values
p_threshold (float) – Significance threshold
annotate (Optional[list]) – List of SNP IDs to annotate
maf_col (str) – Column name for minor allele frequency
n_cases (Optional[int]) – Number of cases in the study
n_controls (Optional[int]) – Number of controls in the study
prevalence (float) – Disease prevalence
alpha_val (float) – Significance level for power calculation
save_name (str) – Filename for saving the plot

Returns:

True if successful

Return type:

bool

trumpet_plot_quantitative(df_gwas, plots_dir, beta_col='BETA', se_col='SE', snp_col='SNP', pval_col='P', p_threshold=5e-8, annotate=None, annotation_type='ensembl', genome_build='38', api_request=True, maf_col='MAF', n_samples=None, alpha_val=0.05, save_name='trumpet_quantitative.jpeg', fig_size=(10, 10), dpi=500)

Create a trumpet plot for quantitative traits, showing power curves and effect sizes.

Parameters:

df_gwas (pandas.DataFrame) – DataFrame containing GWAS results
plots_dir (str) – Directory path where the plot will be saved
n_samples (Optional[int]) – Total number of samples in the study

Returns:

True if successful

Return type:

bool

All other parameters have the same meaning as in trumpet_plot_binary(), except n_cases, n_controls, and prevalence, which this function does not take.
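
To give a feel for what a power curve represents, the sketch below approximates the power of a 1-degree-of-freedom association test for a standardized quantitative trait. It is an assumption-laden illustration, not the calculation used by trumpet_plot_quantitative: the helper power_quantitative is hypothetical, and the non-centrality parameter N · 2 · MAF · (1 − MAF) · β² is a common small-effect approximation under an additive model.

```python
from math import sqrt
from statistics import NormalDist


def power_quantitative(beta, maf, n_samples, alpha=5e-8):
    """Approximate power for a standardized quantitative trait (additive model)."""
    norm = NormalDist()
    # Non-centrality parameter (small-effect approximation, trait variance = 1)
    ncp = n_samples * 2 * maf * (1 - maf) * beta ** 2
    z_crit = -norm.inv_cdf(alpha / 2)  # two-sided critical value
    mu = sqrt(ncp)
    # P(noncentral chi-square_1 > z_crit**2), written as two normal tails
    return norm.cdf(mu - z_crit) + norm.cdf(-mu - z_crit)


small_study = power_quantitative(beta=0.1, maf=0.3, n_samples=5_000)
large_study = power_quantitative(beta=0.1, maf=0.3, n_samples=50_000)
```

For a fixed effect size, power rises with sample size and with allele frequencies closer to 0.5, which is why rare variants need larger effects to be detected.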

Usage Example:

import pandas as pd
from ideal_genom.visualizations.plots import (
    qqplot_draw, beta_beta_plot,
    trumpet_plot_binary, trumpet_plot_quantitative
)

# Load GWAS results
gwas_df = pd.read_csv("gwas_results.txt", sep="\t")

# Generate QQ plot
qqplot_draw(
    df_gwas=gwas_df,
    plots_dir="./plots",
    pval_col='P',
    save_name='my_qq_plot.jpeg'
)

# Beta-beta plot comparing two studies
gwas_df2 = pd.read_csv("gwas_results2.txt", sep="\t")
beta_beta_plot(
    df1=gwas_df,
    df2=gwas_df2,
    plots_dir="./plots",
    beta_col='BETA',
    se_col='SE',
    x_label='Discovery',
    y_label='Replication',
    save_name='my_beta_beta.jpeg'
)

# Trumpet plot for binary trait
trumpet_plot_binary(
    df_gwas=gwas_df,
    plots_dir="./plots",
    n_cases=1000,
    n_controls=1000,
    prevalence=0.1,
    save_name='my_trumpet_binary.jpeg'
)

# Trumpet plot for quantitative trait
trumpet_plot_quantitative(
    df_gwas=gwas_df,
    plots_dir="./plots",
    n_samples=2000,
    save_name='my_trumpet_quant.jpeg'
)

zoom_heatmap

Create zoomed heatmap visualizations of SNP associations, gene annotations, and linkage disequilibrium (LD) patterns.

Module: ideal_genom.visualizations.zoom_heatmap

Features:

Filter and annotate SNP data in a genomic region
Calculate LD matrices using PLINK
Generate three-panel plots with:
1. Association plot with SNPs colored by functional consequences
2. Gene track showing gene locations and orientations
3. LD heatmap showing correlation patterns between SNPs

Key Functions:

filter_sumstats(data_df, lead_snp, snp_col, p_col, pos_col, chr_col, pval_threshold=5e-8, radius=10e6)

Filter GWAS summary statistics based on a lead SNP, p-value threshold and genomic region.

Parameters:

data_df (pandas.DataFrame) – DataFrame containing GWAS summary statistics
lead_snp (str) – Lead SNP identifier to center the region around
snp_col (str) – Column name for SNP identifiers
p_col (str) – Column name for p-values
pos_col (str) – Column name for base pair positions
chr_col (str) – Column name for chromosome
pval_threshold (float) – P-value threshold for filtering
radius (Union[float, int]) – Genomic radius around lead SNP (in base pairs)

Returns:

Filtered DataFrame

Return type:

pandas.DataFrame
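
The filtering described above can be sketched in plain pandas. The exact rules used by filter_sumstats are not spelled out here, so this example assumes the filter keeps SNPs on the lead SNP's chromosome, within radius base pairs of it, and with a p-value at or below the threshold. The helper region_around_lead is hypothetical.

```python
import pandas as pd


def region_around_lead(df, lead_snp, snp_col="SNP", chr_col="CHR", pos_col="BP",
                       p_col="P", pval_threshold=5e-8, radius=500_000):
    """Illustrative filter: same chromosome, within radius bp, p <= threshold."""
    lead = df.loc[df[snp_col] == lead_snp]
    if lead.empty:
        raise ValueError(f"lead SNP {lead_snp!r} not found")
    lead_chr = lead[chr_col].iloc[0]
    lead_pos = lead[pos_col].iloc[0]
    in_window = (df[chr_col] == lead_chr) & ((df[pos_col] - lead_pos).abs() <= radius)
    return df.loc[in_window & (df[p_col] <= pval_threshold)].reset_index(drop=True)


toy = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3", "rs4", "rs5"],
    "CHR": [1, 1, 1, 1, 2],
    "BP": [1_000_000, 1_200_000, 1_300_000, 5_000_000, 1_000_100],
    "P": [1e-10, 1e-9, 0.5, 1e-12, 1e-20],
})
region = region_around_lead(toy, "rs1")
```

In this toy data, rs3 is dropped for its p-value, rs4 for its distance, and rs5 for being on another chromosome.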

compute_ld(plink_file, snp_list, output_dir, lead_snp=None, ld_window_kb=10000, ld_window_snps=10000, threads=1)

Compute linkage disequilibrium matrix for a list of SNPs using PLINK.

Parameters:

plink_file (Union[str, Path]) – Path to PLINK binary file prefix (without .bed/.bim/.fam)
snp_list (list) – List of SNP IDs for LD calculation
output_dir (Union[str, Path]) – Directory to save output files
lead_snp (Optional[str]) – Lead SNP used for coloring
ld_window_kb (int) – LD window size in kilobases
ld_window_snps (int) – LD window size in number of SNPs
threads (int) – Number of threads for PLINK

Returns:

LD matrix as DataFrame

Return type:

pandas.DataFrame
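
For readers unfamiliar with LD matrices, the toy example below shows what a pairwise r² matrix looks like, using NumPy on a small made-up genotype table. It does not call PLINK and is not how compute_ld works internally; it only illustrates the shape and meaning of the result.

```python
import numpy as np

# Rows are individuals, columns are SNPs; entries are allele counts (0, 1 or 2)
genotypes = np.array([
    [0, 0, 2],
    [1, 1, 1],
    [2, 2, 0],
    [1, 1, 2],
    [0, 0, 1],
])

# Squared Pearson correlation between every pair of SNP columns
r2 = np.corrcoef(genotypes, rowvar=False) ** 2
```

The matrix is symmetric with ones on the diagonal. The first two SNPs carry identical genotypes, so their r² is 1, while the third is only partly correlated with them.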

create_zoom_heatmap(sumstats_df, plink_file, lead_snp, output_dir, snp_col='SNP', chr_col='CHR', pos_col='BP', p_col='P', beta_col='BETA', pval_threshold=5e-8, radius=500000, ld_window_kb=1000, genome_build='38', annotation_type='ensembl', api_request=True, fig_size=(14, 12), dpi=300, threads=1, save_name='zoom_heatmap.png')

Create a comprehensive zoom heatmap plot with association, gene track, and LD panels.

Parameters:

sumstats_df (pandas.DataFrame) – DataFrame containing GWAS summary statistics
plink_file (Union[str, Path]) – Path to PLINK binary file prefix
lead_snp (str) – Lead SNP identifier to center the plot
output_dir (Union[str, Path]) – Directory to save output files
snp_col (str) – Column name for SNP identifiers
chr_col (str) – Column name for chromosome
pos_col (str) – Column name for base pair position
p_col (str) – Column name for p-values
beta_col (str) – Column name for effect sizes
pval_threshold (float) – P-value threshold for filtering
radius (Union[float, int]) – Genomic radius around lead SNP (in base pairs)
ld_window_kb (int) – LD window size in kilobases
genome_build (str) – Genome build version (‘37’ or ‘38’)
annotation_type (str) – Source for gene annotation (‘ensembl’, ‘refseq’, or ‘both’)
api_request (bool) – Whether to use API for functional annotation
fig_size (tuple) – Figure size (width, height) in inches
dpi (int) – Resolution for saved figure
threads (int) – Number of threads for PLINK
save_name (str) – Filename for saving the plot

Returns:

Path to saved figure

Return type:

Path

Usage Example:

import pandas as pd
from pathlib import Path
from ideal_genom.visualizations.zoom_heatmap import create_zoom_heatmap

# Load GWAS summary statistics
sumstats = pd.read_csv("gwas_results.txt", sep="\t")

# Create zoom heatmap around a lead SNP
create_zoom_heatmap(
    sumstats_df=sumstats,
    plink_file=Path("data/genotypes"),  # Without .bed/.bim/.fam extension
    lead_snp='rs12345',
    output_dir=Path("./plots"),
    snp_col='SNP',
    chr_col='CHR',
    pos_col='BP',
    p_col='P',
    beta_col='BETA',
    pval_threshold=5e-8,
    radius=500000,  # 500 kb on each side of the lead SNP
    genome_build='38',
    annotation_type='ensembl',
    api_request=True,
    save_name='rs12345_zoom.png'
)

Notes

Dependencies:

matplotlib
seaborn
pandas
numpy
textalloc (for label positioning)
pyensembl (for gene annotations)
PLINK 1.9 or 2.0 (for LD calculations)

Annotation Sources:

Functions that accept an annotation_type parameter support gene annotation from:

Ensembl: Via REST API or local GTF files
RefSeq: Via local GTF files
Both: Combined annotations from both sources

Genome Builds:

Supported genome builds are GRCh37/hg19 (‘37’) and GRCh38/hg38 (‘38’)

Output Formats:

By default, Manhattan, Miami, QQ, beta-beta, and trumpet plots are saved as JPEG (see the save_name defaults)
By default, zoom heatmaps are saved as PNG (recommended for better quality with complex graphics)
All plotting functions take a dpi parameter that sets the resolution of the saved figure
