Testing
tooluniverse-epigenomics - Claude MCP Skill
Production-ready genomics and epigenomics data processing for BixBench questions. Handles methylation array analysis (CpG filtering, differential methylation, age-related CpG detection, chromosome-level density), ChIP-seq peak analysis (peak calling, motif enrichment, coverage stats), ATAC-seq chromatin accessibility, multi-omics integration (expression + methylation correlation), and genome-wide statistics. Pure Python computation (pandas, scipy, numpy, pysam, statsmodels) plus ToolUniverse annotation tools (Ensembl, ENCODE, SCREEN, JASPAR, ReMap, RegulomeDB, ChIPAtlas). Supports BED, BigWig, methylation beta-value matrices, Illumina manifest files, and multi-sample clinical data. Use when processing methylation data, ChIP-seq peaks, ATAC-seq signals, or answering questions about CpG sites, differential methylation, chromatin accessibility, histone marks, or epigenomic statistics.
SEO Guide: Enhance your AI agent with the tooluniverse-epigenomics tool. This Model Context Protocol (MCP) server allows Claude Desktop and other LLMs to production-ready genomics and epigenomics data processing for bixbench questions. handles methylatio... Download and configure this skill to unlock new capabilities for your AI workflow.
Documentation
SKILL.md# Genomics and Epigenomics Data Processing Production-ready computational skill for processing and analyzing epigenomics data. Combines local Python computation (pandas, scipy, numpy, pysam, statsmodels) with ToolUniverse annotation tools for regulatory context. Designed to solve BixBench-style questions about methylation, ChIP-seq, ATAC-seq, and multi-omics integration. ## When to Use This Skill **Triggers**: - User provides methylation data (beta-value matrices, Illumina arrays) and asks about CpG sites - Questions about differential methylation analysis - Age-related CpG detection or epigenetic clock questions - Chromosome-level methylation density or statistics - ChIP-seq peak files (BED format) with analysis questions - ATAC-seq chromatin accessibility questions - Multi-omics integration (expression + methylation, expression + ChIP-seq) - Genome-wide epigenomic statistics - Questions mentioning "methylation", "CpG", "ChIP-seq", "ATAC-seq", "histone", "chromatin", "epigenetic" - Questions about missing data across clinical/genomic/epigenomic modalities - Regulatory element annotation for processed epigenomic data **Example Questions**: 1. "How many patients have no missing data for vital status, gene expression, and methylation data?" 2. "What is the ratio of filtered age-related CpG density between chromosomes?" 3. "What is the genome-wide average chromosomal density of unique age-related CpGs per base pair?" 4. "How many CpG sites show significant differential methylation (padj < 0.05)?" 5. "What is the Pearson correlation between methylation and expression for gene X?" 6. "How many ChIP-seq peaks overlap with promoter regions?" 7. "What fraction of ATAC-seq peaks are in enhancer regions?" 8. "Which chromosome has the highest density of hypermethylated CpGs?" 9. "Filter CpG sites by variance > threshold and map to nearest genes" 10. "What is the average beta value difference between tumor and normal for chromosome 17?" **NOT for** (use other skills instead): - Gene regulation lookup without data files -> Use existing epigenomics annotation pattern - RNA-seq differential expression -> Use `tooluniverse-rnaseq-deseq2` - Variant calling/annotation from VCF -> Use `tooluniverse-variant-analysis` - Gene enrichment analysis -> Use `tooluniverse-gene-enrichment` - Protein structure analysis -> Use `tooluniverse-protein-structure-retrieval` --- ## Required Python Packages ```python # Core (MUST be available) import pandas as pd import numpy as np from scipy import stats import statsmodels.stats.multitest as mt # Optional but useful import pysam # BAM/CRAM file access import gseapy # Enrichment of genes from methylation analysis # ToolUniverse (for annotation) from tooluniverse import ToolUniverse ``` --- ## Key Principles 1. **Data-first approach** - Load and inspect data files BEFORE any analysis 2. **Question-driven** - Parse what the user is actually asking and extract the specific numeric answer 3. **File format detection** - Automatically detect methylation arrays, BED files, BigWig, clinical data 4. **Coordinate system awareness** - Track genome build (hg19, hg38, mm10), handle chr prefix differences 5. **Statistical rigor** - Proper multiple testing correction, effect size filtering, sample size awareness 6. **Missing data handling** - Explicitly report and handle NaN/missing values 7. **Chromosome normalization** - Always normalize chromosome names (chr1 vs 1, chrX vs X) 8. **CpG site identification** - Parse Illumina probe IDs (cg/ch probes), genomic coordinates 9. **Report-first** - Create output file first, populate progressively 10. **English-first queries** - Use English in all tool calls --- ## Workflow Overview ### Phase 0: Question Parsing and Data Discovery Before writing any code, parse the question to identify: - What data files are available (methylation, ChIP-seq, ATAC-seq, clinical, expression, manifest) - What specific statistic or answer is being asked for - What thresholds apply (significance, effect size, variance, chromosome filters) - What genome build to use Categorize files by scanning for keywords: methyl/beta/cpg/illumina, chip/peak/narrowpeak, atac/accessibility, clinical/patient/sample, express/rnaseq/fpkm, manifest/annotation/probe. See `ANALYSIS_PROCEDURES.md` for the full decision tree and parameter extraction table. ### Phase 1: Methylation Data Processing Core functions for methylation analysis: - **Load methylation data** - Supports CSV, TSV, parquet, HDF5; auto-detects beta vs M-values - **Load probe manifest** - Illumina 450K/EPIC manifest with chromosome, position, gene annotation - **CpG filtering** - Filter by variance, missing rate, probe type (cg/ch), chromosome, CpG island relation, gene group - **Differential methylation** - T-test/Wilcoxon/KS between groups with FDR correction; identify DMPs (hyper/hypo) - **Age-related CpG analysis** - Pearson/Spearman correlation of probes with age, FDR correction - **Chromosome-level density** - CpG count per chromosome divided by chromosome length; density ratios; genome-wide average See `CODE_REFERENCE.md` Phase 1 for full function implementations. ### Phase 2: ChIP-seq Peak Analysis - **Load BED/narrowPeak/broadPeak** - Auto-detect format, normalize chromosomes - **Peak statistics** - Count, length distribution, signal values, q-values - **Peak annotation** - Map peaks to nearest gene, classify as promoter/gene_body/proximal/distal - **Peak overlap** - Pure Python interval intersection between two BED files; Jaccard similarity See `CODE_REFERENCE.md` Phase 2 for full function implementations. ### Phase 3: ATAC-seq Analysis - **Load ATAC peaks** - Wrapper around BED loader for narrowPeak format - **ATAC-specific stats** - Nucleosome-free region (NFR) detection (<150bp peaks), region classification - **Chromatin accessibility by region** - Distribution of open chromatin across promoter/enhancer/intergenic See `CODE_REFERENCE.md` Phase 3 for full function implementations. ### Phase 4: Multi-Omics Integration - **Methylation-expression correlation** - Align samples, compute per-probe-gene Pearson/Spearman with FDR - **ChIP-seq + expression** - Find genes with promoter peaks and compare expression levels See `CODE_REFERENCE.md` Phase 4 for full function implementations. ### Phase 5: Clinical Data Integration - **Missing data analysis** - Count samples present across clinical, expression, and methylation modalities - **Complete case identification** - Find intersection of samples with non-missing values for specified variables See `CODE_REFERENCE.md` Phase 5 for full function implementations. ### Phase 6: ToolUniverse Annotation Use ToolUniverse tools to add biological context after computational analysis: - **Gene annotation** - Ensembl lookup for coordinates, biotype, cross-references - **Regulatory elements** - SCREEN cCREs (enhancers, promoters, insulators) - **ChIPAtlas** - Query available ChIP-seq experiments by antigen/cell type - **Ensembl regulatory features** - Annotate genomic regions with regulatory overlaps See `CODE_REFERENCE.md` Phase 6 and `TOOLS_REFERENCE.md` for parameters. ### Phase 7: Genome-Wide Statistics - **Comprehensive methylation stats** - Global mean/median beta, probe variance, chromosome density - **Differential methylation summary** - Count significant, hyper/hypo split, effect sizes See `CODE_REFERENCE.md` Phase 7 for full function implementations. --- ## Common Analysis Patterns | Pattern | Input | Key Steps | Output | |---------|-------|-----------|--------| | Differential methylation | Beta matrix + clinical | Filter probes -> define groups -> t-test -> FDR -> threshold | Count of significant DMPs | | Age-related CpG density | Beta matrix + manifest + ages | Correlate with age -> FDR -> map to chr -> density per chr | Density ratio between chromosomes | | Multi-omics missing data | Clinical + expression + methylation | Extract sample IDs -> intersect -> check NaN | Complete case count | | ChIP-seq peak annotation | BED/narrowPeak + gene annotation | Load peaks -> annotate to genes -> classify regions | Fraction in promoters | | Methylation-expression | Beta matrix + expression + probe-gene map | Align samples -> correlate -> FDR | Significant anti-correlations | See `ANALYSIS_PROCEDURES.md` for detailed step-by-step flows and edge case handling. --- ## Key Functions Reference | Function | Purpose | Input | Output | |----------|---------|-------|--------| | `load_methylation_data()` | Load beta/M-value matrix | file path | DataFrame | | `detect_methylation_type()` | Detect beta vs M-values | DataFrame | 'beta' or 'mvalue' | | `filter_cpg_probes()` | Filter probes by criteria | DataFrame + filters | filtered DataFrame | | `differential_methylation()` | DM analysis between groups | beta + samples | DataFrame with padj | | `identify_age_related_cpgs()` | Age-correlated CpGs | beta + ages | DataFrame with padj | | `chromosome_cpg_density()` | CpG density per chromosome | probes + manifest | density DataFrame | | `genome_wide_average_density()` | Overall genome density | density DataFrame | float | | `chromosome_density_ratio()` | Ratio between chromosomes | density + chr names | float | | `load_bed_file()` | Load BED/narrowPeak | file path | DataFrame | | `peak_statistics()` | Basic peak stats | BED DataFrame | dict | | `annotate_peaks_to_genes()` | Annotate peaks to genes | peaks + genes | annotated DataFrame | | `find_overlaps()` | Peak overlap analysis | two BED DataFrames | overlap DataFrame | | `missing_data_analysis()` | Cross-modality completeness | multiple DataFrames | dict | | `correlate_methylation_expression()` | Meth-expression correlation | beta + expression | correlation DataFrame | --- ## ToolUniverse Tools Used ### Regulatory Annotation Tools - `ensembl_lookup_gene` - Gene coordinates, biotype (REQUIRES `species='homo_sapiens'`) - `ensembl_get_regulatory_features` - Regulatory features by region (NO "chr" prefix in region) - `ensembl_get_overlap_features` - Gene/transcript overlap data - `SCREEN_get_regulatory_elements` - cCREs: enhancers, promoters, insulators - `ReMap_get_transcription_factor_binding` - TF binding sites - `RegulomeDB_query_variant` - Variant regulatory score - `jaspar_search_matrices` - TF binding matrices - `ENCODE_search_experiments` - Experiment metadata (assay_title must be "TF ChIP-seq" not "ChIP-seq") - `ChIPAtlas_get_experiments` - ChIP-seq experiments (REQUIRES `operation` param) - `ChIPAtlas_search_datasets` - Dataset search (REQUIRES `operation` param) - `ChIPAtlas_enrichment_analysis` - Enrichment from BED/motifs/genes - `ChIPAtlas_get_peak_data` - Peak data download (REQUIRES `operation` param) - `FourDN_search_data` - Chromatin conformation data (REQUIRES `operation` param) ### Gene Annotation Tools - `MyGene_query_genes` - Gene query - `MyGene_batch_query` - Batch gene query - `HGNC_get_gene_info` - Gene symbol, aliases, IDs - `GO_get_annotations_for_gene` - GO annotations See `TOOLS_REFERENCE.md` for full parameter details and return schemas. --- ## Data Format Notes - **Methylation data**: Probes (rows) x samples (columns), beta values 0-1 - **BED files**: Tab-separated, 0-based half-open coordinates - **narrowPeak**: 10-column BED extension with signalValue, pValue, qValue, peak - **Illumina manifests**: Probe ID, chromosome, position, gene annotation - **Clinical data**: Patient/sample-centric with clinical variables as columns ## Genome Builds Supported | Build | Species | Autosomes | Sex Chromosomes | |-------|---------|-----------|-----------------| | hg38 (GRCh38) | Human | chr1-chr22 | chrX, chrY | | hg19 (GRCh37) | Human | chr1-chr22 | chrX, chrY | | mm10 (GRCm38) | Mouse | chr1-chr19 | chrX, chrY | ## Limitations - No native pybedtools: uses pure Python interval operations - No native pyBigWig: cannot read BigWig files directly without package - No R bridge: does not use methylKit, ChIPseeker, or DiffBind - Illumina-centric: methylation functions designed for 450K/EPIC arrays - Uses t-test/Wilcoxon for differential methylation (not limma/bumphunter) - No peak calling: assumes peaks are pre-called - API rate limits: ToolUniverse annotation limited to ~20 genes per batch --- ## Reference Files - `CODE_REFERENCE.md` - Full Python function implementations for all phases - `TOOLS_REFERENCE.md` - ToolUniverse tool parameter details and return schemas - `ANALYSIS_PROCEDURES.md` - Decision trees, step-by-step analysis patterns, edge cases, fallback strategies - `QUICK_START.md` - Quick start examples for common analysis types
Signals
Information
- Repository
- mims-harvard/ToolUniverse
- Author
- mims-harvard
- Last Sync
- 3/12/2026
- Repo Updated
- 3/12/2026
- Created
- 2/16/2026
Reviews (0)
No reviews yet. Be the first to review this skill!
Related Skills
upgrade-nodejs
Upgrading Bun's Self-Reported Node.js Version
cursorrules
CrewAI Development Rules
cn-check
Install and run the Continue CLI (`cn`) to execute AI agent checks on local code changes. Use when asked to "run checks", "lint with AI", "review my changes with cn", or set up Continue CI locally.
CLAUDE
CLAUDE.md
Related Guides
Bear Notes Claude Skill: Your AI-Powered Note-Taking Assistant
Learn how to use the bear-notes Claude skill. Complete guide with installation instructions and examples.
Mastering tmux with Claude: A Complete Guide to the tmux Claude Skill
Learn how to use the tmux Claude skill. Complete guide with installation instructions and examples.
OpenAI Whisper API Claude Skill: Complete Guide to AI-Powered Audio Transcription
Learn how to use the openai-whisper-api Claude skill. Complete guide with installation instructions and examples.