tooluniverse-rnaseq-deseq2 - Claude MCP Skill
Production-ready RNA-seq differential expression analysis using PyDESeq2. Performs DESeq2 normalization, dispersion estimation, Wald testing, LFC shrinkage, and result filtering. Handles multi-factor designs, multiple contrasts, batch effects, and integrates with gene enrichment (gseapy) and ToolUniverse annotation tools (UniProt, Ensembl, OpenTargets). Supports CSV/TSV/H5AD input formats and any organism. Use when analyzing RNA-seq count matrices, identifying DEGs, performing differential expression with statistical rigor, or answering questions about gene expression changes.
# RNA-seq Differential Expression Analysis (DESeq2)

Differential expression analysis of RNA-seq count data using PyDESeq2, with enrichment analysis (gseapy) and gene annotation via ToolUniverse.

**BixBench Coverage**: Validated on 53 BixBench questions across 15 computational biology projects.

## Core Principles

1. **Data-first** - Load and validate count data and metadata BEFORE any analysis
2. **Statistical rigor** - Proper normalization, dispersion estimation, multiple testing correction
3. **Flexible design** - Single-factor, multi-factor, and interaction designs
4. **Threshold awareness** - Apply user-specified thresholds exactly (padj, log2FC, baseMean)
5. **Reproducible** - Set random seeds, document all parameters
6. **Question-driven** - Parse what the user is actually asking; extract the specific answer
7. **Enrichment integration** - Chain DESeq2 results into pathway/GO enrichment when requested

## When to Use

- RNA-seq count matrices needing differential expression analysis
- DESeq2, DEGs, padj, log2FC questions
- Dispersion estimates or diagnostics
- GO, KEGG, Reactome enrichment on DEGs
- Specific gene expression changes between conditions
- Batch effect correction in RNA-seq

## Required Packages

```python
import pandas as pd
import numpy as np
from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats
import gseapy as gp  # enrichment (optional)
from tooluniverse import ToolUniverse  # annotation (optional)
```

## Analysis Workflow

### Step 1: Parse the Question

Extract: data files, thresholds (padj/log2FC/baseMean), design factors, contrast, direction, enrichment type, specific genes. See [question_parsing.md](references/question_parsing.md).

### Step 2: Load & Validate Data

Load counts + metadata, ensure samples-as-rows/genes-as-columns, verify integer counts, align sample names, remove zero-count genes. See [data_loading.md](references/data_loading.md).

### Step 2.5: Inspect Metadata (REQUIRED)

List ALL metadata columns and levels.
Categorize as biological interest vs batch/block. Build design formula with covariates first, factor of interest last. See [design_formula_guide.md](references/design_formula_guide.md).

### Step 3: Run PyDESeq2

Set reference level via `pd.Categorical`, create `DeseqDataSet`, call `dds.deseq2()`, extract `DeseqStats` with contrast, run Wald test, optionally apply LFC shrinkage. See [pydeseq2_workflow.md](references/pydeseq2_workflow.md).

**Tool boundaries**:

- **Python (PyDESeq2)**: ALL DESeq2 analysis
- **ToolUniverse**: ONLY gene annotation (ID conversion, pathway context)
- **gseapy**: Enrichment analysis (GO/KEGG/Reactome)

### Step 4: Filter Results

Apply padj, log2FC, baseMean thresholds. Split by direction if needed. See [result_filtering.md](references/result_filtering.md).

### Step 5: Dispersion Analysis (if asked)

Key columns: `genewise_dispersions`, `fitted_dispersions`, `MAP_dispersions`, `dispersions`. See [dispersion_analysis.md](references/dispersion_analysis.md).

### Step 6: Enrichment (optional)

Use gseapy `enrich()` with appropriate gene set library. See [enrichment_analysis.md](references/enrichment_analysis.md).

### Step 7: Gene Annotation (optional)

Use ToolUniverse for ID conversion and gene context only. See [output_formatting.md](references/output_formatting.md).

## Common Patterns

| Pattern | Type | Key Operation |
|---------|------|---------------|
| 1 | DEG count | `len(results[(padj<0.05) & (abs(lfc)>0.5)])` |
| 2 | Gene value | `results.loc['GENE', 'log2FoldChange']` |
| 3 | Direction | Filter `log2FoldChange > 0` or `< 0` |
| 4 | Set ops | `degs_A - degs_B` for unique DEGs |
| 5 | Dispersion | `(dds.var['genewise_dispersions'] < thr).sum()` |

See [bixbench_examples.md](references/bixbench_examples.md) for all 10 patterns with examples.
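
As a rough illustration of patterns 1 to 4, the sketch below applies them to a small, made-up results table using pandas only. The gene names, values, the `filter_degs` helper, and the `>=` comparison on `baseMean` are all assumptions for illustration. In a real analysis the table would come from PyDESeq2 rather than being typed in by hand.

```python
import pandas as pd

# Hypothetical table shaped like a DESeq2 results DataFrame (illustrative values only).
results = pd.DataFrame(
    {
        "baseMean": [150.0, 8.0, 300.0, 45.0, 500.0],
        "log2FoldChange": [1.8, -2.1, 0.2, -0.9, 0.7],
        "padj": [0.001, 0.03, 0.50, 0.04, float("nan")],
    },
    index=["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"],
)


def filter_degs(res, padj_thr=0.05, lfc_thr=0.5, min_base_mean=0.0):
    """Keep rows passing the padj, |log2FC|, and baseMean thresholds.

    Rows with NaN padj never pass, because NaN < threshold evaluates to False.
    """
    mask = (
        (res["padj"] < padj_thr)
        & (res["log2FoldChange"].abs() > lfc_thr)
        & (res["baseMean"] >= min_base_mean)  # assumed direction of the baseMean cut
    )
    return res[mask]


degs = filter_degs(results)

# Pattern 1: DEG count
n_degs = len(degs)

# Pattern 2: value for a specific gene
gene_b_lfc = results.loc["GENE_B", "log2FoldChange"]

# Pattern 3: split by direction
up = set(degs.index[degs["log2FoldChange"] > 0])
down = set(degs.index[degs["log2FoldChange"] < 0])

# Pattern 4: set difference between two DEG lists
# (here: default thresholds vs. an added baseMean >= 10 cut)
strict = set(filter_degs(results, min_base_mean=10).index)
unique_to_default = set(degs.index) - strict

print(n_degs, gene_b_lfc, sorted(up), sorted(down), sorted(unique_to_default))
```

Keeping the thresholds as function arguments makes it easier to apply user-specified cutoffs exactly, as the Core Principles ask, instead of hard-coding 0.05 and 0.5.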
## Error Quick Reference

| Error | Fix |
|-------|-----|
| No matching samples | Transpose counts; strip whitespace |
| Dispersion trend does not converge | `fit_type='mean'` |
| Contrast not found | Check `metadata['factor'].unique()` |
| Non-integer counts | Round to int OR use t-test |
| NaN in padj | Independent filtering removed genes |

See [troubleshooting.md](references/troubleshooting.md) for full debugging guide.

## Known Limitations

- **PyDESeq2 vs R DESeq2**: Numerical differences exist for very low dispersion genes (<1e-05). For exact R reproducibility, use rpy2.
- **gseapy vs R clusterProfiler**: Results may differ. See [r_clusterprofiler_guide.md](references/r_clusterprofiler_guide.md).

## Reference Files

- [question_parsing.md](references/question_parsing.md) - Extract parameters from questions
- [data_loading.md](references/data_loading.md) - Data loading and validation
- [design_formula_guide.md](references/design_formula_guide.md) - Multi-factor design decision tree
- [pydeseq2_workflow.md](references/pydeseq2_workflow.md) - Complete PyDESeq2 code examples
- [result_filtering.md](references/result_filtering.md) - Advanced filtering and extraction
- [dispersion_analysis.md](references/dispersion_analysis.md) - Dispersion diagnostics
- [enrichment_analysis.md](references/enrichment_analysis.md) - GO/KEGG/Reactome workflows
- [output_formatting.md](references/output_formatting.md) - Format answers correctly
- [bixbench_examples.md](references/bixbench_examples.md) - All 10 question patterns
- [troubleshooting.md](references/troubleshooting.md) - Common issues and debugging
- [r_clusterprofiler_guide.md](references/r_clusterprofiler_guide.md) - R clusterProfiler via rpy2

## Utility Scripts

- [format_deseq2_output.py](scripts/format_deseq2_output.py) - Output formatters
- [load_count_matrix.py](scripts/load_count_matrix.py) - Data loading utilities
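
The loading checks from Step 2 (samples as rows, integer counts, aligned sample names, no all-zero genes) can be sketched with pandas and numpy alone. This is a minimal illustration under assumptions, not the contents of `load_count_matrix.py`: the `validate_counts` helper and the toy data are hypothetical, and rounding is only one of the two fixes the Error Quick Reference lists for non-integer counts.

```python
import numpy as np
import pandas as pd


def validate_counts(counts, metadata):
    """Align a count matrix with sample metadata; returns (counts, metadata)."""
    counts = counts.copy()
    metadata = metadata.copy()

    # Strip stray whitespace from sample and gene labels.
    counts.index = counts.index.astype(str).str.strip()
    counts.columns = counts.columns.astype(str).str.strip()
    metadata.index = metadata.index.astype(str).str.strip()

    # If sample names appear in the columns instead of the rows, transpose
    # so that samples are rows and genes are columns.
    rows_match = counts.index.isin(metadata.index).any()
    cols_match = counts.columns.isin(metadata.index).any()
    if not rows_match and cols_match:
        counts = counts.T

    # Keep only samples present in both tables, in metadata order.
    shared = [s for s in metadata.index if s in counts.index]
    if not shared:
        raise ValueError("No matching samples between counts and metadata")
    counts, metadata = counts.loc[shared], metadata.loc[shared]

    # Round non-integer values, then cast to int (assumes rounding is acceptable).
    if not np.allclose(counts.values, np.round(counts.values)):
        counts = counts.round()
    counts = counts.astype(int)

    # Drop genes with zero counts across all retained samples.
    counts = counts.loc[:, counts.sum(axis=0) > 0]
    return counts, metadata


# Toy input: genes as rows, a trailing space in one sample name,
# one non-integer value, one all-zero gene, one sample missing from metadata.
raw = pd.DataFrame(
    {"s1 ": [10.0, 0.0, 3.2], "s2": [12.0, 0.0, 4.0], "s3": [0.0, 0.0, 7.0]},
    index=["geneA", "geneB", "geneC"],
)
meta = pd.DataFrame({"condition": ["ctrl", "treated"]}, index=["s1", " s2"])

counts, meta = validate_counts(raw, meta)
print(counts)
```

Raising an error when no sample names overlap surfaces the "No matching samples" case early, before any model fitting is attempted.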
Information
- Repository: mims-harvard/ToolUniverse
- Author: mims-harvard
- Last Sync: 3/12/2026
- Repo Updated: 3/12/2026
- Created: 2/19/2026