Data & AI

tooluniverse population genetics - Claude MCP Skill

Skill: Population Genetics Analysis

SEO Guide: Enhance your AI agent with the tooluniverse population genetics tool. This Model Context Protocol (MCP) server allows Claude Desktop and other LLMs to skill: population genetics analysis... Download and configure this skill to unlock new capabilities for your AI workflow.

🌟11 stars • 205 forks

📥0 downloads

Python REST

View on GitHub🔗 Claude Servers

Documentation

SKILL.md

# Skill: Population Genetics Analysis

**MC Strategy**: Population genetics MC questions often test whether you know a specific theorem or result. COMPUTE the answer first (use popgen_calculator.py or write Python), then match to options. Don't try to reason about which option "sounds right."

Analyze population-level genetic variation, allele frequencies, GWAS associations, clinical significance, and evolutionary constraints using ToolUniverse tools.

## When to Use

Activate this skill when the user asks about:
- Allele frequencies across populations (gnomAD, 1000 Genomes)
- GWAS associations for diseases/traits
- Clinical variant interpretation (ClinVar, VEP)
- Gene-level constraint metrics (pLI, LOEUF, o/e ratios)
- Selection, drift, linkage disequilibrium, or population structure
- Variant annotation and functional consequences

## LOOK UP, DON'T GUESS

Query gnomAD/1000Genomes/GWAS Catalog FIRST for allele frequencies and associations. Preferred: use the `PopGen_hwe_test`, `PopGen_fst`, `PopGen_inbreeding`, and `PopGen_haplotype_count` tools for HWE, Fst, inbreeding, and haplotype calculations. Fallback: run `popgen_calculator.py` directly. For theoretical problems (delta-q, drift, LD decay), apply the formulas in the Theoretical Reasoning section below.

---

## Tool Quick Reference

| Tool | Key Parameters | Notes |
|------|---------------|-------|
| `gnomad_search_variants` | `query` (REQUIRED) | Resolve rsID to variant_id format "CHR-POS-REF-ALT" |
| `gnomad_get_variant` | `variant_id` (REQUIRED), `dataset` | Population frequencies. Default dataset: gnomad_r3; use gnomad_r4 for latest |
| `gnomad_get_gene_constraints` | `gene_symbol` (REQUIRED) | pLI, o/e ratios. May timeout -- retry once |
| `MyVariant_query_variants` | `query` (REQUIRED) | Aggregated: ClinVar + dbSNP + gnomAD + CADD. Uses hg19 coordinates |
| `EnsemblVEP_annotate_rsid` | `variant_id` (REQUIRED) | Functional consequence, SIFT, PolyPhen. Param is "variant_id" NOT "rsid" |
| `EnsemblVEP_variant_recoder` | `variant_id` (REQUIRED) | Convert between rsID/HGVS/VCF/SPDI |
| `gwas_get_snps_for_gene` | `gene_symbol` (REQUIRED) | All GWAS SNPs for a gene |
| `gwas_search_associations` | `query` (REQUIRED) | GWAS for a disease/trait (NOT gene name -- use gwas_get_snps_for_gene for genes) |
| `gwas_get_variants_for_trait` | `trait` (REQUIRED) | Variants associated with a trait |
| `ClinVar_search_variants` | `gene`, `condition`, `significance` | At least one filter required |
| `RegulomeDB_query_variant` | `rsid` (REQUIRED) | Regulatory scoring (1a=strongest to 7=minimal) |

### Critical Gotchas

1. **gnomAD variant_id**: Format is `"CHR-POS-REF-ALT"` (no "chr" prefix). Always resolve rsIDs via `gnomad_search_variants` first.
2. **gwas_search_associations**: Takes disease/trait names ONLY. Gene names will fail. Use `gwas_get_snps_for_gene` for gene-based lookups.
3. **gwas_search_snps**: BROKEN (HTTP 500). Use `gwas_get_snps_for_gene` instead.
4. **VEP/ClinVar responses**: Format is variable (list, `{data, metadata}`, or `{error}`). Handle all three.

---

## Workflow Patterns

**Variant frequency**: `gnomad_search_variants` -> `gnomad_get_variant(dataset="gnomad_r4")` -> `MyVariant_query_variants` (1000G pop breakdowns) -> `EnsemblVEP_annotate_rsid`

**GWAS for disease**: `gwas_search_associations` -> `gwas_get_variants_for_trait` -> `gnomad_get_variant` for top hits -> `EuropePMC_search_articles`

**Gene characterization**: `gnomad_get_gene_constraints` -> `gwas_get_snps_for_gene` -> `ClinVar_search_variants` -> `PubMed_search_articles`

**Pathogenicity assessment**: `EnsemblVEP_annotate_rsid` -> `MyVariant_query_variants` (CADD, ClinVar) -> `gnomad_get_variant` (frequency) -> `RegulomeDB_query_variant` (if non-coding)

---

## Theoretical Reasoning (CRITICAL for computation problems)

These formulas are needed for quantitative population genetics problems. Work through step by step, showing intermediate values.

### Allele Frequency Change Under Selection (delta-q)

For a recessive deleterious allele (fitness: AA=1, Aa=1, aa=1-s):
```
delta_q = -s * q^2 * p / (1 - s * q^2)
```
where p = freq(A), q = freq(a), s = selection coefficient.

For dominant deleterious (AA=1, Aa=1-s, aa=1-s):
```
delta_q = -s * q * p / (1 - s * q * (2 - q))
```

For heterozygote advantage (AA=1-s1, Aa=1, aa=1-s2):
```
equilibrium: q_hat = s1 / (s1 + s2)
```
Example: plug in s1 and s2 from the question; q_hat = s1/(s1+s2).

**Selection against recessives is slow at low q** because most a alleles hide in heterozygotes. Time to reduce q from q0 to qt: t ~ (1/qt - 1/q0) / s generations.

### Genetic Drift in Small Populations

**Variance in allele frequency per generation**: Var(delta_p) = p*q / (2*Ne)

**Probability of fixation** of a new neutral mutation: 1/(2*Ne)

**Time to fixation** (given it fixes): ~4*Ne generations for neutral alleles

**Heterozygosity decay**: H_t = H_0 * (1 - 1/(2*Ne))^t

After t generations, fraction of heterozygosity lost ~ 1 - e^(-t/(2*Ne))

**Effective population size (Ne)** adjustments:
- Unequal sex ratio: Ne = 4*Nf*Nm / (Nf + Nm)
- Fluctuating size: Ne = harmonic mean of N across generations
- Bottleneck: dominated by the smallest generation size

**Drift vs selection**: Drift dominates when |s| < 1/(2*Ne). A variant with s=0.01 behaves neutrally in a population of Ne < 50.

### Linkage Disequilibrium (LD) Decay

**D** = freq(AB) - freq(A)*freq(B), where A and B are alleles at two loci.

**Decay with recombination**: D_t = D_0 * (1 - r)^t, where r = recombination fraction, t = generations.

**Half-life of LD**: t_half = -ln(2) / ln(1-r) ~ 0.693/r generations (for small r).

**r-squared** (normalized LD): r^2 = D^2 / (pA * pa * pB * pb). Range 0-1.

**Expected r^2 in finite population at equilibrium**: E[r^2] = 1 / (1 + 4*Ne*r) (for drift-recombination balance).

**Practical implications**:
- Tightly linked loci (r < 0.01): LD persists for hundreds of generations
- Loosely linked (r = 0.5, independent assortment): LD halves every generation
- GWAS tag SNPs work because LD extends over blocks; block size depends on Ne and recombination rate
- African populations have shorter LD blocks (larger historical Ne) -> need denser SNP arrays

### Hardy-Weinberg Equilibrium

For alleles A (freq p) and a (freq q=1-p): expected genotypes AA=p^2, Aa=2pq, aa=q^2.

**Chi-square test**: df=1 (2 alleles). Preferred: use `PopGen_hwe_test` tool. Fallback: `popgen_calculator.py --type hwe --AA N1 --Aa N2 --aa N3`.

**Causes of HWE departure**: non-random mating, selection, migration, drift, genotyping error. Excess homozygotes -> inbreeding or population structure (Wahlund effect). Excess heterozygotes -> overdominant selection or negative assortative mating.

### Heritability

- **H^2 (broad-sense)** = V_G / V_P; **h^2 (narrow-sense)** = V_A / V_P
- V_G includes ALL genetic variance: additive + dominance + epistasis. Trap: "broad-sense" is not just additive.
- Under HWE with two alleles (p, q): genotype frequencies are p^2, 2pq, q^2
- Phenotype frequency from genotype: sum(genotype_freq * penetrance) for each genotype class
- For quantitative traits: V_P = V_G + V_E (no covariance assumed)
- With dominance: assign genotypic values (e.g., AA=a, Aa=d, aa=-a), compute mean, then V_G from freq-weighted squared deviations
- **PGS vs SNP-h² trap**: PGS R² is NOT necessarily ≤ h²_SNP. With large GWAS, PGS can exceed SNP-h² by tagging rare causal variants through LD with common SNPs. The word "necessarily" makes this claim False. h²_SNP is estimated from common variants; PGS can capture additional variance.

### Path Analysis (Causal Diagrams)

- Trace ALL paths from cause to effect through the diagram (direct + indirect)
- Each path's contribution = product of path coefficients along that path
- Total effect (correlation) = sum of contributions from all paths
- Indirect effects can mask (suppression) or inflate (confounding) the direct effect
- Unanalyzed correlations (double-headed arrows) count as valid path segments
- **Never ignore indirect paths** — the total is rarely just the direct arrow

### Genetic Combinatorics (F2 crosses, haplotype counting)

For n SNPs between two inbred (homozygous) strains:
- F1 is heterozygous at all n loci
- F2 distinct haplotypes = 2^n (each SNP contributes parental A or B allele)
- F2 distinct diploid genotypes = 3^n (AA, AB, BB at each locus)
- F2 unique chromosomes (distinct haplotypes) = 2^n (e.g., 5 SNPs → 2^5 = 32; but subtract the 2 parental haplotypes if "novel" is asked → 30)
- **ALWAYS write and run Python code** (`python3 -c "..."`) for these counts. Never enumerate by hand.
- For specimens/counting from field data: parse the data into a structure and compute programmatically.

### Mutation-Selection Balance

Equilibrium frequency of a deleterious allele:
- Recessive lethal: q_hat = sqrt(mu/s) ~ sqrt(mu) when s=1
- Dominant lethal: q_hat = mu/s
- Example: mu=1e-5, s=1 (recessive lethal) -> q_hat = 0.003 (carrier freq ~ 0.006)

### F-statistics and Population Structure

- **Fis**: Inbreeding within subpopulations (heterozygote deficit within demes)
- **Fst**: Differentiation between subpopulations. Fst = Var(p) / (p_bar * q_bar)
- **Fit**: Total inbreeding. (1-Fit) = (1-Fis)(1-Fst)
- Fst interpretation: <0.05 little, 0.05-0.15 moderate, 0.15-0.25 great, >0.25 very great differentiation
- Preferred: use `PopGen_fst` tool. Fallback: `popgen_calculator.py --type fst --p1 X --p2 Y --n1 N1 --n2 N2`

---

## Mendelian Genetics Reasoning Framework

For any genetics cross problem, follow these steps IN ORDER. Do not skip steps.

### Step 1: Identify genes, locations, and allele relationships
- List every gene involved in the cross
- Determine chromosomal location: autosomal vs X-linked (X-linked genes show different inheritance in males vs females)
- Determine allele relationships: dominant/recessive, codominant, incomplete dominance
- Note any epistasis, suppressor, or modifier interactions between genes

### Step 2: Write parental genotypes explicitly
- Use standard notation (e.g., Aa Bb for autosomal; X^w X^+ for X-linked)
- For X-linked genes, males are hemizygous (X^w Y), not homozygous
- If parental genotypes are not given, deduce them from phenotypes and pedigree context

### Step 3: Draw Punnett square(s) for each gene
- For multi-gene crosses, handle each gene independently (if unlinked) then combine
- For linked genes, use recombination frequency to adjust gamete ratios
- For X-linked genes, remember that fathers pass X to all daughters and Y to all sons

### Step 4: Calculate expected phenotypic ratios
- Multiply independent gene ratios (e.g., 3:1 x 3:1 = 9:3:3:1)
- For X-linked: calculate male and female ratios separately, then combine or report separately as required

### Step 5: Verify ratios sum to 1.0
- Convert all ratios to fractions and confirm they sum to 1
- If they don't sum to 1, there is an error in the Punnett square or gamete calculation

### Step 6: Apply phenotype modification rules AFTER computing genotypic ratios
- For epistasis: first compute the full genotypic ratios (e.g., 9:3:3:1), then collapse genotype classes that produce the same phenotype
- For suppressor genes: a suppressor homozygote (su/su) restores wild-type in an otherwise mutant background. Apply suppression AFTER determining which individuals carry the mutant allele
- Example: 9 A_B_ : 3 A_bb : 3 aaB_ : 1 aabb with recessive epistasis (aa masks B) becomes 9:3:4

---

## E. coli Hfr Mapping Framework

For bacterial conjugation and Hfr mapping problems:

### Core Principles
- In Hfr x F- crosses, the Hfr chromosome is transferred linearly starting from the origin of transfer (oriT)
- **Gene transfer order = chromosomal order from the origin**
- Early markers (entering first) are closest to the origin of transfer
- Late markers (entering last) are farthest from the origin

### Interrupted Mating Experiments
- Genes that appear in recombinants at earlier time points are closer to oriT
- The time of entry gives the order and approximate distance between genes
- Recombinants require integration by homologous recombination (double crossover)

### Recombination Frequency Between Markers
- **KEY TRAP**: Highest recombination frequency occurs between markers that are FARTHEST APART on the transferred segment
- This is because more time elapses between entry of distant markers, providing more opportunity for recombination events between them
- Conversely, markers that enter close together in time show LOW recombination between them
- Do NOT confuse "highest recombination frequency" with "first markers to enter" -- these are opposite concepts

### Ordering Markers from Hfr Data
1. Use time-of-entry data to establish gene order relative to oriT
2. Use recombination frequency data between pairs of selected markers to confirm/refine order
3. Multiple Hfr strains with different origins can be used to build a circular map

---

## MCQ Elimination Strategy for Genetics

### General MCQ Protocol
1. **ALWAYS evaluate ALL options** before choosing an answer
2. Never select the first option that seems correct -- there may be a better or more precise answer
3. Read the question stem carefully for qualifiers: "MOST likely", "LEAST likely", "NOT true", "ALWAYS", "NEVER"

### "Which is NOT true" Questions
- Evaluate EACH statement independently as True or False
- Mark each option with T or F before selecting
- The answer is the statement marked F
- Double-check: verify the "false" statement is genuinely false, not just misleadingly worded

### "Which mechanism" Questions
- Test each proposed mechanism against ALL observations given in the question
- A correct mechanism must explain every observation, not just some
- Eliminate mechanisms that contradict even one observation

### Specific Traps to Watch For
- **Subfunctionalization vs neofunctionalization**: Subfunctionalization = partitioning of EXISTING ancestral functions between duplicates (both copies needed to perform original function). Neofunctionalization = one copy acquires a genuinely NEW function not present in the ancestor
- **Copy-neutral LOH**: Caused by mitotic recombination (segmental, affects part of a chromosome), NOT uniparental disomy (UPD, which is whole-chromosome). The question may try to conflate these
- **Penetrance vs expressivity**: Penetrance = fraction of individuals with genotype who show ANY phenotype. Expressivity = degree/severity of phenotype among those who show it. These are distinct concepts
- **Complementation vs recombination**: Complementation = two mutations in DIFFERENT genes restore wild-type in trans. Recombination = exchange between two mutations in the SAME or different genes. Complementation is tested in F1 (heterozygote); recombination is tested in progeny

---

## Common Genetics Reasoning Traps

These are specific patterns that have caused reasoning failures in hard genetics questions. Review before answering genetics MCQs.

### Suppressor Genetics
- A suppressor mutation, when homozygous, restores wild-type phenotype in an otherwise mutant background
- In F2 crosses involving both the original mutation and an autosomal recessive suppressor:
  - Treat as a dihybrid cross — the primary mutation and the suppressor segregate independently
  - Only 1/4 of F2 are homozygous for the suppressor
  - The suppressor only acts in individuals that are also homozygous for the primary mutation
  - Use a Punnett square to enumerate all genotypic classes, then apply the suppression rule to determine phenotypes

### Non-disjunction (Bridges' Experiments)
- Bridges used non-disjunction to prove the chromosome theory of inheritance
- X0 males arise from female meiosis non-disjunction events
- **Meiosis I non-disjunction**: both X chromosomes go to one pole -> XX egg + O egg (nullo-X)
- **Meiosis II non-disjunction**: sister chromatids fail to separate -> XX egg from one secondary oocyte
- The classic Bridges result: exceptional white-eyed females (X^w X^w) and red-eyed males (from nullo-X eggs + Y sperm = X0, but these are typically sterile)
- Key distinction: know which type of non-disjunction (MI vs MII) produces which specific gamete types

### GWAS LD Blocks
- SNPs WITHIN the same LD block are correlated and can inflate false positive associations (one causal SNP drags along non-causal tag SNPs)
- SNPs ACROSS different LD blocks are largely independent and do NOT create misleading cross-locus associations
- LD block structure varies by population (shorter in African populations due to larger historical Ne)
- Fine-mapping within an LD block is needed to distinguish the causal variant from hitchhiking tag SNPs

### Gene Retention After Whole-Genome Duplication
- **Neofunctionalization**: One copy acquires a NEW function -> most commonly cited reason for gene RETENTION after duplication (preserves both copies because each is now essential)
- **Subfunctionalization**: Ancestral functions are PARTITIONED between copies -> explains DIVERGENCE of duplicate copies, but both copies must be retained to maintain the full ancestral function
- **Dosage balance**: Some genes are retained in duplicate to maintain stoichiometric balance in protein complexes
- Trap: Questions may ask "what explains retention" vs "what explains divergence" -- these have different best answers
- For retention: neofunctionalization (new function makes both copies essential)
- For divergence of expression/function: subfunctionalization (partitioning of ancestral roles)

---

## Advanced Genetics Traps v2

### PGS vs Heritability: "Necessarily True" Logic
For "necessarily true" questions about PGS and heritability: a statement is necessarily true only if it holds when V_D=0 AND when V_D=V_G. Test the extremes.

### Path Diagram Sign Assignment Protocol
Do NOT guess path signs from general knowledge. Signs may differ from well-known systems. Follow this protocol:
1. **Establish reference direction**: What varies? What is increasing?
2. **For each path X→Y**: Ask ONLY "when X increases, does Y increase (+) or decrease (-)?"
3. **Use the question's experimental context** (knockout/control comparisons, provided data) to determine signs — not intuition
4. **Expect negative paths**: Path diagrams test your ability to identify negative relationships. All-positive is almost always wrong. Direct residual paths (e) often have opposite sign from expectation.

### Chi-Square: "Most Likely to Reject" Protocol
Compute chi-square from the expected ratio given in the question. Compare to chi-square-critical at df = (number of phenotype classes - 1). Pick the answer choice with the highest chi-square, but also check which pattern is biologically diagnostic of the alternative hypothesis.

### LD and Misleading GWAS Associations
LD block boundaries at recombination hotspots are a source of GWAS false localization — strong signal in the block does not guarantee the causal variant is in the block.

### Low-Frequency Allele Detection
Duplex sequencing (unique molecular identifiers + double-strand consensus) detects alleles at 0.01% frequency — far below standard NGS even at 80X depth. Simply increasing read depth does NOT help for ultra-rare variants because the Illumina error rate (~0.1%) masks variants rarer than ~1% regardless of depth. Error correction methods (UMIs, duplex consensus) are needed to distinguish true rare variants from sequencing errors.

---

## Bundled Computation Script

**Script**: `skills/tooluniverse-population-genetics/scripts/popgen_calculator.py`

**Preferred**: Use ToolUniverse tools (via MCP/SDK) instead of the script when possible:
- `PopGen_hwe_test` tool -- HWE chi-square test. Fallback: `popgen_calculator.py --type hwe`
- `PopGen_fst` tool -- Weir-Cockerham Fst. Fallback: `popgen_calculator.py --type fst`
- `PopGen_inbreeding` tool -- Inbreeding coefficient from pedigree. Fallback: `popgen_calculator.py --type inbreeding`
- `PopGen_haplotype_count` tool -- Expected haplotype diversity. Fallback: `popgen_calculator.py --type haplotypes`

**Fallback script** modes (all require `--type`):
- `hwe`: `--AA N --Aa N --aa N` -- chi-square HWE test with p-value
- `fst`: `--p1 F --p2 F --n1 N --n2 N` -- Weir-Cockerham Fst
- `inbreeding`: `--pedigree TYPE --generations G` -- F from pedigree (self, full-sib, half-sib, first-cousin, etc.)
- `haplotypes`: `--snps N --generations G --recomb_rate R` -- expected haplotype diversity

---

## Key Concepts

- **MAF**: Minor allele frequency. Common: >5%. Rare: <1%. Ultra-rare: <0.01%.
- **pLI**: P(LoF intolerant). >0.9 = haploinsufficient gene.
- **LOEUF**: LoF o/e upper fraction. <0.35 = highly constrained.
- **CADD PHRED**: >=10 top 10%, >=20 top 1%, >=30 top 0.1% most deleterious.
- **Genome-wide significance**: GWAS p < 5e-8 (Bonferroni for ~1M independent tests).
- **Effect size**: OR > 1 = risk allele, < 1 = protective. Beta > 0 = increases trait.

## Evidence Grading

- **T1**: ClinVar pathogenic/likely pathogenic, FDA pharmacogenomics
- **T2**: gnomAD frequencies, GTEx eQTLs, GWAS genome-wide significant
- **T3**: CADD/SIFT/PolyPhen predictions, RegulomeDB, constraint metrics
- **T4**: VEP consequence terms, dbSNP annotations, literature mentions

Signals

Avg rating⭐ 0.0

Reviews0

Favorites0

Information

Repository: mims-harvard/ToolUniverse
Author: mims-harvard
Last Sync: 5/10/2026
Repo Updated: 5/10/2026
Created: 3/25/2026

Reviews (0)

No reviews yet. Be the first to review this skill!

Related Skills

cursorrules

CrewAI Development Rules

⭐ 43932Has guide

fastmcp-client-cli

Query and invoke tools on MCP servers using fastmcp list and fastmcp call. Use when you need to discover what tools a server offers, call tools, or integrate MCP servers into workflows.

⭐ 25095

open-source

Documentation reference for writing Python code using the browser-use open-source library. Use this skill whenever the user needs help with Agent, Browser, or Tools configuration, is writing code that imports from browser_use, asks about @sandbox deployment, supported LLM models, Actor API, custom tools, lifecycle hooks, MCP server setup, or monitoring/observability with Laminar or OpenLIT. Also trigger for questions about browser-use installation, prompting strategies, or sensitive data handling. Do NOT use this for Cloud API/SDK usage or pricing — use the cloud skill instead. Do NOT use this for directly automating a browser via CLI commands — use the browser-use skill instead.

⭐ 23311

cloud

Documentation reference for using Browser Use Cloud — the hosted API and SDK for browser automation. Use this skill whenever the user needs help with the Cloud REST API (v2 or v3), browser-use-sdk (Python or TypeScript), X-Browser-Use-API-Key authentication, cloud sessions, browser profiles, profile sync, CDP WebSocket connections, stealth browsers, residential proxies, CAPTCHA handling, webhooks, workspaces, skills marketplace, liveUrl streaming, pricing, or integration patterns (chat UI, subagent, adding browser tools to existing agents). Also trigger for questions about n8n/Make/Zapier integration, Playwright/ Puppeteer/Selenium on cloud infrastructure, or 1Password vault integration. Do NOT use this for the open-source Python library (Agent, Browser, Tools config) — use the open-source skill instead.