
Protein Interaction Network Analysis - Claude MCP Skill

Analyze protein-protein interaction networks using STRING, BioGRID, and SASBDB databases. Maps protein identifiers, retrieves interaction networks with confidence scores, performs functional enrichment analysis (GO/KEGG/Reactome), and optionally includes structural data. No API key required for core functionality (STRING). Use when analyzing protein networks, discovering interaction partners, identifying functional modules, or studying protein complexes.



# Protein Interaction Network Analysis

Comprehensive protein interaction network analysis using ToolUniverse tools. Analyzes protein networks through a 4-phase workflow: identifier mapping, network retrieval, enrichment analysis, and optional structural data.

## Features

✅ **Identifier Mapping** - Convert protein names to database IDs (STRING, UniProt, Ensembl)
✅ **Network Retrieval** - Get interaction networks with confidence scores (0-1.0)
✅ **Functional Enrichment** - GO terms, KEGG pathways, Reactome pathways
✅ **PPI Enrichment** - Test if proteins form functional modules
✅ **Structural Data** - Optional SAXS/SANS solution structures (SASBDB)
✅ **Fallback Strategy** - STRING primary (no API key) → BioGRID secondary (if key available)
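
The fallback strategy in the last item can be pictured with a small sketch. The two fetcher callables below are hypothetical stand-ins, not ToolUniverse functions, and treating an empty STRING result as a failure is an assumption rather than documented behavior.

```python
from typing import Callable, Dict, List, Optional, Tuple

Edge = Dict[str, object]
Fetcher = Callable[[List[str]], List[Edge]]


def fetch_with_fallback(
    proteins: List[str],
    primary: Fetcher,
    secondary: Optional[Fetcher] = None,
    api_key: Optional[str] = None,
) -> Tuple[str, List[Edge]]:
    """Return (source_name, edges), preferring the primary source."""
    try:
        edges = primary(proteins)
        if edges:  # assumption: an empty result counts as a failure
            return "STRING", edges
    except Exception:
        pass  # primary source unavailable; consider the secondary
    if secondary is not None and api_key:
        return "BioGRID", secondary(proteins)
    return "STRING", []


# Hypothetical stand-ins for the real database calls.
def string_down(proteins: List[str]) -> List[Edge]:
    raise ConnectionError("STRING unreachable")


def biogrid_stub(proteins: List[str]) -> List[Edge]:
    return [{"a": proteins[0], "b": "MDM2"}]


source, edges = fetch_with_fallback(["TP53"], string_down, biogrid_stub, api_key="demo")
print(source, len(edges))
```

Without an API key the secondary source is never consulted, which mirrors the "if key available" condition.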

## Databases Used

| Database | Coverage | API Key | Purpose |
|----------|----------|---------|---------|
| **STRING** | 14M+ proteins, 5,000+ organisms | ❌ Not required | Primary interaction source |
| **BioGRID** | 2.3M+ interactions, 80+ organisms | ✅ Required | Fallback, curated data |
| **SASBDB** | 2,000+ SAXS/SANS entries | ❌ Not required | Solution structures |

## Quick Start

### Basic Usage

```python
from tooluniverse import ToolUniverse
from python_implementation import analyze_protein_network

# Initialize ToolUniverse
tu = ToolUniverse()

# Analyze protein network
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53", "MDM2", "ATM", "CHEK2"],
    species=9606,  # Human
    confidence_score=0.7  # High confidence
)

# Access results
print(f"Mapped: {len(result.mapped_proteins)} proteins")
print(f"Network: {result.total_interactions} interactions")
print(f"Enrichment: {len(result.enriched_terms)} GO terms")
print(f"PPI p-value: {result.ppi_enrichment.get('p_value', 1.0):.2e}")
```

### Expected Output

```
πŸ” Phase 1: Mapping 4 protein identifiers...
βœ… Mapped 4/4 proteins (100.0%)

πŸ•ΈοΈ  Phase 2: Retrieving interaction network...
βœ… STRING: Retrieved 6 interactions

🧬 Phase 3: Performing enrichment analysis...
βœ… Found 245 enriched GO terms (FDR < 0.05)
βœ… PPI enrichment significant (p=3.45e-05)

βœ… Analysis complete!
```

## Use Cases

### 1. Single Protein Analysis

Discover interaction partners for a protein of interest:

```python
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53"],  # Single protein
    species=9606,
    confidence_score=0.7
)

# Top 5 partners will be in the network
for edge in result.network_edges[:5]:
    print(f"{edge['preferredName_A']} ↔ {edge['preferredName_B']} "
          f"(score: {edge['score']})")
```

### 2. Protein Complex Validation

Test if proteins form a functional complex:

```python
# DNA damage response proteins
proteins = ["TP53", "ATM", "CHEK2", "BRCA1", "BRCA2"]

result = analyze_protein_network(tu=tu, proteins=proteins)

# Check PPI enrichment
if result.ppi_enrichment.get("p_value", 1.0) < 0.05:
    print("✅ Proteins form functional module!")
    print(f"   Expected edges: {result.ppi_enrichment['expected_number_of_edges']:.1f}")
    print(f"   Observed edges: {result.ppi_enrichment['number_of_edges']}")
else:
    print("⚠️  Proteins may be unrelated")
```

### 3. Pathway Discovery

Find enriched pathways for a protein set:

```python
result = analyze_protein_network(
    tu=tu,
    proteins=["MAPK1", "MAPK3", "RAF1", "MAP2K1"],  # MAPK pathway
    confidence_score=0.7
)

# Show top enriched processes
print("\nTop Enriched Pathways:")
for term in result.enriched_terms[:10]:
    print(f"  {term['term']}: p={term['p_value']:.2e}, FDR={term['fdr']:.2e}")
```

### 4. Multi-Protein Network Analysis

Build a complete interaction network for multiple proteins:

```python
# Apoptosis regulators
proteins = ["TP53", "BCL2", "BAX", "CASP3", "CASP9"]

result = analyze_protein_network(
    tu=tu,
    proteins=proteins,
    confidence_score=0.7
)

# Export network for Cytoscape
import pandas as pd
df = pd.DataFrame(result.network_edges)
df.to_csv("apoptosis_network.tsv", sep="\t", index=False)
```

### 5. With BioGRID Validation

Use BioGRID for experimentally validated interactions:

```python
# Requires BIOGRID_API_KEY in environment
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53", "MDM2"],
    include_biogrid=True  # Enable BioGRID fallback
)

print(f"Primary source: {result.primary_source}")  # "STRING" or "BioGRID"
```

### 6. Including Structural Data

Add SAXS/SANS solution structures:

```python
result = analyze_protein_network(
    tu=tu,
    proteins=["TP53"],
    include_structure=True  # Query SASBDB
)

if result.structural_data:
    print(f"\nFound {len(result.structural_data)} SAXS/SANS entries:")
    for entry in result.structural_data:
        print(f"  {entry.get('sasbdb_id')}: {entry.get('title')}")
```

## Parameters

### `analyze_protein_network()` Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `tu` | ToolUniverse | Required | ToolUniverse instance |
| `proteins` | list[str] | Required | Protein identifiers (gene symbols, UniProt IDs) |
| `species` | int | 9606 | NCBI taxonomy ID (9606=human, 10090=mouse) |
| `confidence_score` | float | 0.7 | Min interaction confidence (0-1). 0.4=low, 0.7=high, 0.9=very high |
| `include_biogrid` | bool | False | Use BioGRID if STRING fails (requires API key) |
| `include_structure` | bool | False | Include SASBDB structural data (slower) |
| `suppress_warnings` | bool | True | Suppress ToolUniverse loading warnings |

### Species IDs (Common)

- `9606` - Homo sapiens (human)
- `10090` - Mus musculus (mouse)
- `10116` - Rattus norvegicus (rat)
- `7227` - Drosophila melanogaster (fruit fly)
- `6239` - Caenorhabditis elegans (worm)
- `7955` - Danio rerio (zebrafish)
- `559292` - Saccharomyces cerevisiae S288C (yeast; STRING itself indexes yeast under `4932`)

### Confidence Score Guidelines

| Score | Level | Description | Use Case |
|-------|-------|-------------|----------|
| 0.15 | Very low | All evidence | Exploratory, hypothesis generation |
| 0.4 | Low | Medium evidence | Default STRING threshold |
| 0.7 | High | Strong evidence | **Recommended** - reliable interactions |
| 0.9 | Very high | Strongest evidence | Core interactions only |
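
To see how the threshold changes network size, the sketch below filters a list of edges at several cutoffs. The edge list is illustrative (only the TP53-MDM2 score appears elsewhere in this document), and both helper names are hypothetical. STRING's public REST API expresses the same threshold on a 0-1000 scale (`required_score`); whether ToolUniverse performs that conversion internally is not stated here.

```python
from typing import Any, Dict, List


def filter_by_confidence(
    edges: List[Dict[str, Any]], min_score: float = 0.7
) -> List[Dict[str, Any]]:
    """Keep edges whose combined score meets the threshold."""
    return [e for e in edges if float(e.get("score", 0.0)) >= min_score]


def to_required_score(score: float) -> int:
    """Convert a 0-1 threshold to a 0-1000 integer scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return round(score * 1000)


# Illustrative edges; GENE_X and GENE_Y are placeholders with made-up scores.
edges = [
    {"preferredName_A": "TP53", "preferredName_B": "MDM2", "score": 0.999},
    {"preferredName_A": "TP53", "preferredName_B": "GENE_X", "score": 0.75},
    {"preferredName_A": "TP53", "preferredName_B": "GENE_Y", "score": 0.42},
]

for threshold in (0.4, 0.7, 0.9):
    kept = filter_by_confidence(edges, threshold)
    print(threshold, to_required_score(threshold), len(kept))
```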

## Results Structure

### `ProteinNetworkResult` Object

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class ProteinNetworkResult:
    # Phase 1: Identifier mapping
    mapped_proteins: List[Dict[str, Any]]
    mapping_success_rate: float

    # Phase 2: Network retrieval
    network_edges: List[Dict[str, Any]]
    total_interactions: int

    # Phase 3: Enrichment analysis
    enriched_terms: List[Dict[str, Any]]
    ppi_enrichment: Dict[str, Any]

    # Phase 4: Structural data (optional)
    structural_data: Optional[List[Dict[str, Any]]]

    # Metadata
    primary_source: str  # "STRING" or "BioGRID"
    warnings: List[str]
```

### Network Edge Format (STRING)

```python
{
    "stringId_A": "9606.ENSP00000269305",  # Protein A STRING ID
    "stringId_B": "9606.ENSP00000258149",  # Protein B STRING ID
    "preferredName_A": "TP53",             # Protein A name
    "preferredName_B": "MDM2",             # Protein B name
    "ncbiTaxonId": 9606,                   # Species
    "score": 0.999,                        # Combined confidence (0-1)
    "nscore": 0.0,                         # Neighborhood score
    "fscore": 0.0,                         # Gene fusion score
    "pscore": 0.0,                         # Phylogenetic profile score
    "ascore": 0.947,                       # Coexpression score
    "escore": 0.951,                       # Experimental score
    "dscore": 0.9,                         # Database score
    "tscore": 0.994                        # Text mining score
}
```
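
As a sketch of how these fields might be used, the helpers below pick the strongest evidence channel for an edge and count how many edges touch each protein. The function names are hypothetical and not part of `python_implementation.py`; the field names and the TP53-MDM2 values are taken from the edge format above.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Per-channel score fields as listed in the edge format.
CHANNELS = ("nscore", "fscore", "pscore", "ascore", "escore", "dscore", "tscore")


def strongest_channel(edge: Dict[str, Any]) -> Tuple[str, float]:
    """Return the evidence channel with the highest score for one edge."""
    name = max(CHANNELS, key=lambda c: float(edge.get(c, 0.0)))
    return name, float(edge.get(name, 0.0))


def node_degrees(edges: List[Dict[str, Any]]) -> Counter:
    """Count how many edges touch each protein."""
    degrees: Counter = Counter()
    for edge in edges:
        degrees[edge["preferredName_A"]] += 1
        degrees[edge["preferredName_B"]] += 1
    return degrees


tp53_mdm2 = {
    "preferredName_A": "TP53", "preferredName_B": "MDM2", "score": 0.999,
    "nscore": 0.0, "fscore": 0.0, "pscore": 0.0, "ascore": 0.947,
    "escore": 0.951, "dscore": 0.9, "tscore": 0.994,
}
print(strongest_channel(tp53_mdm2))  # ('tscore', 0.994)
print(node_degrees([tp53_mdm2]))
```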

### Enrichment Term Format

```python
{
    "category": "Process",                  # GO category
    "term": "GO:0006915",                   # GO term ID
    "description": "apoptotic process",     # Term description
    "number_of_genes": 4,                   # Genes in your set
    "number_of_genes_in_background": 1234, # Genes in genome
    "p_value": 1.23e-05,                    # Enrichment p-value
    "fdr": 0.0012,                          # FDR correction
    "inputGenes": "TP53,MDM2,BAX,CASP3"    # Matching genes
}
```
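
Records in this shape can be filtered and ranked with a few lines of plain Python. The sketch below keeps terms under an FDR cutoff, optionally restricted to one category. `significant_terms` is a hypothetical helper, and the sample FDR values (other than the first record's) are made up for illustration.

```python
from typing import Any, Dict, List, Optional


def significant_terms(
    terms: List[Dict[str, Any]],
    max_fdr: float = 0.05,
    category: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Return terms with FDR below the cutoff, most significant first."""
    kept = [
        t for t in terms
        if t["fdr"] < max_fdr and (category is None or t["category"] == category)
    ]
    return sorted(kept, key=lambda t: t["fdr"])


# Illustrative records; the category labels are assumptions about the output.
terms = [
    {"category": "Process", "term": "GO:0006915",
     "description": "apoptotic process", "fdr": 0.0012},
    {"category": "KEGG", "term": "hsa04115",
     "description": "p53 signaling pathway", "fdr": 0.0004},
    {"category": "Process", "term": "GO:0008150",
     "description": "biological_process", "fdr": 0.2},
]

for t in significant_terms(terms):
    print(t["term"], t["description"], t["fdr"])
```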

## Workflow Details

### 4-Phase Analysis Pipeline

```
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Identifier Mapping                                 │
│ ─────────────────────────────────────────────────────────── │
│ STRING_map_identifiers()                                    │
│   • Validates protein names exist in database               │
│   • Converts to STRING IDs for consistency                  │
│   • Returns mapping success rate                            │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: Network Retrieval                                  │
│ ─────────────────────────────────────────────────────────── │
│ PRIMARY: STRING_get_network() (no API key needed)           │
│   • Retrieves all pairwise interactions                     │
│   • Returns confidence scores by evidence type              │
│                                                             │
│ FALLBACK: BioGRID_get_interactions() (if enabled)           │
│   • Used if STRING fails or for validation                  │
│   • Requires BIOGRID_API_KEY                                │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 3: Enrichment Analysis                                │
│ ─────────────────────────────────────────────────────────── │
│ STRING_functional_enrichment()                              │
│   • GO terms (Process, Component, Function)                 │
│   • KEGG pathways                                           │
│   • Reactome pathways                                       │
│   • FDR-corrected p-values                                  │
│                                                             │
│ STRING_ppi_enrichment()                                     │
│   • Tests if proteins interact more than random             │
│   • Returns p-value for functional coherence                │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 4: Structural Data (Optional)                         │
│ ─────────────────────────────────────────────────────────── │
│ SASBDB_search_entries()                                     │
│   • SAXS/SANS solution structures                           │
│   • Protein flexibility and conformations                   │
│   • Complements crystal/cryo-EM data                        │
└─────────────────────────────────────────────────────────────┘
```
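
The control flow of the four phases can be sketched independently of ToolUniverse. In the example below each phase is passed in as a plain callable, so the stubs are hypothetical placeholders for the tool calls named in the diagram. Stopping early when nothing maps, and turning structural-lookup errors into warnings, are assumptions about reasonable behavior, not a description of `python_implementation.py`.

```python
from typing import Any, Callable, Dict, List, Optional

Records = List[Dict[str, Any]]


def run_pipeline(
    proteins: List[str],
    map_ids: Callable[[List[str]], List[str]],
    get_network: Callable[[List[str]], Records],
    enrich: Callable[[List[str]], Records],
    get_structures: Optional[Callable[[List[str]], Records]] = None,
) -> Dict[str, Any]:
    """Run the phases in order, collecting warnings along the way."""
    result: Dict[str, Any] = {
        "mapped": [], "mapping_success_rate": 0.0, "edges": [],
        "terms": [], "structures": None, "warnings": [],
    }
    # Phase 1: identifier mapping
    result["mapped"] = map_ids(proteins)
    if proteins:
        result["mapping_success_rate"] = len(result["mapped"]) / len(proteins)
    if not result["mapped"]:
        result["warnings"].append("no identifiers mapped; later phases skipped")
        return result
    # Phase 2: network retrieval
    result["edges"] = get_network(result["mapped"])
    # Phase 3: enrichment analysis
    result["terms"] = enrich(result["mapped"])
    # Phase 4: optional structural lookup; failures become warnings
    if get_structures is not None:
        try:
            result["structures"] = get_structures(result["mapped"])
        except Exception as exc:
            result["warnings"].append(f"structural lookup failed: {exc}")
    return result


# Stub phases for demonstration only.
known = {"TP53", "MDM2"}
demo = run_pipeline(
    ["TP53", "MDM2", "NOT_A_GENE"],
    map_ids=lambda names: [n for n in names if n in known],
    get_network=lambda ids: [{"preferredName_A": ids[0], "preferredName_B": ids[1]}],
    enrich=lambda ids: [],
)
print(round(demo["mapping_success_rate"], 2), len(demo["edges"]))
```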

## Installation & Setup

### Prerequisites

```bash
# Install ToolUniverse (if not already installed)
pip install tooluniverse

# Or with extras
pip install tooluniverse[all]
```

### Optional: BioGRID API Key

For BioGRID fallback functionality:

1. Register for a free API key: https://webservice.thebiogrid.org/
2. Add to `.env` file:
   ```bash
   BIOGRID_API_KEY=your_key_here
   ```
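
Python does not read a `.env` file on its own, and this document does not say whether ToolUniverse loads it, so it can help to confirm that the key is actually visible in the process environment before enabling the fallback. A minimal sketch, with `biogrid_key` as a hypothetical helper:

```python
import os
from typing import Mapping, Optional


def biogrid_key(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the BioGRID key if it is set and non-empty, else None."""
    env = os.environ if env is None else env
    key = env.get("BIOGRID_API_KEY", "").strip()
    return key or None


# Enable the fallback only when a key is actually present.
include_biogrid = biogrid_key() is not None
print("BioGRID fallback enabled:", include_biogrid)
```

The resulting boolean could then be passed as `include_biogrid=` to `analyze_protein_network()`.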

### Skill Files

```
tooluniverse-protein-interactions/
├── SKILL.md                    # This file
├── python_implementation.py    # Main implementation
├── QUICK_START.md              # Quick reference
├── DOMAIN_ANALYSIS.md          # Design rationale
└── KNOWN_ISSUES.md             # ToolUniverse limitations
```

## Known Limitations

### 1. ToolUniverse Verbose Output

**Issue**: ToolUniverse prints 40+ warning messages during analysis.

**Workaround**: Filter output when running:
```bash
python your_script.py 2>&1 | grep -v "Error loading tools"
```

See `KNOWN_ISSUES.md` for details.
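
The same filtering can be approximated inside Python. The sketch below captures what a call writes to `sys.stdout` and `sys.stderr` and drops lines containing a given substring. `run_filtered` is a hypothetical helper, and it only sees Python-level writes made during the call: output from logging handlers bound to the original streams, or from native code, would bypass it.

```python
import contextlib
import io
import sys
from typing import Any, Callable, List, Tuple


def run_filtered(
    func: Callable[..., Any], *args: Any,
    drop: str = "Error loading tools", **kwargs: Any
) -> Tuple[Any, List[str]]:
    """Call func, capture its output, and return (result, kept_lines)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        result = func(*args, **kwargs)
    kept = [line for line in buffer.getvalue().splitlines() if drop not in line]
    return result, kept


# Stand-in for a noisy call; replace with the real analysis function.
def noisy() -> int:
    print("Error loading tools: example", file=sys.stderr)
    print("Phase 1 done")
    return 42


value, lines = run_filtered(noisy)
print(value, lines)
```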

### 2. BioGRID Requires API Key

The BioGRID fallback requires a free API key. STRING works without any API key.

### 3. SASBDB May Have API Issues

SASBDB endpoints occasionally return errors. Structural data is optional.

## Performance

### Typical Execution Times

| Operation | Time | Notes |
|-----------|------|-------|
| Identifier mapping | 1-2 sec | For 5 proteins |
| Network retrieval | 2-3 sec | Depends on network size |
| Enrichment analysis | 3-5 sec | For 374 terms |
| Full 4-phase analysis | 6-10 sec | Excluding ToolUniverse overhead |

**Note**: Add 4-8 seconds per tool call for ToolUniverse loading (framework limitation).

### Optimization Tips

1. **Disable structural data** if not needed: `include_structure=False`
2. **Use higher confidence scores** to reduce network size: `confidence_score=0.9`
3. **Filter output** to avoid processing warning messages
4. **Reuse ToolUniverse instance** across multiple analyses
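
Tips 1, 2, and 4 combine naturally in a small batch helper: create one instance, then pass it to every analysis along with the cheaper options. In the sketch below, `analyze_many` and `fake_analyze` are hypothetical. The stub only mimics the keyword interface of `analyze_protein_network()` so the example runs without ToolUniverse installed.

```python
from typing import Any, Callable, Dict, List


def analyze_many(
    analyze: Callable[..., Any],
    tu: Any,
    protein_sets: Dict[str, List[str]],
    **options: Any,
) -> Dict[str, Any]:
    """Run one analysis per named protein set, sharing a single instance."""
    return {
        name: analyze(tu=tu, proteins=proteins, **options)
        for name, proteins in protein_sets.items()
    }


# Stub with the same keyword interface as analyze_protein_network.
def fake_analyze(tu: Any, proteins: List[str], **options: Any) -> Dict[str, Any]:
    return {"tu_id": id(tu), "n": len(proteins), "options": options}


shared_tu = object()  # placeholder for a single ToolUniverse() instance
results = analyze_many(
    fake_analyze,
    shared_tu,
    {"ddr": ["TP53", "ATM", "CHEK2"], "mapk": ["MAPK1", "MAPK3"]},
    confidence_score=0.9,      # tip 2: smaller network
    include_structure=False,   # tip 1: skip the SASBDB lookup
)
print({name: r["n"] for name, r in results.items()})
```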

## Troubleshooting

### "Error: 'protein_ids' is a required property"

✅ **Fixed in this skill** - All parameter names verified in Phase 2 testing.

### No interactions found

- Check protein names are correct (case-sensitive)
- Try lower confidence score: `confidence_score=0.4`
- Verify species ID is correct
- Check if proteins actually interact (not all proteins have known interactions)
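
Lowering the threshold can be done stepwise so that the strictest cutoff that yields any edges is the one used. A minimal sketch, assuming a fetch callable that takes the protein list and a minimum score. `relax_until_found` and `fake_fetch` are hypothetical, and the single edge is made up.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Edges = List[Dict[str, Any]]


def relax_until_found(
    fetch: Callable[[List[str], float], Edges],
    proteins: List[str],
    thresholds: Sequence[float] = (0.9, 0.7, 0.4),
) -> Tuple[Optional[float], Edges]:
    """Try thresholds from strict to lenient; return the first that yields edges."""
    for threshold in thresholds:
        edges = fetch(proteins, threshold)
        if edges:
            return threshold, edges
    return None, []


# Stub fetcher: one made-up edge with score 0.55.
def fake_fetch(proteins: List[str], min_score: float) -> Edges:
    edges = [{"preferredName_A": "GENE_A", "preferredName_B": "GENE_B", "score": 0.55}]
    return [e for e in edges if e["score"] >= min_score]


used, found = relax_until_found(fake_fetch, ["GENE_A", "GENE_B"])
print(used, len(found))
```

Edges recovered only at a lenient cutoff rest on weaker evidence, so it is worth recording which threshold was finally used.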

### BioGRID not working

- Ensure `BIOGRID_API_KEY` is set in environment
- Check API key is valid at https://webservice.thebiogrid.org/
- BioGRID is optional - STRING works without it

### Slow performance

- This is expected (see KNOWN_ISSUES.md)
- ToolUniverse framework reloads tools on every call
- Use output filtering to reduce processing time

## Examples

See `python_implementation.py` for:
- `example_tp53_analysis()` - Complete TP53 network analysis
- `analyze_protein_network()` - Main function with all options
- `ProteinNetworkResult` - Result data structure

## References

- **STRING**: https://string-db.org/ (14M+ proteins, 5,000+ organisms)
- **BioGRID**: https://thebiogrid.org/ (2.3M+ interactions, experimentally validated)
- **SASBDB**: https://www.sasbdb.org/ (2,000+ SAXS/SANS entries)
- **ToolUniverse**: https://github.com/mims-harvard/ToolUniverse

## Support

For issues with:
- **This skill**: Check KNOWN_ISSUES.md and troubleshooting section
- **ToolUniverse framework**: See TOOLUNIVERSE_BUG_REPORT.md
- **API errors**: Check database status pages (STRING, BioGRID, SASBDB)

## License

Same as ToolUniverse framework license.

Information

- Repository: mims-harvard/ToolUniverse
- Author: mims-harvard
- Last Sync: 3/12/2026
- Repo Updated: 3/12/2026
- Created: 2/19/2026