llm-gateway-routing - Claude MCP Skill
LLM gateway and routing configuration using OpenRouter and LiteLLM. Invoke when:
- Setting up multi-model access (OpenRouter, LiteLLM)
- Configuring model fallbacks and reliability
- Implementing cost-based or latency-based routing
- A/B testing different models
- Self-hosting an LLM proxy

Keywords: openrouter, litellm, llm gateway, model routing, fallback, A/B testing
# LLM Gateway & Routing
Configure multi-model access, fallbacks, cost optimization, and A/B testing.
## Why Use a Gateway?
**Without gateway:**
- Vendor lock-in (one provider)
- No fallbacks (provider down = app down)
- Hard to A/B test models
- Scattered API keys and configs
**With gateway:**
- Single API for 400+ models
- Automatic fallbacks
- Easy model switching
- Unified cost tracking
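The "single API" point can be sketched as a request builder: one OpenAI-style payload shape for every provider, with the provider chosen purely by the model string. The gateway URL below is a placeholder, not a specific product's endpoint.

```typescript
// Hypothetical sketch: one payload shape for any provider behind a gateway.
// The URL is a placeholder for your gateway of choice.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function buildGatewayRequest(apiKey: string, model: string, messages: ChatMessage[]) {
  return {
    url: "https://gateway.example.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Switching providers is a one-string change:
const req = buildGatewayRequest("sk-demo", "anthropic/claude-3-5-sonnet", [
  { role: "user", content: "Hello!" },
]);
```

Without a gateway, each provider needs its own SDK, auth scheme, and payload shape; with one, the model string is the only thing that varies.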
## Quick Decision
| Need | Solution |
|------|----------|
| Fastest setup, multi-model | **OpenRouter** |
| Full control, self-hosted | **LiteLLM** |
| Observability + routing | **Helicone** |
| Enterprise, guardrails | **Portkey** |
## OpenRouter (Recommended)
### Why OpenRouter
- **400+ models**: OpenAI, Anthropic, Google, Meta, Mistral, and more
- **Single API**: One key for all providers
- **Automatic fallbacks**: Built-in reliability
- **A/B testing**: Easy model comparison
- **Cost tracking**: Unified billing dashboard
- **Free credits**: $1 free to start
### Setup
```bash
# 1. Sign up at openrouter.ai
# 2. Get API key from dashboard
# 3. Add to .env:
OPENROUTER_API_KEY=sk-or-v1-...
```
### Basic Usage
```typescript
// Using fetch
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3-5-sonnet',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
```
### With Vercel AI SDK (Recommended)
```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const openrouter = createOpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const { text } = await generateText({
  model: openrouter("anthropic/claude-3-5-sonnet"),
  prompt: "Explain quantum computing",
});
```
### Model IDs
```typescript
// Format: provider/model-name
const models = {
  // Anthropic
  claude35Sonnet: "anthropic/claude-3-5-sonnet",
  claudeHaiku: "anthropic/claude-3-5-haiku",

  // OpenAI
  gpt4o: "openai/gpt-4o",
  gpt4oMini: "openai/gpt-4o-mini",

  // Google
  geminiPro: "google/gemini-pro-1.5",
  geminiFlash: "google/gemini-flash-1.5",

  // Meta
  llama3: "meta-llama/llama-3.1-70b-instruct",

  // Auto (OpenRouter picks the best model)
  auto: "openrouter/auto",
};
```
### Fallback Chains
```typescript
import { generateText } from "ai";

type Message = { role: "system" | "user" | "assistant"; content: string };

// Define fallback order
const modelChain = [
  "anthropic/claude-3-5-sonnet", // Primary
  "openai/gpt-4o",               // Fallback 1
  "google/gemini-pro-1.5",       // Fallback 2
];

async function callWithFallback(messages: Message[]) {
  for (const model of modelChain) {
    try {
      // `openrouter` is the provider created in the AI SDK example above
      return await generateText({ model: openrouter(model), messages });
    } catch (error) {
      console.log(`${model} failed, trying next...`);
    }
  }
  throw new Error("All models failed");
}
```
### Cost Routing
```typescript
// Route based on query complexity
function selectModel(query: string): string {
  const complexity = analyzeComplexity(query);
  if (complexity === "simple") {
    // Simple queries → cheapest model
    return "google/gemini-flash-1.5"; // ~$0.075/1M input tokens
  } else if (complexity === "medium") {
    // Medium → balanced
    return "openai/gpt-4o-mini"; // ~$0.15/1M input tokens
  } else {
    // Complex → best quality
    return "anthropic/claude-3-5-sonnet"; // ~$3/1M input tokens
  }
}

function analyzeComplexity(query: string): "simple" | "medium" | "complex" {
  // Simple heuristics; replace with a real classifier in production
  if (query.length < 50) return "simple";
  if (query.includes("explain") || query.includes("analyze")) return "complex";
  return "medium";
}
```
### A/B Testing
```typescript
// Deterministic assignment: hash the full user ID so each user always
// sees the same variant and buckets are roughly even (hashing only the
// first character would bias the split)
function getModel(userId: string): string {
  let hash = 0;
  for (const ch of userId) hash = (hash * 31 + ch.charCodeAt(0)) % 100;
  return hash < 50
    ? "anthropic/claude-3-5-sonnet" // variant A (50%)
    : "openai/gpt-4o"; // variant B (50%)
}

// Track which model was used (`gateway.chat` stands in for your gateway client)
const model = getModel(userId);
const start = Date.now();
const response = await gateway.chat({ model, messages });
const latency = Date.now() - start;
await analytics.track("llm_call", { model, userId, latency });
```
## LiteLLM (Self-Hosted)
### Why LiteLLM
- **Self-hosted**: Full control over data
- **100+ providers**: Same coverage as OpenRouter
- **Load balancing**: Distribute across providers
- **Cost tracking**: Built-in spend management
- **Caching**: Redis or in-memory
- **Rate limiting**: Per-user limits
### Setup
```bash
# Install (quotes keep the extras bracket safe in zsh)
pip install 'litellm[proxy]'

# Run the proxy
litellm --config config.yaml

# Use as an OpenAI-compatible endpoint
export OPENAI_API_BASE=http://localhost:4000
```
### Configuration
```yaml
# config.yaml
model_list:
  # Claude models
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
      api_key: sk-ant-...

  # OpenAI models
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...

  # Load balanced: two entries share one model_name,
  # so requests to "balanced" are distributed across both
  - model_name: balanced
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
  - model_name: balanced
    litellm_params:
      model: openai/gpt-4o

# General settings
general_settings:
  master_key: sk-master-...
  database_url: postgresql://...

# Routing
router_settings:
  routing_strategy: simple-shuffle # or latency-based-routing
  num_retries: 3
  timeout: 30

# Budget limits
litellm_settings:
  max_budget: 100 # $100/month
  budget_duration: monthly
```
### Fallbacks in LiteLLM
```yaml
model_list:
  - model_name: primary
    litellm_params:
      model: anthropic/claude-3-5-sonnet-latest
  - model_name: fallback-1
    litellm_params:
      model: openai/gpt-4o
  - model_name: fallback-2
    litellm_params:
      model: gemini/gemini-pro

# Map each model to its fallback order
litellm_settings:
  fallbacks:
    - primary: ["fallback-1", "fallback-2"]
```
### Usage
```typescript
// Use like the OpenAI SDK
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000",
  apiKey: "sk-master-...",
});

const response = await client.chat.completions.create({
  model: "claude-sonnet", // Maps to the configured model
  messages: [{ role: "user", content: "Hello!" }],
});
```
## Routing Strategies
### 1. Cost-Based Routing
```typescript
const costTiers = {
  cheap: ["openai/gpt-4o-mini", "google/gemini-flash-1.5"],
  balanced: ["anthropic/claude-3-5-haiku", "openai/gpt-4o"],
  premium: ["anthropic/claude-3-5-sonnet", "openai/o1-preview"],
};

function routeByCost(budget: "cheap" | "balanced" | "premium"): string {
  const models = costTiers[budget];
  return models[Math.floor(Math.random() * models.length)];
}
```
### 2. Latency-Based Routing
```typescript
// Track latency per model
const latencyStats: Record<string, number[]> = {};

function routeByLatency(fallback = "openai/gpt-4o-mini"): string {
  const avgLatencies = Object.entries(latencyStats)
    .map(([model, times]) => ({
      model,
      avg: times.reduce((a, b) => a + b, 0) / times.length,
    }))
    .sort((a, b) => a.avg - b.avg);
  // No samples yet → use a default model
  return avgLatencies[0]?.model ?? fallback;
}

// Update after each call
function recordLatency(model: string, latencyMs: number) {
  if (!latencyStats[model]) latencyStats[model] = [];
  latencyStats[model].push(latencyMs);
  // Keep the last 100 samples
  if (latencyStats[model].length > 100) {
    latencyStats[model].shift();
  }
}
```
### 3. Task-Based Routing
```typescript
const taskModels = {
  coding: "anthropic/claude-3-5-sonnet", // Best for code
  reasoning: "openai/o1-preview", // Best for logic
  creative: "anthropic/claude-3-5-sonnet", // Best for writing
  simple: "openai/gpt-4o-mini", // Cheap and fast
  multimodal: "google/gemini-pro-1.5", // Vision + text
};

function routeByTask(task: keyof typeof taskModels): string {
  return taskModels[task];
}
```
### 4. Hybrid Routing
```typescript
interface RoutingConfig {
  task: string;
  maxCost: number;
  maxLatency: number;
}

// Assumes a `models` registry ({ id, cost, avgLatency }) and a
// `getTaskScore(modelId, task)` benchmark lookup defined elsewhere
function hybridRoute(config: RoutingConfig): string {
  // Filter by cost
  const affordable = models.filter((m) => m.cost <= config.maxCost);
  // Filter by latency
  const fast = affordable.filter((m) => m.avgLatency <= config.maxLatency);
  // Select the best model for the task
  const taskScores = fast.map((m) => ({
    model: m.id,
    score: getTaskScore(m.id, config.task),
  }));
  if (taskScores.length === 0) throw new Error("No model meets the constraints");
  return taskScores.sort((a, b) => b.score - a.score)[0].model;
}
```
## Best Practices
### 1. Always Have Fallbacks
```typescript
// Bad: single point of failure
const response = await openai.chat({ model: "gpt-4o", messages });

// Good: fallback chain
const models = ["gpt-4o", "claude-3-5-sonnet", "gemini-pro"];
for (const model of models) {
  try {
    return await gateway.chat({ model, messages });
  } catch (e) {
    continue;
  }
}
throw new Error("All models failed");
```
### 2. Pin Model Versions
```typescript
// Bad: an alias that can silently change underneath you
const model = "gpt-4";

// Good: a pinned version
const model = "openai/gpt-4-0125-preview";
```
### 3. Track Costs
```typescript
// Log every call
async function trackedCall(model: string, messages: Message[]) {
  const start = Date.now();
  const response = await gateway.chat({ model, messages });
  const latency = Date.now() - start;
  await analytics.track("llm_call", {
    model,
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
    cost: calculateCost(model, response.usage),
    latency,
  });
  return response;
}
```
### 4. Set Token Limits
```typescript
// Prevent runaway costs
const response = await gateway.chat({
  model,
  messages,
  max_tokens: 500, // Limit output length
});
```
### 5. Use Caching
```yaml
# LiteLLM caching (config.yaml)
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 3600 # 1 hour
```
## References
- `references/openrouter-guide.md` - OpenRouter deep dive
- `references/litellm-guide.md` - LiteLLM self-hosting
- `references/routing-strategies.md` - Advanced routing patterns
- `references/alternatives.md` - Helicone, Portkey, etc.
## Templates
- `templates/openrouter-config.ts` - TypeScript OpenRouter setup
- `templates/litellm-config.yaml` - LiteLLM proxy config
- `templates/fallback-chain.ts` - Fallback implementation
Repository: phrazzld/claude-config (author: phrazzld). Created 1/13/2026; repo updated 3/1/2026; last synced 3/2/2026.