langfuse-observability - Claude MCP Skill
Query Langfuse traces, prompts, and LLM metrics. Use when:
- Analyzing LLM generation traces (errors, latency, tokens)
- Reviewing prompt performance and versions
- Debugging failed generations
- Comparing model outputs across runs

Keywords: langfuse, traces, observability, LLM metrics, prompt management, generations
Documentation
SKILL.md

# Langfuse Observability
Query traces, prompts, and metrics from Langfuse. Requires env vars:
- `LANGFUSE_SECRET_KEY`
- `LANGFUSE_PUBLIC_KEY`
- `LANGFUSE_HOST` (e.g., `https://us.cloud.langfuse.com`)
## Quick Start
All commands run from the skill directory:
```bash
cd ~/.claude/skills/langfuse-observability
```
### List Recent Traces
```bash
# Last 10 traces
npx tsx scripts/fetch-traces.ts --limit 10
# Filter by name pattern
npx tsx scripts/fetch-traces.ts --name "quiz-generation" --limit 5
# Filter by user
npx tsx scripts/fetch-traces.ts --user-id "user_abc123" --limit 10
```
### Get Single Trace Details
```bash
# Full trace with spans and generations
npx tsx scripts/fetch-trace.ts <trace-id>
```
### Get Prompt
```bash
# Fetch specific prompt
npx tsx scripts/list-prompts.ts --name scry-intent-extraction
# With label
npx tsx scripts/list-prompts.ts --name scry-intent-extraction --label production
```
### Get Metrics Summary
```bash
# Summary for recent traces
npx tsx scripts/get-metrics.ts --limit 50
# Filter by trace name
npx tsx scripts/get-metrics.ts --name "quiz-generation" --limit 100
```
## Output Formats
All scripts output JSON to stdout for easy parsing.
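For example, a minimal sketch of programmatic consumption, assuming Node 18+ and that it runs from the skill directory (see Quick Start):

```typescript
import { execSync } from "node:child_process";

// Shell out to a skill script and parse its JSON output
const raw = execSync("npx tsx scripts/fetch-traces.ts --limit 10", {
  encoding: "utf8",
});
const traces = JSON.parse(raw);
console.log(`Fetched ${traces.length} traces`);
```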
### Trace List Output
```json
[
  {
    "id": "trace-abc123",
    "name": "quiz-generation",
    "userId": "user_xyz",
    "input": {"prompt": "..."},
    "output": {"concepts": [...]},
    "latencyMs": 3200,
    "createdAt": "2025-12-09T..."
  }
]
```
### Single Trace Output
Includes full nested structure: trace → observations (spans + generations) with token usage.
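The exact fields depend on your Langfuse data; as a rough, illustrative sketch (field names are not guaranteed by the scripts):

```typescript
// Approximate shape of the single-trace output
interface TraceDetail {
  id: string;
  name: string;
  observations: Array<{
    type: "SPAN" | "GENERATION";
    name: string;
    model?: string; // generations only
    usage?: { promptTokens: number; completionTokens: number }; // generations only
    latencyMs: number;
  }>;
}
```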
### Metrics Output
```json
{
  "totalTraces": 50,
  "successCount": 48,
  "errorCount": 2,
  "avgLatencyMs": 2850,
  "totalTokens": 125000,
  "byName": {"quiz-generation": 30, "phrasing-generation": 20}
}
```
## Common Workflows
### Debug Failed Generation
```bash
cd ~/.claude/skills/langfuse-observability
# 1. Find recent traces
npx tsx scripts/fetch-traces.ts --limit 10
# 2. Get details of specific trace
npx tsx scripts/fetch-trace.ts <trace-id>
```
### Monitor Token Usage
```bash
# Get metrics for cost analysis
npx tsx scripts/get-metrics.ts --limit 100
```
### Check Prompt Configuration
```bash
npx tsx scripts/list-prompts.ts --name scry-concept-synthesis --label production
```
## Cost Tracking
### Calculate Costs
```typescript
// Get metrics with cost calculation
const metrics = await langfuse.getMetrics({ limit: 100 });

// Pricing per 1M tokens (update as needed)
const pricing: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

function calculateCost(model: string, inputTokens: number, outputTokens: number) {
  // Fall back to $1/$1 per 1M tokens for unknown models
  const p = pricing[model] || { input: 1, output: 1 };
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```
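For example, a claude-3-5-sonnet call with 2,000 input tokens and 500 output tokens:

```typescript
// (2000 * 3.0 + 500 * 15.0) / 1_000_000 = 0.0135
const cost = calculateCost("claude-3-5-sonnet", 2000, 500);
console.log(cost.toFixed(4)); // "0.0135" (USD)
```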
### Daily/Monthly Spend
```bash
# Get traces for date range
npx tsx scripts/fetch-traces.ts --from "2025-12-01" --to "2025-12-07" --limit 1000
# Calculate spend (parse output and sum costs)
```
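The trace-list output shown above doesn't carry per-model token counts, so any summation sketch has to assume you surface them yourself (e.g. via `fetch-trace.ts` per trace); `model`, `promptTokens`, and `completionTokens` below are hypothetical fields:

```typescript
// Hypothetical: sum per-trace costs using calculateCost from above
type TraceRow = {
  model?: string;
  promptTokens?: number;
  completionTokens?: number;
};

function totalSpend(traces: TraceRow[]): number {
  return traces.reduce(
    (sum, t) =>
      sum +
      calculateCost(t.model ?? "unknown", t.promptTokens ?? 0, t.completionTokens ?? 0),
    0,
  );
}
```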
### Cost Alerts
**Set up alerts in Langfuse dashboard:**
1. Go to Dashboard → Alerts
2. Create alert for: `daily_cost > X` or `cost_per_trace > Y`
3. Configure notification (email, Slack webhook)
**Or implement in code:**
```typescript
async function checkCostBudget() {
  const dailyMetrics = await langfuse.getMetrics({ since: "24h" });
  const dailyCost = calculateTotalCost(dailyMetrics);
  if (dailyCost > DAILY_BUDGET) {
    await notifySlack(`⚠️ LLM daily spend ($${dailyCost}) exceeded budget ($${DAILY_BUDGET})`);
  }
}
```
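`calculateTotalCost` and `notifySlack` are left to your application. A minimal `notifySlack` sketch, assuming a Slack incoming-webhook URL in `SLACK_WEBHOOK_URL`:

```typescript
async function notifySlack(text: string) {
  // Slack incoming webhooks accept a simple { text } JSON payload
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
}
```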
## Production Best Practices
### 1. Trace Everything
```typescript
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
});

// Wrap every LLM call. `Message`, `currentUser`, `selectedModel`, and `llm`
// are application-level stand-ins.
async function tracedLLMCall(name: string, messages: Message[]) {
  const trace = langfuse.trace({
    name,
    userId: currentUser.id,
    metadata: { environment: process.env.NODE_ENV },
  });
  const generation = trace.generation({
    name: "chat",
    model: selectedModel,
    input: messages,
  });
  try {
    const response = await llm.chat({ model: selectedModel, messages });
    generation.end({
      output: response.choices[0].message,
      usage: {
        promptTokens: response.usage.prompt_tokens,
        completionTokens: response.usage.completion_tokens,
      },
    });
    return response;
  } catch (error) {
    // Record the failure on the generation, then rethrow
    generation.end({ level: "ERROR", statusMessage: error.message });
    throw error;
  }
}
```
### 2. Add Context
```typescript
// Include useful metadata for debugging
const trace = langfuse.trace({
  name: "user-query",
  userId: user.id,
  sessionId: session.id, // Group related traces
  metadata: {
    userPlan: user.plan,
    feature: "chat",
    version: "v2.1",
  },
  tags: ["production", "chat-feature"],
});
});
```
### 3. Score Outputs
```typescript
// Track quality metrics
generation.score({
  name: "user-feedback",
  value: userRating, // 1-5
});

// Or automated scoring
generation.score({
  name: "response-length",
  value: response.content.length < 500 ? 1 : 0,
});
```
### 4. Flush Before Exit
```typescript
// Important for serverless environments
await langfuse.flushAsync();
```
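In a serverless handler, put the flush in a `finally` so buffered events are delivered even when the request fails. A sketch (the handler signature depends on your platform):

```typescript
// Hypothetical request handler reusing tracedLLMCall from above
export async function handler(req: Request): Promise<Response> {
  try {
    const result = await tracedLLMCall("user-query", await req.json());
    return Response.json(result);
  } finally {
    // Ensure events reach Langfuse before the runtime freezes or exits
    await langfuse.flushAsync();
  }
}
```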
## Promptfoo Integration
### Trace → Eval Case Workflow
1. **Find interesting traces in Langfuse** (failures, edge cases)
2. **Export as test cases** for Promptfoo
3. **Add to regression suite** to prevent future issues
```typescript
// Export failed traces as test cases
const failedTraces = await langfuse.getTraces({ level: "ERROR", limit: 50 });
const testCases = failedTraces.map(trace => ({
  vars: trace.input,
  assert: [
    { type: "not-contains", value: "error" },
    { type: "llm-rubric", value: "Response should address the user's question" },
  ],
}));
// Add to promptfooconfig.yaml
```
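One way to persist those cases, sketched with `js-yaml` (an assumption; any YAML serializer works), so promptfoo can load them as an external test file:

```typescript
import fs from "node:fs";
import yaml from "js-yaml";

// Write exported cases to a file referenced from promptfooconfig.yaml
fs.writeFileSync("regression-cases.yaml", yaml.dump({ tests: testCases }));
```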
### Langfuse Callback in Promptfoo
```yaml
# promptfooconfig.yaml
defaultTest:
  options:
    callback: langfuse
    callbackConfig:
      publicKey: ${LANGFUSE_PUBLIC_KEY}
      secretKey: ${LANGFUSE_SECRET_KEY}
```
## Alternatives Comparison
| Feature | Langfuse | Helicone | LangSmith |
|---------|----------|----------|-----------|
| Open Source | ✅ | ✅ | ❌ |
| Self-Host | ✅ | ✅ | ❌ |
| Free Tier | ✅ Generous | ✅ 10K/mo | ⚠️ Limited |
| Prompt Mgmt | ✅ | ❌ | ✅ |
| Tracing | ✅ | ✅ | ✅ |
| Cost Track | ✅ | ✅ | ✅ |
| A/B Testing | ⚠️ | ❌ | ✅ |
**Choose Langfuse when**: Self-hosting needed, cost-conscious, want prompt management.
**Choose Helicone when**: Proxy-based setup preferred, simple integration.
**Choose LangSmith when**: LangChain ecosystem, enterprise support needed.
## Related Skills
- `llm-evaluation` - Promptfoo for testing, pairs well with Langfuse for observability
- `llm-gateway-routing` - OpenRouter/LiteLLM for model routing
- `ai-llm-development` - Overall LLM development patterns
## Related Commands
- `/llm-gates` - Audit LLM infrastructure including observability gaps
- `/observe` - General observability audit