# level-up - Claude MCP Skill
Take your AI agent to the next level with full LangWatch integration: tracing, prompt versioning, evaluation experiments, and simulation tests in one pass. Use this skill when the user wants comprehensive observability, testing, and prompt management for their agent.
## Documentation: `SKILL.md`

# Take Your Agent to the Next Level
This skill sets up your agent with the full LangWatch stack: tracing, prompt versioning, evaluation experiments, and agent simulation tests. Each step builds on the previous one.
## Plan Limits
See [Plan Limits](_shared/plan-limits.md).
## Prerequisites
See [CLI Setup](_shared/cli-setup.md).
## Consultant Mode
After completing all steps, don't just stop: summarize everything you set up and suggest 2-3 ways to go deeper based on what you learned about the codebase. Detailed guidance:
See [Consultant Mode](_shared/consultant-mode.md).
## Step 1: Add Tracing
Add LangWatch tracing to capture all LLM calls, costs, and latency.
1. Read the integration guide for this project's framework:

   ```bash
   langwatch docs                               # Browse the index to find the right page
   langwatch docs integration/python/guide      # Python (or pick your framework)
   langwatch docs integration/typescript/guide  # TypeScript (or pick your framework)
   ```

2. Install the LangWatch SDK (`pip install langwatch` or `npm install langwatch`)
3. Add instrumentation following the framework-specific guide
4. Add `LANGWATCH_API_KEY` to `.env`
**Verify**: Run the application briefly and confirm traces appear:
```bash
langwatch trace search --limit 5
```
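Conceptually, the SDK's instrumentation wraps each model call and records a span with timing and usage data. As a rough pure-Python illustration of what a trace captures (this is a stub, not the LangWatch API; `traced`, `TRACES`, and `call_llm` are invented names):

```python
import functools
import time

TRACES = []  # in-memory stand-in for the trace backend

def traced(fn):
    """Toy stand-in for SDK instrumentation: records what a trace
    typically captures (span name, latency, output size)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        TRACES.append({
            "span": fn.__name__,
            "latency_ms": (time.perf_counter() - start) * 1000,
            "output_chars": len(str(result)),
        })
        return result
    return wrapper

@traced
def call_llm(prompt):
    # Stubbed model call; a real agent would hit an LLM API here.
    return f"echo: {prompt}"

call_llm("hello")
print(TRACES[0]["span"])  # call_llm
```

The real SDK does this transparently for supported frameworks, which is why step 1 is mostly "follow the integration guide" rather than writing wrappers by hand.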
## Step 2: Version Your Prompts
Move hardcoded prompts to LangWatch Prompts CLI for version control and collaboration.
1. Read the Prompts CLI docs:

   ```bash
   langwatch docs prompt-management/cli
   ```

2. Initialize: `langwatch prompt init`
3. Create prompts: `langwatch prompt create <name>` for each prompt in the code
4. Update application code to use `langwatch.prompts.get("name")` instead of hardcoded strings
5. Sync: `langwatch prompt sync`
**Verify**: `langwatch prompt list` (or check the Prompts section at https://app.langwatch.ai).
Do NOT hardcode prompts in code. Do NOT add try/catch fallbacks around `prompts.get()`.
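The payoff of this step is that application code asks a prompt service for a prompt by name instead of embedding the string. A minimal in-memory sketch of that lookup pattern (the `PromptStore` class here is a hypothetical stand-in, not the `langwatch.prompts` API):

```python
class PromptStore:
    """In-memory stand-in for a versioned prompt service."""
    def __init__(self):
        self._versions = {}  # name -> list of prompt bodies, oldest first

    def push(self, name, body):
        self._versions.setdefault(name, []).append(body)

    def get(self, name, version=None):
        # No version -> latest; otherwise pin to a specific version.
        history = self._versions[name]
        return history[-1] if version is None else history[version]

store = PromptStore()
store.push("support-agent", "You are a helpful support agent.")
store.push("support-agent", "You are a concise, friendly support agent.")

print(store.get("support-agent"))     # latest version
print(store.get("support-agent", 0))  # pinned to the first version
```

Because every caller goes through `get(...)`, prompt edits become versioned changes in one place instead of string diffs scattered through the codebase.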
## Step 3: Create an Evaluation Experiment
Build a batch evaluation to measure your agent's quality across many examples.
1. Read the experiments SDK docs:

   ```bash
   langwatch docs evaluations/experiments/sdk
   ```

2. Analyze the agent's code to understand what it does
3. Generate a dataset of 10-20 examples tailored to the agent's domain (NOT generic examples)
4. Create an experiment file:
   - Python: Jupyter notebook with `langwatch.experiment.init()`, evaluation loop, and evaluators
   - TypeScript: Script with `langwatch.experiments.init()` and `evaluation.run()`
5. Include at least one evaluator (LLM-as-judge for quality is a good default)
**Verify**: Run the experiment (`jupyter nbconvert --to notebook --execute experiment.ipynb` or `npx tsx experiment.ts`) and check results appear in the LangWatch Experiments view.
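Structurally, the experiment file is a loop over the dataset that runs the agent on each example and scores the output with an evaluator. A stubbed sketch of that shape (the agent and judge below are fakes; the real version initializes the experiment with the SDK and uses an LLM-as-judge evaluator rather than exact matching):

```python
# A tiny domain-specific dataset (invented for illustration).
dataset = [
    {"input": "Reset my password", "expected_topic": "account"},
    {"input": "Where is my refund?", "expected_topic": "billing"},
]

def agent(question):
    # Stubbed agent; the real one would call your LLM pipeline.
    return "billing" if "refund" in question.lower() else "account"

def judge(example, output):
    # Stubbed evaluator; a real LLM-as-judge scores quality with a
    # rubric, not an exact-match check like this.
    return 1.0 if output == example["expected_topic"] else 0.0

# The evaluation loop: run the agent on every example, collect scores.
scores = [judge(ex, agent(ex["input"])) for ex in dataset]
print(sum(scores) / len(scores))  # 1.0 for this toy dataset
```

The aggregate score is what the Experiments view tracks across runs, so regressions show up as a drop in the average rather than a failing unit test.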
## Step 4: Add Agent Simulation Tests
Create scenario tests to validate agent behavior in realistic multi-turn conversations.
1. Read the Scenario docs:

   ```bash
   langwatch scenario-docs                    # Browse the index
   langwatch scenario-docs getting-started    # Getting Started guide
   langwatch scenario-docs agent-integration  # Wiring your agent into scenarios
   ```
2. Install the Scenario SDK (`pip install langwatch-scenario` or `npm install @langwatch/scenario`)
3. Write scenario tests with `AgentAdapter`, `UserSimulatorAgent`, and `JudgeAgent`
4. Use semantic criteria in JudgeAgent (NOT regex matching)
**Verify**: Run the tests (`pytest -s` or `npx vitest run`) and confirm they pass.
NEVER invent your own testing framework. Use `@langwatch/scenario` / `langwatch-scenario`.
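The shape of a scenario test: a simulated user drives a multi-turn conversation with the agent, and a judge evaluates semantic criteria over the transcript. A stubbed pure-Python sketch of that flow (real tests use the Scenario SDK's `AgentAdapter`, `UserSimulatorAgent`, and `JudgeAgent`; everything below is an invented stand-in):

```python
def agent_turn(history):
    # Stubbed agent; real code adapts your agent via AgentAdapter.
    last = history[-1]
    if "order" in last:
        return "Could you share your order number?"
    return "I've located your order and issued a refund."

def simulated_user(turn_index):
    # Stubbed user simulator; the SDK generates these turns with an LLM.
    return ["I need help with my order", "It's #1234"][turn_index]

def judge(transcript):
    # A semantic criterion like "the agent asked a clarifying question
    # before acting" — the SDK's JudgeAgent evaluates such criteria
    # with an LLM, not a regex; this check is only illustrative.
    return any("?" in msg for who, msg in transcript if who == "agent")

transcript = []
for i in range(2):
    transcript.append(("user", simulated_user(i)))
    transcript.append(("agent", agent_turn([m for _, m in transcript])))

print(judge(transcript))  # True: the agent asked a clarifying question
```

The point of the pattern is that the pass/fail signal comes from criteria about behavior across the whole conversation, not from matching exact strings in a single response.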
## Common Mistakes
- Do NOT skip any step -- each builds on the previous
- Do NOT use generic datasets in the experiment -- tailor them to the agent's domain
- Do NOT hardcode prompts -- use the Prompts CLI
- Do NOT invent testing frameworks -- use Scenario
- Do NOT skip verification steps -- run the application/experiment/tests after each step
- Always read docs via `langwatch docs ...` / `langwatch scenario-docs ...` before writing code; do not work from memory of past framework versions
## Information

- Repository: langwatch/langwatch
- Author: langwatch
- Last sync: 4/24/2026
- Repo updated: 4/23/2026
- Created: 3/17/2026