phoenix-evals - Claude MCP Skill
Build and run evaluators for AI/LLM applications using Phoenix.
# Phoenix Evals
Build evaluators for AI/LLM applications: code-based checks first, LLM judges for nuance, and judges validated against human labels.
## Quick Reference
| Task | Files |
| ---- | ----- |
| Setup | `setup-python`, `setup-typescript` |
| Decide what to evaluate | `evaluators-overview` |
| Choose a judge model | `fundamentals-model-selection` |
| Use pre-built evaluators | `evaluators-pre-built` |
| Build code evaluator | `evaluators-code-{python\|typescript}` |
| Build LLM evaluator | `evaluators-llm-{python\|typescript}`, `evaluators-custom-templates` |
| Batch evaluate DataFrame | `evaluate-dataframe-python` |
| Run experiment | `experiments-running-{python\|typescript}` |
| Create dataset | `experiments-datasets-{python\|typescript}` |
| Generate synthetic data | `experiments-synthetic-{python\|typescript}` |
| Validate evaluator accuracy | `validation`, `validation-evaluators-{python\|typescript}` |
| Sample traces for review | `observe-sampling-{python\|typescript}` |
| Analyze errors | `error-analysis`, `error-analysis-multi-turn`, `axial-coding` |
| RAG evals | `evaluators-rag` |
| Avoid common mistakes | `common-mistakes-python`, `fundamentals-anti-patterns` |
| Production | `production-overview`, `production-guardrails`, `production-continuous` |
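A minimal sketch of a code evaluator in the "code first" spirit: a deterministic check that returns a binary pass/fail label. This is plain Python, not the Phoenix evaluator API (the `evaluators-code-python` file covers that). The function name `json_has_required_keys` and the `label`/`score`/`explanation` result shape are illustrative assumptions.

```python
import json
from typing import Any, Dict, Iterable


def json_has_required_keys(output: str, required: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical code evaluator: pass only if output is a JSON object containing every required key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as exc:
        return {"label": "fail", "score": 0, "explanation": f"invalid JSON: {exc.msg}"}
    if not isinstance(parsed, dict):
        return {"label": "fail", "score": 0, "explanation": "top-level value is not an object"}
    missing = [key for key in required if key not in parsed]
    if missing:
        return {"label": "fail", "score": 0, "explanation": f"missing keys: {missing}"}
    return {"label": "pass", "score": 1, "explanation": "all required keys present"}


print(json_has_required_keys('{"answer": "42", "sources": []}', ["answer", "sources"])["label"])  # → pass
print(json_has_required_keys('{"answer": "42"}', ["answer", "sources"])["label"])  # → fail
```

Because the check is deterministic, it can run on every trace at negligible cost, leaving LLM judges for criteria that code cannot express.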
## Workflows
**Starting Fresh:**
`observe-tracing-setup` → `error-analysis` → `axial-coding` → `evaluators-overview`
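After error analysis, axial coding groups the free-text failure notes into a few named categories. Counting those categories shows which failures are common enough to deserve an evaluator. A plain-Python sketch, assuming each reviewed trace already carries zero or more category labels assigned by a reviewer. The trace IDs and category names are placeholders, not real data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def rank_failure_categories(annotations: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Count how many reviewed traces fall into each failure category, most frequent first."""
    counts: Counter = Counter()
    for categories in annotations.values():
        # set() so a category counts once per trace even if a reviewer tagged it twice
        counts.update(set(categories))
    return counts.most_common()


# Hypothetical hand-labeled traces (placeholder IDs and categories)
annotations = {
    "trace-1": ["hallucinated_fact", "ignored_instruction"],
    "trace-2": ["hallucinated_fact"],
    "trace-3": ["wrong_tool_call"],
    "trace-4": [],  # reviewed, no failure observed
}
print(rank_failure_categories(annotations))
```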
**Building Evaluator:**
`fundamentals` → `common-mistakes-python` → `evaluators-{code\|llm}-{python\|typescript}` → `validation-evaluators-{python\|typescript}`
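When the workflow reaches an LLM evaluator, the binary principle applies to the judge too: ask for a single pass/fail word and treat any other response as unparseable rather than guessing. The sketch below shows only a prompt template and the parsing step. `JUDGE_TEMPLATE` and `parse_binary_label` are hypothetical names, and the model call is omitted because it depends on your provider; none of this is Phoenix's API.

```python
import re
from typing import Optional

# Hypothetical judge prompt: asks for exactly one binary label, not a 1-5 score.
JUDGE_TEMPLATE = """Does the answer below fully address the question?
Question: {question}
Answer: {answer}
Respond with exactly one word: pass or fail."""


def parse_binary_label(completion: str) -> Optional[str]:
    """Map a judge completion to 'pass' or 'fail'; return None if neither or both labels appear."""
    found = set(re.findall(r"\b(pass|fail)\b", completion.lower()))
    if len(found) == 1:
        return found.pop()
    return None  # missing or ambiguous: do not guess a label


# The formatted prompt would be sent to your judge model (call not shown).
prompt = JUDGE_TEMPLATE.format(question="What is 2 + 2?", answer="4")
print(parse_binary_label("PASS"))  # → pass
print(parse_binary_label("It could pass or fail."))  # → None
```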
**RAG Systems:**
`evaluators-rag` → `evaluators-code-*` (retrieval) → `evaluators-llm-*` (faithfulness)
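For the retrieval half of a RAG pipeline, deterministic metrics are enough when the relevant documents are known. This sketch computes hit@k (did any relevant document reach the top k?) and recall@k (what fraction of the relevant documents did?) for one query. It assumes labeled relevant document IDs per query; `retrieval_metrics` is a hypothetical helper, not part of Phoenix.

```python
from typing import Dict, Sequence, Set


def retrieval_metrics(retrieved: Sequence[str], relevant: Set[str], k: int) -> Dict[str, float]:
    """hit@k and recall@k for one query, from ranked retrieved IDs and ground-truth relevant IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(retrieved[:k])
    found = relevant.intersection(top_k)
    hit = 1.0 if found else 0.0
    # Recall is undefined with no relevant documents; this sketch reports 0.0 in that case.
    recall = len(found) / len(relevant) if relevant else 0.0
    return {"hit@k": hit, "recall@k": recall}


print(retrieval_metrics(["d3", "d7", "d1", "d9"], {"d1", "d2"}, k=3))
# → {'hit@k': 1.0, 'recall@k': 0.5}
```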
**Production:**
`production-overview` → `production-guardrails` → `production-continuous`
## Rule Categories
| Prefix | Description |
| ------ | ----------- |
| `fundamentals-*` | Types, scores, anti-patterns |
| `observe-*` | Tracing, sampling |
| `error-analysis-*` | Finding failures |
| `axial-coding-*` | Categorizing failures |
| `evaluators-*` | Code, LLM, RAG evaluators |
| `experiments-*` | Datasets, running experiments |
| `validation-*` | Validating evaluator accuracy against human labels |
| `production-*` | CI/CD, monitoring |
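For the `observe-*` sampling step, one reproducible approach is to review every errored trace plus a seeded random sample of the rest, so reviewers see both obvious failures and ordinary traffic. A plain-Python sketch over trace IDs, assuming you already know which traces errored; fetching traces is left out, and `sample_for_review` is a hypothetical name.

```python
import random
from typing import List, Sequence


def sample_for_review(trace_ids: Sequence[str], errored: Sequence[str], n_random: int, seed: int = 0) -> List[str]:
    """Return all errored traces (in original order) plus a seeded random sample of the others."""
    errored_set = set(errored)
    must_review = [t for t in trace_ids if t in errored_set]
    rest = [t for t in trace_ids if t not in errored_set]
    rng = random.Random(seed)  # local RNG so the same seed yields the same review set
    extra = rng.sample(rest, min(n_random, len(rest)))
    return must_review + extra


print(sample_for_review([f"t{i}" for i in range(10)], errored=["t2"], n_random=3, seed=42))
```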
## Key Principles
| Principle | Action |
| --------- | ------ |
| Error analysis first | Can't automate what you haven't observed |
| Custom > generic | Build from your failures |
| Code first | Deterministic before LLM |
| Validate judges | >80% TPR and TNR against human labels |
| Binary > Likert | Pass/fail, not 1-5 |
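To check a judge against the >80% TPR/TNR bar, compare its pass/fail labels with human labels on the same examples. A plain-Python sketch; it treats "pass" as the positive class, which is an assumption, but since both rates must clear the bar, swapping the classes does not change whether the judge passes. The labels in the usage lines are made up to show the arithmetic.

```python
from typing import Dict, Sequence


def judge_agreement(human: Sequence[str], judge: Sequence[str], positive: str = "pass") -> Dict[str, float]:
    """TPR and TNR of binary judge labels, measured against human labels as ground truth."""
    if len(human) != len(judge):
        raise ValueError("human and judge label lists must be the same length")
    pairs = list(zip(human, judge))
    tp = sum(1 for h, j in pairs if h == positive and j == positive)
    fn = sum(1 for h, j in pairs if h == positive and j != positive)
    tn = sum(1 for h, j in pairs if h != positive and j != positive)
    fp = sum(1 for h, j in pairs if h != positive and j == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one human-labeled example of each class")
    return {"tpr": tp / (tp + fn), "tnr": tn / (tn + fp)}


# Made-up labels: the judge misses one human "pass" and one human "fail".
human = ["pass"] * 5 + ["fail"] * 5
judge = ["pass", "pass", "pass", "pass", "fail", "fail", "fail", "fail", "fail", "pass"]
rates = judge_agreement(human, judge)
print(rates)  # → {'tpr': 0.8, 'tnr': 0.8}
print(all(r > 0.8 for r in rates.values()))  # → False
```

Here both rates are exactly 0.8, so a strict >80% threshold rejects this judge. With only five examples per class, each disagreement moves a rate by 0.2, so a larger labeled set gives a steadier estimate.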
## Information
- Repository: Arize-ai/phoenix
- Author: Arize-ai
- Last Sync: 3/11/2026
- Repo Updated: 3/11/2026
- Created: 1/27/2026