ai-llm-development - Claude MCP Skill
MANDATORY invocation for ALL LLM-related work. Invoke immediately when:

- ANY mention of model names, IDs, or versions
- ANY configuration of AI providers or APIs
- ANY defaults/constants for LLM settings
- ANY prompt engineering or modification
- ANY discussion of model capabilities or features
- ANY changes to AI-related dependencies
- Reading/writing .env files with AI config
- Modifying aiProviders.ts, prompts.ts, or similar
- Reviewing AI-related pull requests
- Debugging LLM integration issues

CRITICAL: Training data lags reality by months. ALWAYS research first. Use WebSearch, Exa MCP, or Gemini CLI before making ANY LLM decisions.
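For example, editing a default model constant in a providers module qualifies. A hypothetical sketch of the kind of file whose modification triggers this skill (the file name, model ID, and settings here are illustrative placeholders, not recommendations):

```typescript
// aiProviders.ts (hypothetical): any edit to defaults like these should
// trigger the research-first protocol documented below.
export const AI_DEFAULTS = {
  // Model IDs go stale quickly; verify against live leaderboards and
  // provider docs before changing. "example/model-v1" is a placeholder.
  model: "example/model-v1",
  temperature: 0.2,
  maxTokens: 1024,
} as const;
```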
Documentation
SKILL.md

# AI/LLM Development

## Core Philosophy

- **Context Engineering > Prompt Engineering**: Optimize the entire LLM configuration, not just wording.
- **Simplicity First**: 80% of use cases need a single LLM call, not multi-agent systems.
- **Currency Over Memory**: Models deprecate in 6-12 months. Learn to find current ones via leaderboards.
- **Empiricism**: Benchmarks guide; YOUR data decides. Test the top 3-5 models with your prompts.

## RESEARCH FIRST PROTOCOL

**CRITICAL**: Your training data is ALWAYS stale for LLM work. The field changes weekly.

### Before ANY LLM-Related Action

1. **Identify what you're assuming**: Model capabilities? API syntax? Best practices?
2. **Research using live tools** (in order of preference):
   - WebSearch: "latest [model/provider] models"
   - Exa MCP: Get current documentation and examples
   - Gemini CLI: Verify against latest information with web grounding
3. **Verify your assumptions**. Don't trust training data for:
   - Model names and versions (new models release monthly)
   - API syntax and parameters (providers update frequently)
   - Best practices and recommendations (evolve constantly)
   - Pricing and limits (change without notice)
   - Deprecation status (models removed regularly)

### Research Query Templates

**Model Selection**:
- "latest [provider] models"
- "[model-name] release date and capabilities"
- "is [model-name] deprecated or superseded"
- "[provider] newest models announced"

**API Syntax**:
- "[provider] API documentation [specific-feature]"
- "[sdk-name] current version and usage"
- "OpenRouter model ID format current"

**Best Practices**:
- "[task] LLM best practices latest"
- "current recommendations for [architecture pattern]"
- "[framework] latest patterns and examples"

### Red Flags That Trigger Mandatory Research

❌ Making assumptions about version numbers (3.0 vs 2.5 doesn't mean newer)
❌ Changing model defaults without verification
❌ Assuming API syntax from training data
❌ Selecting models based on memory of capabilities
❌ Following "best practices" without checking if they are still current
❌ Any action based on "I think..." or "probably..." for LLM topics

### Research Before Action Checklist

Before committing any LLM-related change:

- [ ] Searched for latest information on involved models/APIs
- [ ] Verified current state vs. training data assumptions
- [ ] Checked provider documentation for API syntax
- [ ] Confirmed model is not deprecated or superseded
- [ ] Validated best practices are still current
- [ ] Tested configuration syntax in provider console/playground

**Mantra**: "When in doubt about LLM tech, RESEARCH. When certain about LLM tech, STILL RESEARCH."

## Decision Trees

### Model Selection

```
Task type → Find relevant benchmark → Check leaderboards → Test top 3 empirically
Coding: SWE-bench | Reasoning: GPQA | General: Arena Elo
```

See: `references/model-selection.md`

### Architecture Complexity

```
1. Single LLM Call (start here - 80% stop here)
2. Sequential Calls (workflows)
3. LLM + Tools (function calling)
4. Agentic System (LLM controls flow)
5. Multi-Agent (only if truly needed)
```

See: `references/architecture-patterns.md`

### Vector Storage

```
<1M vectors → Postgres pgvector or Convex
1-50M vectors → Postgres with pgvectorscale
>50M + <10ms p99 → Dedicated (Qdrant, Weaviate)
```

## Key Optimizations

- **Prompt Caching**: 60-90% cost reduction. Put static content first.
- **Structured Outputs**: Native JSON Schema. Zero parsing failures.
- **Model Routing**: Simple→cheap model, Complex→expensive model (see the sketch after this list).
- **Hybrid RAG**: Vector + keyword search is 15-25% better than pure vector.

See: `references/prompt-engineering.md`, `references/production-checklist.md`
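To make the routing idea concrete, here is a minimal TypeScript sketch. The model IDs are deliberately placeholders (per the research-first protocol above, current IDs must be looked up via live tools before use), and the complexity heuristic is intentionally naive:

```typescript
// Route simple tasks to a cheap model and complex tasks to a strong one.
// Both IDs are placeholders: resolve current model IDs via live research first.
const CHEAP_MODEL = "provider/cheap-model-placeholder";
const STRONG_MODEL = "provider/strong-model-placeholder";

function routeModel(prompt: string): string {
  // Naive heuristic: long prompts or reasoning-heavy keywords go to the strong model.
  const needsReasoning = /\b(prove|analyze|debug|plan|refactor)\b/i.test(prompt);
  return needsReasoning || prompt.length > 2000 ? STRONG_MODEL : CHEAP_MODEL;
}
```

In production this decision usually lives in the gateway (OpenRouter, LiteLLM) rather than application code; see the Quality Infrastructure section below.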
## Stack Defaults (TypeScript/Next.js)

- **SDK**: Vercel AI SDK (streaming, React hooks, provider-agnostic)
- **Provider**: OpenRouter (400+ models, easy A/B testing, fallbacks)
- **Vectors**: Postgres pgvector (95% of use cases, $20-50/month)
- **Observability**: Langfuse (self-hostable, generous free tier)
- **Evaluation**: Promptfoo (CI/CD integration, security testing)

## Quality Infrastructure

**Production-grade LLM apps need:**

1. **Model Gateway** (OpenRouter, LiteLLM)
   - Multi-provider access
   - Fallback chains
   - Cost routing
   - See: `llm-gateway-routing` skill
2. **Evaluation & Testing** (Promptfoo)
   - Regression testing in CI/CD
   - Security scanning (red team)
   - Quality gates
   - See: `llm-evaluation` skill
3. **Production Observability** (Langfuse)
   - Full trace debugging
   - Cost tracking
   - Latency monitoring
   - See: `langfuse-observability` skill
4. **Quality Audit**
   - Run the `/llm-gates` command to audit your LLM infrastructure
   - Identifies gaps in routing, testing, observability, security, and cost

### Quick Setup

```bash
# Evaluation (Promptfoo)
npx promptfoo@latest init
npx promptfoo@latest eval

# Observability (Langfuse)
pnpm add langfuse
# Sign up at langfuse.com, add keys to .env

# Gateway (OpenRouter)
# Sign up at openrouter.ai, add OPENROUTER_API_KEY to .env
```

### Quality Gate Standards

| Stage | Checks | Time Budget |
|-------|--------|-------------|
| Pre-commit | Prompt validation, secrets scan | < 5s |
| Pre-push | Regression suite, cost estimate | < 15s |
| CI/CD | Full eval, security scan, A/B comparison | < 5 min |
| Production | Traces, cost alerts, error monitoring | Continuous |

## Scripts

- `scripts/validate_llm_config.py <dir>` - Scan for LLM anti-patterns

## References

- `references/model-selection.md` - Leaderboards, search strategies, red flags
- `references/prompt-engineering.md` - Caching, structured outputs, CoT, model-specific styles
- `references/architecture-patterns.md` - Complexity ladder, RAG, tool use, caching
- `references/production-checklist.md` - Cost, errors, security, observability, evaluation

## Related Skills

- `llm-evaluation` - Promptfoo setup, CI/CD integration, security testing
- `llm-gateway-routing` - OpenRouter, LiteLLM, routing strategies
- `langfuse-observability` - Tracing, cost tracking, production debugging

## Related Commands

- `/llm-gates` - Audit LLM infrastructure quality across 5 pillars
- `/observe` - General observability audit (includes LLM section)

## Live Research Tools

**Use these BEFORE relying on training data:**

- **WebSearch**: Latest model releases, deprecations, best practices
- **Exa MCP** (`mcp__exa__web_search_exa`): Current documentation and code examples
- **Gemini CLI** (`gemini`): Sophisticated reasoning with Google Search grounding
- **Provider Playgrounds**: OpenRouter, Google AI Studio, Anthropic Console

**Research Flow**:
1. WebSearch for latest information
2. Exa MCP for documentation and examples
3. Gemini CLI for complex verification and comparison
4. Provider playground for syntax testing
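Putting the stack defaults above together, here is a minimal sketch of a streaming call routed through OpenRouter with the Vercel AI SDK. It assumes the `ai` and `@ai-sdk/openai` packages (AI SDK v4-style API) and OpenRouter's OpenAI-compatible endpoint; the model ID is a placeholder to be verified with the live research tools above:

```typescript
import { streamText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";

// OpenRouter exposes an OpenAI-compatible API, so the OpenAI provider can
// point at it. OPENROUTER_API_KEY comes from .env (see Quick Setup above).
const openrouter = createOpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const result = streamText({
  // Placeholder model ID: check openrouter.ai for current models first.
  model: openrouter("provider/model-placeholder"),
  prompt: "Summarize our LLM quality gates in one paragraph.",
});

// Stream tokens to stdout as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```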
Information
- Repository: phrazzld/claude-config
- Author: phrazzld
- Last Sync: 1/22/2026
- Repo Updated: 1/17/2026
- Created: 1/13/2026
Related Skills

- `pr-status` - PR Status
- `mem0` - Integrate Mem0 Platform into AI applications for persistent memory, personalization, and semantic search. Use this skill when the user mentions "mem0", "memory layer", "remember user preferences", "persistent context", "personalization", or needs to add long-term memory to chatbots, agents, or AI apps. Covers Python and TypeScript SDKs, framework integrations (LangChain, CrewAI, Vercel AI SDK, OpenAI Agents SDK, Pipecat), and the full Platform API. Use even when the user doesn't explicitly say "mem0" but describes needing conversation memory, user context retention, or knowledge retrieval across sessions. (A minimal SDK sketch follows this list.)
- `upgrade-nodejs` - Upgrading Bun's Self-Reported Node.js Version
- `cursorrules` - CrewAI Development Rules
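As a rough illustration of the mem0 integration described above, a minimal sketch of storing and recalling a user preference. It assumes the `mem0ai` TypeScript SDK's `MemoryClient` interface and a `MEM0_API_KEY` environment variable; verify method names against current mem0 documentation:

```typescript
import MemoryClient from "mem0ai";

// Assumed Platform SDK client; the API key comes from the environment.
const client = new MemoryClient({ apiKey: process.env.MEM0_API_KEY! });

// Persist a preference from a conversation, scoped to one user.
await client.add(
  [{ role: "user", content: "I prefer TypeScript examples over Python." }],
  { user_id: "user-123" }
);

// Later, recall relevant memories via semantic search.
const memories = await client.search("preferred programming language", {
  user_id: "user-123",
});
console.log(memories);
```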
Related Guides

- Bear Notes Claude Skill: Your AI-Powered Note-Taking Assistant
- Mastering tmux with Claude: A Complete Guide to the tmux Claude Skill
- OpenAI Whisper API Claude Skill: Complete Guide to AI-Powered Audio Transcription