ai-llm-development - Claude MCP Skill
MANDATORY invocation for ALL LLM-related work. Invoke immediately when:

- ANY mention of model names, IDs, or versions
- ANY configuration of AI providers or APIs
- ANY defaults/constants for LLM settings
- ANY prompt engineering or modification
- ANY discussion of model capabilities or features
- ANY changes to AI-related dependencies
- Reading/writing .env files with AI config
- Modifying aiProviders.ts, prompts.ts, or similar
- Reviewing AI-related pull requests
- Debugging LLM integration issues

CRITICAL: Training data lags reality by months. ALWAYS research first. Use WebSearch, Exa MCP, or Gemini CLI before making ANY LLM decisions.
Documentation
SKILL.md

# AI/LLM Development

## Core Philosophy

- **Context Engineering > Prompt Engineering**: Optimize the entire LLM configuration, not just wording.
- **Simplicity First**: 80% of use cases need a single LLM call, not multi-agent systems.
- **Currency Over Memory**: Models deprecate in 6-12 months. Learn to find current ones via leaderboards.
- **Empiricism**: Benchmarks guide; YOUR data decides. Test the top 3-5 models with your prompts.

## RESEARCH FIRST PROTOCOL

**CRITICAL**: Your training data is ALWAYS stale for LLM work. The field changes weekly.

### Before ANY LLM-Related Action

1. **Identify what you're assuming**: Model capabilities? API syntax? Best practices?
2. **Research using live tools** (in order of preference):
   - WebSearch: "latest [model/provider] models"
   - Exa MCP: Get current documentation and examples
   - Gemini CLI: Verify against latest information with web grounding
3. **Verify your assumptions**. Don't trust training data for:
   - Model names and versions (new models release monthly)
   - API syntax and parameters (providers update frequently)
   - Best practices and recommendations (evolve constantly)
   - Pricing and limits (change without notice)
   - Deprecation status (models removed regularly)

### Research Query Templates

**Model Selection**:
- "latest [provider] models"
- "[model-name] release date and capabilities"
- "is [model-name] deprecated or superseded"
- "[provider] newest models announced"

**API Syntax**:
- "[provider] API documentation [specific-feature]"
- "[sdk-name] current version and usage"
- "OpenRouter model ID format current"

**Best Practices**:
- "[task] LLM best practices latest"
- "current recommendations for [architecture pattern]"
- "[framework] latest patterns and examples"

### Red Flags That Trigger Mandatory Research

- ❌ Making assumptions about version numbers (3.0 vs. 2.5 doesn't mean newer)
- ❌ Changing model defaults without verification
- ❌ Assuming API syntax from training data
- ❌ Selecting models based on memory of capabilities
- ❌ Following "best practices" without checking they are still current
- ❌ Any action based on "I think..." or "probably..." for LLM topics

### Research Before Action Checklist

Before committing any LLM-related change:

- [ ] Searched for the latest information on the models/APIs involved
- [ ] Verified current state vs. training data assumptions
- [ ] Checked provider documentation for API syntax
- [ ] Confirmed the model is not deprecated or superseded
- [ ] Validated that best practices are still current
- [ ] Tested configuration syntax in the provider console/playground

**Mantra**: "When in doubt about LLM tech, RESEARCH. When certain about LLM tech, STILL RESEARCH."

## Decision Trees

### Model Selection

```
Task type → Find relevant benchmark → Check leaderboards → Test top 3 empirically
Coding: SWE-bench | Reasoning: GPQA | General: Arena Elo
```

See: `references/model-selection.md`

### Architecture Complexity

```
1. Single LLM Call (start here - 80% stop here)
2. Sequential Calls (workflows)
3. LLM + Tools (function calling)
4. Agentic System (LLM controls flow)
5. Multi-Agent (only if truly needed)
```

See: `references/architecture-patterns.md`

### Vector Storage

```
<1M vectors       → Postgres pgvector or Convex
1-50M vectors     → Postgres with pgvectorscale
>50M + <10ms p99  → Dedicated (Qdrant, Weaviate)
```

## Key Optimizations

- **Prompt Caching**: 60-90% cost reduction. Static content first.
- **Structured Outputs**: Native JSON Schema. Zero parsing failures.
- **Model Routing**: Simple → cheap model; complex → expensive model.
- **Hybrid RAG**: Vector + keyword search is 15-25% better than pure vector.
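The model-routing optimization can be sketched as a small TypeScript classifier. This is a minimal illustration, not a prescribed implementation: the model IDs below are hypothetical placeholders (verify current IDs per the research protocol), and the complexity heuristics are stand-ins for whatever your own evals show actually separates cheap-model from frontier-model tasks.

```typescript
// Hypothetical complexity-based model router. Model IDs are
// illustrative placeholders, not verified current provider IDs.
type RouteDecision = { model: string; reason: string };

function routeByComplexity(prompt: string): RouteDecision {
  // Crude heuristics: very long prompts or multi-step/engineering
  // language suggest a harder task. Replace with eval-driven rules.
  const looksComplex =
    prompt.length > 2000 ||
    /step[- ]by[- ]step|prove|refactor|architect|debug/i.test(prompt);

  return looksComplex
    ? { model: "expensive-frontier-model", reason: "complex task" }
    : { model: "cheap-fast-model", reason: "simple task" };
}
```

In practice the routing decision belongs in the gateway layer (OpenRouter/LiteLLM), so fallbacks and cost tracking apply uniformly.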
See: `references/prompt-engineering.md`, `references/production-checklist.md`

## Stack Defaults (TypeScript/Next.js)

- **SDK**: Vercel AI SDK (streaming, React hooks, provider-agnostic)
- **Provider**: OpenRouter (400+ models, easy A/B testing, fallbacks)
- **Vectors**: Postgres pgvector (95% of use cases, $20-50/month)
- **Observability**: Langfuse (self-hostable, generous free tier)
- **Evaluation**: Promptfoo (CI/CD integration, security testing)

## Quality Infrastructure

**Production-grade LLM apps need:**

1. **Model Gateway** (OpenRouter, LiteLLM)
   - Multi-provider access
   - Fallback chains
   - Cost routing
   - See: `llm-gateway-routing` skill
2. **Evaluation & Testing** (Promptfoo)
   - Regression testing in CI/CD
   - Security scanning (red team)
   - Quality gates
   - See: `llm-evaluation` skill
3. **Production Observability** (Langfuse)
   - Full trace debugging
   - Cost tracking
   - Latency monitoring
   - See: `langfuse-observability` skill
4. **Quality Audit**
   - Run the `/llm-gates` command to audit your LLM infrastructure
   - Identifies gaps in routing, testing, observability, security, and cost

### Quick Setup

```bash
# Evaluation (Promptfoo)
npx promptfoo@latest init
npx promptfoo@latest eval

# Observability (Langfuse)
pnpm add langfuse
# Sign up at langfuse.com, add keys to .env

# Gateway (OpenRouter)
# Sign up at openrouter.ai, add OPENROUTER_API_KEY to .env
```

### Quality Gate Standards

| Stage | Checks | Time Budget |
|-------|--------|-------------|
| Pre-commit | Prompt validation, secrets scan | < 5s |
| Pre-push | Regression suite, cost estimate | < 15s |
| CI/CD | Full eval, security scan, A/B comparison | < 5 min |
| Production | Traces, cost alerts, error monitoring | Continuous |

## Scripts

- `scripts/validate_llm_config.py <dir>` - Scan for LLM anti-patterns

## References

- `references/model-selection.md` - Leaderboards, search strategies, red flags
- `references/prompt-engineering.md` - Caching, structured outputs, CoT, model-specific styles
- `references/architecture-patterns.md` - Complexity ladder, RAG, tool use, caching
- `references/production-checklist.md` - Cost, errors, security, observability, evaluation

## Related Skills

- `llm-evaluation` - Promptfoo setup, CI/CD integration, security testing
- `llm-gateway-routing` - OpenRouter, LiteLLM, routing strategies
- `langfuse-observability` - Tracing, cost tracking, production debugging

## Related Commands

- `/llm-gates` - Audit LLM infrastructure quality across 5 pillars
- `/observe` - General observability audit (includes LLM section)

## Live Research Tools

**Use these BEFORE relying on training data:**

- **WebSearch**: Latest model releases, deprecations, best practices
- **Exa MCP** (`mcp__exa__web_search_exa`): Current documentation and code examples
- **Gemini CLI** (`gemini`): Sophisticated reasoning with Google Search grounding
- **Provider Playgrounds**: OpenRouter, Google AI Studio, Anthropic Console

**Research Flow**:
1. WebSearch for latest information
2. Exa MCP for documentation and examples
3. Gemini CLI for complex verification and comparison
4. Provider playground for syntax testing
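The hybrid RAG optimization (vector + keyword search) needs a way to merge the two ranked result lists. A common technique is reciprocal rank fusion (RRF); the sketch below assumes you already have ranked doc-ID lists (e.g. from pgvector and Postgres full-text search) and uses the conventional k = 60 damping constant from the original RRF paper — this is an illustration of the general technique, not code from this skill.

```typescript
// Reciprocal Rank Fusion: merge ranked lists from vector search and
// keyword (BM25/full-text) search into a single ranking. Each inner
// array holds doc IDs, best match first.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // A doc gets 1/(k + rank + 1) from each list it appears in;
      // appearing high in both lists compounds the score.
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

Example: `reciprocalRankFusion([["a", "b", "c"], ["b", "c", "d"]])` ranks `"b"` first, since it places highly in both lists even though neither list ranked it first.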
Information

- Repository: phrazzld/claude-config
- Author: phrazzld
- Last Sync: 1/22/2026
- Repo Updated: 1/17/2026
- Created: 1/13/2026