ai-llm-development - Claude MCP Skill
MANDATORY invocation for ALL LLM-related work. Invoke immediately when:

- ANY mention of model names, IDs, or versions
- ANY configuration of AI providers or APIs
- ANY defaults/constants for LLM settings
- ANY prompt engineering or modification
- ANY discussion of model capabilities or features
- ANY changes to AI-related dependencies
- Reading/writing .env files with AI config
- Modifying aiProviders.ts, prompts.ts, or similar
- Reviewing AI-related pull requests
- Debugging LLM integration issues

CRITICAL: Training data lags reality by months. ALWAYS research first. Use WebSearch, Exa MCP, or Gemini CLI before making ANY LLM decisions.
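For example, editing a default model constant in a providers module qualifies. A hypothetical sketch of the kind of file whose modification triggers this skill (the file name, model ID, and settings here are illustrative placeholders, not recommendations):

```typescript
// aiProviders.ts (hypothetical): any edit to defaults like these should
// trigger the research-first protocol documented below.
export const AI_DEFAULTS = {
  // Model IDs go stale quickly; verify against live leaderboards and
  // provider docs before changing. "example/model-v1" is a placeholder.
  model: "example/model-v1",
  temperature: 0.2,
  maxTokens: 1024,
} as const;
```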
Documentation
SKILL.md

# AI/LLM Development

## Core Philosophy

- **Context Engineering > Prompt Engineering**: Optimize the entire LLM configuration, not just wording.
- **Simplicity First**: 80% of use cases need a single LLM call, not multi-agent systems.
- **Currency Over Memory**: Models deprecate in 6-12 months. Learn to find current ones via leaderboards.
- **Empiricism**: Benchmarks guide; YOUR data decides. Test the top 3-5 models with your prompts.

## RESEARCH FIRST PROTOCOL

**CRITICAL**: Your training data is ALWAYS stale for LLM work. The field changes weekly.

### Before ANY LLM-Related Action

1. **Identify what you're assuming**: Model capabilities? API syntax? Best practices?
2. **Research using live tools** (in order of preference):
   - WebSearch: "latest [model/provider] models"
   - Exa MCP: Get current documentation and examples
   - Gemini CLI: Verify against latest information with web grounding
3. **Verify your assumptions**. Don't trust training data for:
   - Model names and versions (new models release monthly)
   - API syntax and parameters (providers update frequently)
   - Best practices and recommendations (evolve constantly)
   - Pricing and limits (change without notice)
   - Deprecation status (models removed regularly)

### Research Query Templates

**Model Selection**:
- "latest [provider] models"
- "[model-name] release date and capabilities"
- "is [model-name] deprecated or superseded"
- "[provider] newest models announced"

**API Syntax**:
- "[provider] API documentation [specific-feature]"
- "[sdk-name] current version and usage"
- "OpenRouter model ID format current"

**Best Practices**:
- "[task] LLM best practices latest"
- "current recommendations for [architecture pattern]"
- "[framework] latest patterns and examples"

### Red Flags That Trigger Mandatory Research

❌ Making assumptions about version numbers (3.0 vs 2.5 doesn't mean newer)
❌ Changing model defaults without verification
❌ Assuming API syntax from training data
❌ Selecting models based on memory of capabilities
❌ Following "best practices" without checking if they are still current
❌ Any action based on "I think..." or "probably..." for LLM topics

### Research Before Action Checklist

Before committing any LLM-related change:

- [ ] Searched for latest information on involved models/APIs
- [ ] Verified current state vs. training data assumptions
- [ ] Checked provider documentation for API syntax
- [ ] Confirmed model is not deprecated or superseded
- [ ] Validated best practices are still current
- [ ] Tested configuration syntax in provider console/playground

**Mantra**: "When in doubt about LLM tech, RESEARCH. When certain about LLM tech, STILL RESEARCH."

## Decision Trees

### Model Selection

```
Task type → Find relevant benchmark → Check leaderboards → Test top 3 empirically
Coding: SWE-bench | Reasoning: GPQA | General: Arena Elo
```

See: `references/model-selection.md`

### Architecture Complexity

```
1. Single LLM Call (start here - 80% stop here)
2. Sequential Calls (workflows)
3. LLM + Tools (function calling)
4. Agentic System (LLM controls flow)
5. Multi-Agent (only if truly needed)
```

See: `references/architecture-patterns.md`

### Vector Storage

```
<1M vectors → Postgres pgvector or Convex
1-50M vectors → Postgres with pgvectorscale
>50M + <10ms p99 → Dedicated (Qdrant, Weaviate)
```

## Key Optimizations

- **Prompt Caching**: 60-90% cost reduction. Put static content first.
- **Structured Outputs**: Native JSON Schema. Zero parsing failures.
- **Model Routing**: Simple→cheap model, Complex→expensive model (see the sketch after this list).
- **Hybrid RAG**: Vector + keyword search is 15-25% better than pure vector.

See: `references/prompt-engineering.md`, `references/production-checklist.md`
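To make the routing idea concrete, here is a minimal TypeScript sketch. The model IDs are deliberately placeholders (per the research-first protocol above, current IDs must be looked up via live tools before use), and the complexity heuristic is intentionally naive:

```typescript
// Route simple tasks to a cheap model and complex tasks to a strong one.
// Both IDs are placeholders: resolve current model IDs via live research first.
const CHEAP_MODEL = "provider/cheap-model-placeholder";
const STRONG_MODEL = "provider/strong-model-placeholder";

function routeModel(prompt: string): string {
  // Naive heuristic: long prompts or reasoning-heavy keywords go to the strong model.
  const needsReasoning = /\b(prove|analyze|debug|plan|refactor)\b/i.test(prompt);
  return needsReasoning || prompt.length > 2000 ? STRONG_MODEL : CHEAP_MODEL;
}
```

In production this decision usually lives in the gateway (OpenRouter, LiteLLM) rather than application code; see the Quality Infrastructure section below.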
## Stack Defaults (TypeScript/Next.js)

- **SDK**: Vercel AI SDK (streaming, React hooks, provider-agnostic)
- **Provider**: OpenRouter (400+ models, easy A/B testing, fallbacks)
- **Vectors**: Postgres pgvector (95% of use cases, $20-50/month)
- **Observability**: Langfuse (self-hostable, generous free tier)
- **Evaluation**: Promptfoo (CI/CD integration, security testing)

## Quality Infrastructure

**Production-grade LLM apps need:**

1. **Model Gateway** (OpenRouter, LiteLLM)
   - Multi-provider access
   - Fallback chains
   - Cost routing
   - See: `llm-gateway-routing` skill
2. **Evaluation & Testing** (Promptfoo)
   - Regression testing in CI/CD
   - Security scanning (red team)
   - Quality gates
   - See: `llm-evaluation` skill
3. **Production Observability** (Langfuse)
   - Full trace debugging
   - Cost tracking
   - Latency monitoring
   - See: `langfuse-observability` skill
4. **Quality Audit**
   - Run the `/llm-gates` command to audit your LLM infrastructure
   - Identifies gaps in routing, testing, observability, security, and cost

### Quick Setup

```bash
# Evaluation (Promptfoo)
npx promptfoo@latest init
npx promptfoo@latest eval

# Observability (Langfuse)
pnpm add langfuse
# Sign up at langfuse.com, add keys to .env

# Gateway (OpenRouter)
# Sign up at openrouter.ai, add OPENROUTER_API_KEY to .env
```

### Quality Gate Standards

| Stage | Checks | Time Budget |
|-------|--------|-------------|
| Pre-commit | Prompt validation, secrets scan | < 5s |
| Pre-push | Regression suite, cost estimate | < 15s |
| CI/CD | Full eval, security scan, A/B comparison | < 5 min |
| Production | Traces, cost alerts, error monitoring | Continuous |

## Scripts

- `scripts/validate_llm_config.py <dir>` - Scan for LLM anti-patterns

## References

- `references/model-selection.md` - Leaderboards, search strategies, red flags
- `references/prompt-engineering.md` - Caching, structured outputs, CoT, model-specific styles
- `references/architecture-patterns.md` - Complexity ladder, RAG, tool use, caching
- `references/production-checklist.md` - Cost, errors, security, observability, evaluation

## Related Skills

- `llm-evaluation` - Promptfoo setup, CI/CD integration, security testing
- `llm-gateway-routing` - OpenRouter, LiteLLM, routing strategies
- `langfuse-observability` - Tracing, cost tracking, production debugging

## Related Commands

- `/llm-gates` - Audit LLM infrastructure quality across 5 pillars
- `/observe` - General observability audit (includes LLM section)

## Live Research Tools

**Use these BEFORE relying on training data:**

- **WebSearch**: Latest model releases, deprecations, best practices
- **Exa MCP** (`mcp__exa__web_search_exa`): Current documentation and code examples
- **Gemini CLI** (`gemini`): Sophisticated reasoning with Google Search grounding
- **Provider Playgrounds**: OpenRouter, Google AI Studio, Anthropic Console

**Research Flow**:
1. WebSearch for latest information
2. Exa MCP for documentation and examples
3. Gemini CLI for complex verification and comparison
4. Provider playground for syntax testing
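Putting the stack defaults above together, here is a minimal sketch of a streaming call routed through OpenRouter with the Vercel AI SDK. It assumes the `ai` and `@ai-sdk/openai` packages (AI SDK v4-style API) and OpenRouter's OpenAI-compatible endpoint; the model ID is a placeholder to be verified with the live research tools above:

```typescript
import { streamText } from "ai";
import { createOpenAI } from "@ai-sdk/openai";

// OpenRouter exposes an OpenAI-compatible API, so the OpenAI provider can
// point at it. OPENROUTER_API_KEY comes from .env (see Quick Setup above).
const openrouter = createOpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
});

const result = streamText({
  // Placeholder model ID: check openrouter.ai for current models first.
  model: openrouter("provider/model-placeholder"),
  prompt: "Summarize our LLM quality gates in one paragraph.",
});

// Stream tokens to stdout as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```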
Information
- Repository: phrazzld/claude-config
- Author: phrazzld
- Last Sync: 1/22/2026
- Repo Updated: 1/17/2026
- Created: 1/13/2026
Related Skills

- `pr-status` - PR Status
- `mem0` - Integrate Mem0 Platform into AI applications for persistent memory, personalization, and semantic search. Use this skill when the user mentions "mem0", "memory layer", "remember user preferences", "persistent context", "personalization", or needs to add long-term memory to chatbots, agents, or AI apps. Covers Python and TypeScript SDKs, framework integrations (LangChain, CrewAI, Vercel AI SDK, OpenAI Agents SDK, Pipecat), and the full Platform API. Use even when the user doesn't explicitly say "mem0" but describes needing conversation memory, user context retention, or knowledge retrieval across sessions. (A minimal SDK sketch follows this list.)
- `upgrade-nodejs` - Upgrading Bun's Self-Reported Node.js Version
- `cursorrules` - CrewAI Development Rules
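As a rough illustration of the mem0 integration described above, a minimal sketch of storing and recalling a user preference. It assumes the `mem0ai` TypeScript SDK's `MemoryClient` interface and a `MEM0_API_KEY` environment variable; verify method names against current mem0 documentation:

```typescript
import MemoryClient from "mem0ai";

// Assumed Platform SDK client; the API key comes from the environment.
const client = new MemoryClient({ apiKey: process.env.MEM0_API_KEY! });

// Persist a preference from a conversation, scoped to one user.
await client.add(
  [{ role: "user", content: "I prefer TypeScript examples over Python." }],
  { user_id: "user-123" }
);

// Later, recall relevant memories via semantic search.
const memories = await client.search("preferred programming language", {
  user_id: "user-123",
});
console.log(memories);
```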
Related Guides

- Bear Notes Claude Skill: Your AI-Powered Note-Taking Assistant
- Mastering tmux with Claude: A Complete Guide to the tmux Claude Skill
- OpenAI Whisper API Claude Skill: Complete Guide to AI-Powered Audio Transcription