check-production - Claude MCP Skill
Check production health: Sentry errors, Vercel logs, health endpoints, GitHub CI/CD. Outputs structured findings. Use log-production-issues to create issues. Invoke for: production diagnostics, error audit, health status, CI failures.
# /check-production
Audit production health and output the findings as a structured report.
## What This Does
1. Query Sentry for unresolved issues
2. Check Vercel logs for recent errors
3. Test health endpoints
4. Check GitHub Actions for CI/CD failures
5. Output prioritized findings (P0-P3)
**This is a primitive.** It only investigates and reports. Use `/log-production-issues` to create GitHub issues or `/triage` to fix.
## Process
### 1. Sentry Check
```bash
# Run triage script if available
~/.claude/skills/triage/scripts/check_sentry.sh 2>/dev/null || echo "Sentry check unavailable"
```
Alternatively, query Sentry through its MCP server if one is configured.
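When neither the triage script nor an MCP server is available, the Sentry REST API can be queried directly. The following is a minimal sketch, not part of the skill. It assumes the project-level issues endpoint (`/api/0/projects/{org}/{project}/issues/`) on `sentry.io` (self-hosted installs use their own host) with a `query=is:unresolved` filter, and a token you supply (for example from a hypothetical `SENTRY_AUTH_TOKEN` variable). `toFinding` and its rule (any affected users means P0, per the Priority Mapping table) are illustrative.

```typescript
// Subset of the per-issue fields Sentry returns (assumed; check the API reference).
interface SentryIssue {
  shortId: string;
  title: string;
  culprit: string;
  userCount: number;
  firstSeen: string;
}

interface Finding {
  priority: "P0" | "P1" | "P2" | "P3";
  summary: string;
}

// Build the URL that lists the project's unresolved issues.
function sentryIssuesUrl(org: string, project: string): string {
  const url = new URL(
    `https://sentry.io/api/0/projects/${encodeURIComponent(org)}/${encodeURIComponent(project)}/issues/`,
  );
  url.searchParams.set("query", "is:unresolved");
  return url.toString();
}

// Illustrative rule: an issue that reached users is an active production issue (P0).
function toFinding(issue: SentryIssue): Finding {
  const priority = issue.userCount > 0 ? "P0" : "P1";
  return {
    priority,
    summary: `[${issue.shortId}] ${issue.title} - ${issue.userCount} users affected (${issue.culprit})`,
  };
}

// Fetch unresolved issues and convert them; the token needs read access to the project.
async function fetchSentryFindings(org: string, project: string, token: string): Promise<Finding[]> {
  const res = await fetch(sentryIssuesUrl(org, project), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) throw new Error(`Sentry API returned HTTP ${res.status}`);
  const issues = (await res.json()) as SentryIssue[];
  return issues.map(toFinding);
}
```

Keeping URL building and classification as pure functions makes them testable without network access; only `fetchSentryFindings` touches the API.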
### 2. Vercel Logs Check
```bash
# Check for recent errors (`vercel logs` flags differ across CLI versions; see `vercel logs --help`)
~/.claude/skills/triage/scripts/check_vercel_logs.sh 2>/dev/null || vercel logs --output json 2>/dev/null | head -50
```
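If the raw JSON fallback is used, the 5xx count behind the P1 finding can be derived from it. This is a minimal sketch under a loud assumption: each log entry is one JSON object per line with a numeric `statusCode` field. Field names depend on the Vercel CLI version, so inspect a sample line before relying on it. Lines that fail to parse (for example one cut off by `head`) are counted rather than silently dropped.

```typescript
interface LogSummary {
  serverErrors: number; // entries whose statusCode is 500-599
  unparsed: number; // non-empty lines that were not valid JSON
}

// ASSUMPTION: newline-delimited JSON with a numeric `statusCode` per entry.
function summarizeLogs(ndjson: string): LogSummary {
  const summary: LogSummary = { serverErrors: 0, unparsed: 0 };
  for (const line of ndjson.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let entry: unknown;
    try {
      entry = JSON.parse(trimmed);
    } catch {
      summary.unparsed++; // report malformed lines instead of hiding them
      continue;
    }
    const status = (entry as { statusCode?: unknown } | null)?.statusCode;
    if (typeof status === "number" && status >= 500 && status <= 599) {
      summary.serverErrors++;
    }
  }
  return summary;
}
```

A non-zero `unparsed` count is itself a signal that the assumed format does not match your CLI's output.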
### 3. Health Endpoints
```bash
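# Test health endpoint (the fallback reads the app URL from .env.local)
~/.claude/skills/triage/scripts/check_health_endpoints.sh 2>/dev/null || {
  APP_URL=$(grep '^NEXT_PUBLIC_APP_URL=' .env.local 2>/dev/null | cut -d= -f2- | tr -d "\"'")
  if [ -n "$APP_URL" ]; then curl -sf "$APP_URL/api/health" | jq .
  else echo "Health check unavailable: NEXT_PUBLIC_APP_URL not set"; fi
}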
```
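The same check can also capture latency, which the P1 example in the Output Format relies on (responses should be under 500 ms). This is a minimal sketch using the global `fetch` (Node 18+). The 500 ms threshold comes from that example; the 5-second timeout, treating an unreachable endpoint as P0, and treating a 4xx as P2 are assumptions.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

interface HealthResult {
  status: number | null; // HTTP status, or null if the request failed or timed out
  ms: number; // wall-clock latency in milliseconds
}

// Request the endpoint once and record status plus latency.
async function probeHealth(url: string, timeoutMs = 5000): Promise<HealthResult> {
  const start = performance.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { status: res.status, ms: performance.now() - start };
  } catch {
    // Network error or timeout: report it as unreachable instead of hiding it.
    return { status: null, ms: performance.now() - start };
  }
}

// Turn a probe result into a finding, or null when the endpoint looks healthy.
function classifyHealth(r: HealthResult, slowMs = 500): { priority: Priority; message: string } | null {
  if (r.status === null) return { priority: "P0", message: "Health endpoint unreachable" };
  if (r.status >= 500) return { priority: "P1", message: `Health endpoint returned HTTP ${r.status}` };
  if (r.status >= 400) return { priority: "P2", message: `Health endpoint returned HTTP ${r.status}` };
  if (r.ms > slowMs) {
    return { priority: "P1", message: `Health endpoint slow: ${(r.ms / 1000).toFixed(1)}s (should be <${slowMs}ms)` };
  }
  return null;
}
```

Usage: `classifyHealth(await probeHealth("https://example.com/api/health"))`, where the URL is a placeholder for your production health endpoint.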
### 4. GitHub CI/CD Check
```bash
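# Check for failed workflow runs on the default branch (whatever it is named)
DEFAULT_BRANCH=$(gh repo view --json defaultBranchRef --jq .defaultBranchRef.name 2>/dev/null)
gh run list --branch "${DEFAULT_BRANCH:-main}" --status failure --limit 5 2>/dev/null
# Get details on most recent failure
gh run list --status failure --limit 1 --json databaseId,name,conclusion,createdAt,headBranch 2>/dev/null
# Show the failed steps' logs for that run
RUN_ID=$(gh run list --status failure --limit 1 --json databaseId --jq '.[0].databaseId // empty' 2>/dev/null)
[ -n "$RUN_ID" ] && gh run view "$RUN_ID" --log-failed 2>/dev/null | tail -40
# Check for stale/stuck workflows
gh run list --status in_progress --json databaseId,name,createdAt 2>/dev/null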
```
**What to look for:**
- Failed runs on main/master branch (broken CI)
- Failed runs on feature branches blocking PRs
- Stuck/in-progress runs that should have completed
- Patterns in failure types (tests, lint, build, deploy)
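These checks can be combined by reading the `--json` output of `gh run list` (fields `databaseId`, `name`, `status`, `conclusion`, `createdAt`, `headBranch`). This is a minimal sketch that applies the Priority Mapping table: failures on the default branch are P1, failures on other branches P2. The 60-minute cutoff for calling an in-progress run stuck, and rating stuck runs P2, are assumptions; tune the cutoff to your slowest workflow.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// One element of `gh run list --json databaseId,name,status,conclusion,createdAt,headBranch`.
interface Run {
  databaseId: number;
  name: string;
  status: string; // e.g. "completed", "in_progress"
  conclusion: string; // e.g. "success", "failure"; empty while running
  createdAt: string; // ISO 8601 timestamp
  headBranch: string;
}

interface CiFinding {
  priority: Priority;
  message: string;
}

function classifyRuns(runs: Run[], defaultBranch: string, now: Date, stuckAfterMin = 60): CiFinding[] {
  const findings: CiFinding[] = [];
  for (const run of runs) {
    if (run.conclusion === "failure") {
      const onDefault = run.headBranch === defaultBranch;
      findings.push({
        priority: onDefault ? "P1" : "P2",
        message: `[CI] ${onDefault ? "Default" : "Feature"} branch failing: "${run.name}" on ${run.headBranch} (run ${run.databaseId})`,
      });
    } else if (run.status === "in_progress") {
      // Age of the run in minutes, measured from its creation time.
      const ageMin = (now.getTime() - Date.parse(run.createdAt)) / 60_000;
      if (ageMin > stuckAfterMin) {
        findings.push({
          priority: "P2",
          message: `[CI] "${run.name}" in progress for ${Math.round(ageMin)} min (run ${run.databaseId})`,
        });
      }
    }
  }
  // Most severe first; Array.prototype.sort is stable, so ties keep input order.
  return findings.sort((a, b) => a.priority.localeCompare(b.priority));
}
```

The default branch name can come from `gh repo view --json defaultBranchRef`; passing `now` explicitly keeps the function deterministic.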
### 5. Quick Application Checks
```bash
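# Check for error-handling gaps: empty catch blocks = silent failures
# Matches single-line `catch {}`, `catch (e) {}` and `.catch(() => {})`
grep -rnE 'catch[[:space:]]*(\([^)]*\))?[[:space:]]*\{[[:space:]]*\}|\.catch\(\(\)[[:space:]]*=>[[:space:]]*\{\}\)' \
  --include="*.ts" --include="*.tsx" src/ app/ 2>/dev/null | head -5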
```
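A line-based `grep` cannot see an empty block whose braces sit on separate lines, such as `catch (err) {` followed by `}`. This is a minimal sketch that scans whole file contents instead. It uses a regular expression, not a parser, so a catch block containing only a comment is not flagged and unusual formatting may be missed.

```typescript
// Return the 1-based line number where each empty catch block starts.
// Matches `catch {}` and `catch (err) {}` with only whitespace (newlines included) between the braces.
function findEmptyCatches(source: string): number[] {
  const emptyCatch = /catch\s*(?:\([^)]*\))?\s*\{\s*\}/g; // fresh regex per call, so lastIndex starts at 0
  const lines: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = emptyCatch.exec(source)) !== null) {
    // Count the newlines before the match to get its line number.
    lines.push(source.slice(0, match.index).split("\n").length);
  }
  return lines;
}
```

Reading each `.ts`/`.tsx` file with Node's `fs.readFileSync(path, "utf8")` and passing the text in yields the count for the "empty catch blocks found" P2 finding.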
## Output Format
```markdown
## Production Health Check
### P0: Critical (Active Production Issues)
- [SENTRY-123] PaymentIntent failed - 23 users affected (Score: 147)
Location: api/checkout.ts:45
First seen: 2h ago
### P1: High (Degraded Performance / Broken CI)
- Health endpoint slow: /api/health responding in 2.3s (should be <500ms)
- Vercel logs show 5xx errors in last hour (count: 12)
- [CI] Main branch failing: "Build" workflow (run #1234)
Failed step: "Type check"
Error: Type 'string' is not assignable to type 'number'
### P2: Medium (Warnings)
- 3 empty catch blocks found (silent failures)
- Health endpoint missing database connectivity check
- [CI] 3 feature branch workflows failing (blocking PRs)
### P3: Low (Improvements)
- Consider adding Sentry performance monitoring
- Health endpoint could include more service checks
## Summary
- P0: 1 | P1: 3 | P2: 3 | P3: 2
- Recommendation: Fix P0 immediately, then fix main branch CI
```
## Priority Mapping
| Signal | Priority |
|--------|----------|
| Active errors affecting users | P0 |
| 5xx errors, slow responses | P1 |
| Main branch CI/CD failing | P1 |
| Feature branch CI blocking PRs | P2 |
| Silent failures, missing checks | P2 |
| Missing monitoring, improvements | P3 |
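The table reads naturally as a lookup. This is a minimal sketch that encodes it and produces the Summary line from the Output Format. The `Signal` names are hypothetical labels for the table's rows, not identifiers used by the skill.

```typescript
type Priority = "P0" | "P1" | "P2" | "P3";

// One hypothetical label per row of the Priority Mapping table.
type Signal =
  | "user-facing-error"
  | "server-error-or-slow"
  | "default-branch-ci-failing"
  | "feature-branch-ci-blocking"
  | "silent-failure-or-missing-check"
  | "improvement";

const PRIORITY: Record<Signal, Priority> = {
  "user-facing-error": "P0",
  "server-error-or-slow": "P1",
  "default-branch-ci-failing": "P1",
  "feature-branch-ci-blocking": "P2",
  "silent-failure-or-missing-check": "P2",
  "improvement": "P3",
};

// Render the "P0: n | P1: n | P2: n | P3: n" summary line for a list of signals.
function summaryLine(signals: Signal[]): string {
  const counts: Record<Priority, number> = { P0: 0, P1: 0, P2: 0, P3: 0 };
  for (const s of signals) counts[PRIORITY[s]]++;
  return (["P0", "P1", "P2", "P3"] as const).map((p) => `${p}: ${counts[p]}`).join(" | ");
}
```

Using `Record<Signal, Priority>` makes the compiler reject a new signal that has no priority assigned.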
## Health Endpoint Anti-Pattern
**Health checks that lie are worse than no health check.** Example:
```typescript
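// Assumed helper: resolves true if the database answers (e.g. to `SELECT 1`).
declare function checkDatabase(): Promise<boolean>;
// ❌ BAD: Reports "ok" without checking anything
function badHealth() {
  return { status: "ok", services: { database: "ok" } };
}
// ✅ GOOD: Honest liveness probe (no fake service status)
function liveness() {
  return { status: "ok", timestamp: new Date().toISOString() };
}
// ✅ BETTER: Real readiness probe; a thrown error counts as a failed check
async function readiness() {
  const dbOk = await checkDatabase().catch(() => false);
  return { status: dbOk ? "ok" : "degraded", services: { database: dbOk ? "ok" : "error" } };
}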
```
If you can't verify a service, don't report on it. A false "ok" status masks outages.
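A readiness probe can also be bounded in time, so that a hung dependency cannot make the health endpoint itself hang. This is a minimal sketch: `readinessResponse` and `runCheck` are hypothetical helpers, the 2-second per-check timeout is an assumption, and returning HTTP 503 when degraded follows a common load-balancer convention rather than anything this skill specifies.

```typescript
type Check = () => Promise<boolean>;
type ServiceStatus = "ok" | "error" | "timeout";

// Run one check: a throw or rejection counts as "error", a check slower than timeoutMs as "timeout".
async function runCheck(check: Check, timeoutMs: number): Promise<ServiceStatus> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<ServiceStatus>((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  const outcome = Promise.resolve()
    .then(check) // also catches synchronous throws from the check
    .then(
      (ok): ServiceStatus => (ok ? "ok" : "error"),
      (): ServiceStatus => "error",
    );
  try {
    return await Promise.race([outcome, timedOut]);
  } finally {
    clearTimeout(timer); // don't keep the process alive after a fast check
  }
}

// Run all checks in parallel and report only what was actually verified.
async function readinessResponse(checks: Record<string, Check>, timeoutMs = 2000): Promise<Response> {
  const names = Object.keys(checks);
  const statuses = await Promise.all(names.map((name) => runCheck(checks[name], timeoutMs)));
  const services: Record<string, ServiceStatus> = {};
  names.forEach((name, i) => {
    services[name] = statuses[i];
  });
  const healthy = statuses.every((s) => s === "ok");
  const body = { status: healthy ? "ok" : "degraded", services, timestamp: new Date().toISOString() };
  // A 503 lets load balancers and uptime monitors see degradation without parsing the body.
  return new Response(JSON.stringify(body), {
    status: healthy ? 200 : 503,
    headers: { "content-type": "application/json" },
  });
}
```

In a Next.js App Router project, `app/api/health/route.ts` could export `async function GET() { return readinessResponse({ database: checkDatabase }); }`, where `checkDatabase` is your own probe.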
## Analytics Note
This skill checks production health (errors, logs, endpoints), not product analytics.
For analytics auditing, see `/check-observability`. Note:
- **PostHog** is REQUIRED for product analytics (has MCP server)
- **Vercel Analytics** is NOT acceptable (no CLI/API/MCP - unusable for our workflow)
If you need to investigate user behavior or funnels during incident response, query PostHog via MCP.
### 6. E2E Smoke Check
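If Playwright is configured in the project, run the smoke tests against production. This assumes `PROD_URL` holds the production base URL and that the project's Playwright config reads `PLAYWRIGHT_BASE_URL` (for example into `use.baseURL`):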
```bash
# Run smoke tests against production
PLAYWRIGHT_BASE_URL="$PROD_URL" npx playwright test e2e/smoke.spec.ts --reporter=list 2>&1 | head -30
```
Critical paths to verify:
- Landing page loads (anonymous)
- Dashboard loads (authenticated) — the #1 incident class
- Subscribe page renders
- Session page loads
- No error boundaries triggered on any route
### 7. Post-Deploy Health Check
```bash
# Verify health endpoint status and latency (PROD_URL = production base URL)
curl -s -o /dev/null -w "HTTP %{http_code} in %{time_total}s\n" "$PROD_URL/api/health"
# Verify no error boundary on dashboard; without a session cookie this only sees the anonymous view
if ! html=$(curl -sfL "$PROD_URL/dashboard"); then
  echo "DASHBOARD REQUEST FAILED"
elif grep -q "Something went wrong" <<<"$html"; then
  echo "ERROR BOUNDARY DETECTED"
else
  echo "Dashboard OK"
fi
```
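The dashboard check extends naturally to every route in the smoke list above. This is a minimal sketch, assuming `Something went wrong` is the text the app's error boundary renders (adjust it to your own fallback UI) and that you pass in your own route paths. Requests carry no session, so protected pages may only show their login redirect.

```typescript
interface RouteResult {
  path: string;
  ok: boolean;
  detail: string;
}

type Fetcher = (url: string) => Promise<Response>;

// Fetch each route and flag HTTP errors, network failures, and rendered error boundaries.
async function scanRoutes(
  baseUrl: string,
  paths: string[],
  marker = "Something went wrong",
  fetcher: Fetcher = (url) => fetch(url, { redirect: "follow" }),
): Promise<RouteResult[]> {
  return Promise.all(
    paths.map(async (path): Promise<RouteResult> => {
      try {
        const res = await fetcher(new URL(path, baseUrl).toString());
        if (!res.ok) return { path, ok: false, detail: `HTTP ${res.status}` };
        const html = await res.text();
        if (html.includes(marker)) return { path, ok: false, detail: "error boundary rendered" };
        return { path, ok: true, detail: `HTTP ${res.status}` };
      } catch (err) {
        // A failed request is a finding, not something to swallow.
        return { path, ok: false, detail: `request failed: ${String(err)}` };
      }
    }),
  );
}
```

The injectable `fetcher` keeps the function testable offline; in production the default global `fetch` is used.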
## Related
- `/log-production-issues` - Create GitHub issues from findings
- `/triage` - Fix production issues
- `/observability` - Set up monitoring infrastructure
- `/flywheel-qa` - Agentic QA for preview deployments
Information
- Repository: phrazzld/claude-config
- Author: phrazzld
- Last Sync: 3/2/2026
- Repo Updated: 3/1/2026
- Created: 1/25/2026