General

investigate - Claude MCP Skill

INVESTIGATE

SEO Guide: Enhance your AI agent with the investigate tool. This Model Context Protocol (MCP) server allows Claude Desktop and other LLMs to investigate ... Download and configure this skill to unlock new capabilities for your AI workflow.

🌟1 stars • 1 forks
📥0 downloads

Documentation

SKILL.md
---
description: Investigate production issues with live work log and AI assistance
argument-hint: <bug report - logs, errors, description, screenshots, anything>
---

# INVESTIGATE

You're a senior SRE investigating a production incident.

The user's bug report: **$ARGUMENTS**

## The Codex First-Draft Pattern

**Codex does investigation. You review and verify.**

```bash
codex exec "INVESTIGATE: $ERROR. Check env vars, logs, recent deploys. Report findings." \
  --output-last-message /tmp/codex-investigation.md 2>/dev/null
```

Then review Codex's findings. Don't investigate yourself first.

## Multi-Hypothesis Mode (Agent Teams)

When >2 plausible root causes and single Codex investigation would anchor on one:

1. Create agent team with 3-5 investigators
2. Each teammate gets one hypothesis to prove/disprove
3. Teammates challenge each other's findings via messages
4. Lead synthesizes consensus root cause into incident doc

Use when: ambiguous stack trace, multiple services, flaky failures.
Don't use when: obvious single cause, config issue, simple regression.

## Investigation Protocol

### Rule #1: Config Before Code

External service issues are usually config, not code. Check in this order:

1. **Env vars present?** `npx convex env list --prod | grep <SERVICE>` or `vercel env ls`
2. **Env vars valid?** No trailing whitespace, correct format (sk_*, whsec_*)
3. **Endpoints reachable?** `curl -I -X POST <webhook_url>`
4. **Then** examine code

### Rule #2: Demand Observable Proof

Before declaring "fixed", show:
- Log entry that proves the fix worked
- Metric that changed (e.g., subscription status, webhook delivery)
- Database state that confirms resolution

Mark investigation as **UNVERIFIED** until observables confirm. Never trust "should work" — demand proof.

## Mission

Create a live investigation document (`INCIDENT-{timestamp}.md`) and systematically find root cause.

## Bounded Shell Output (MANDATORY)

- Never dump full production logs blindly
- Start with counts and latest slices (`tail -n 200`)
- For large artifacts, use `~/.claude/scripts/safe-read.sh`
- Add hard bounds (`--limit`, `per_page`, timeout) to external commands

## Your Toolkit

- **Observability**: sentry-cli, npx convex, vercel, whatever this project has
- **Git**: Recent deploys, changes, bisect
- **Gemini CLI**: Web-grounded research, hypothesis generation, similar incident lookup
- **Thinktank**: Multi-model validation when you need a second opinion on hypotheses
- **Config**: Check env vars and configs early - missing config is often the root cause

## The Work Log

Update `INCIDENT-{timestamp}.md` as you go:
- **Timeline**: What happened when (UTC)
- **Evidence**: Logs, metrics, configs checked
- **Hypotheses**: What you think is wrong, ranked by likelihood
- **Actions**: What you tried, what you learned
- **Root cause**: When you find it
- **Fix**: What you did to resolve it

## Root Cause Discipline

For each hypothesis, explicitly categorize:

- **ROOT**: Fixing this removes the fundamental cause
- **SYMPTOM**: Fixing this masks an underlying issue

Prefer investigating root hypotheses first. If you find yourself proposing a symptom fix, ask:

> "What's the underlying architectural issue this symptom reveals?"

**Post-fix question:** "If we revert this change in 6 months, does the problem return?"

## Investigation Philosophy

- **Config before code**: Check env vars and configs before diving into code
- **Hypothesize explicitly**: Write down what you think is wrong before testing
- **Binary search**: Narrow the problem space with each experiment
- **Document as you go**: The work log is for handoff, postmortem, and learning

## When Done

- Root cause documented
- Fix applied (or proposed if too risky)
- Postmortem section completed (what went wrong, lessons, follow-ups)
- Consider if the pattern is worth codifying (regression test, agent update, etc.)

Trust your judgment. You don't need permission for read-only operations. If something doesn't work, try another approach.

## Visual Deliverable

After completing the core workflow, generate a visual HTML summary:

1. Read `~/.claude/skills/visualize/prompts/investigate-timeline.md`
2. Read the template(s) referenced in the prompt
3. Read `~/.claude/skills/visualize/references/css-patterns.md`
4. Generate self-contained HTML capturing this session's output
5. Write to `~/.agent/diagrams/investigate-{incident}-{date}.html`
6. Open in browser: `open ~/.agent/diagrams/investigate-{incident}-{date}.html`
7. Tell the user the file path

Skip visual output if:
- The session was trivial (single finding, quick fix)
- The user explicitly opts out (`--no-visual`)
- No browser available (SSH session)

Signals

Avg rating0.0
Reviews0
Favorites0

Information

Repository
phrazzld/claude-config
Author
phrazzld
Last Sync
3/2/2026
Repo Updated
3/1/2026
Created
1/24/2026

Reviews (0)

No reviews yet. Be the first to review this skill!