pg-io-deep-dive - Claude MCP Skill
Deep-dive investigation of PostgreSQL I/O patterns using targeted analysis types (buffer pool, tables, indexes, temp files, checkpoints). Covers pg_stat_io for PG16+, correlates I/O patterns with slow queries, and guides diagnosis of cache misses, temp file spills, and checkpoint storms.
Documentation
SKILL.md

# PostgreSQL I/O Deep-Dive
This skill guides a thorough investigation of PostgreSQL I/O performance using the full capabilities of the `analyze_disk_io_patterns` tool, correlating I/O data with query workload and configuration.
## When to Use This Skill
Use this skill when the user:
- Reports high disk I/O or slow disk performance
- Sees queries with high `blks_read` or low cache hit ratios
- Reports `temp_blks_written` in slow queries (a sign that `work_mem` is too low)
- Sees frequent checkpoint warnings in PostgreSQL logs
- Has "IO" wait events from `analyze_wait_events`
- Asks "why is my database doing so much disk I/O?"
- Is sizing storage or evaluating SSD vs HDD performance
## MCP Resources Available
- `pgtuner://health/cache` -- Quick cache hit ratio check (lightweight, no full health scan)
- `pgtuner://settings/memory` -- Current memory configuration
- `pgtuner://settings/checkpoint` -- Current checkpoint configuration
- `pgtuner://table/{schema}/{table_name}/stats` -- Table-level I/O statistics
## Related MCP Prompt
This skill extends the I/O analysis portions of the **`health_check`** and **`diagnose_slow_queries`** MCP Prompts.
## Prerequisites
- The pgtuner-mcp server must be connected with a valid `DATABASE_URI`
- PostgreSQL 12+ required. PostgreSQL 16+ recommended for `pg_stat_io` metrics.
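To confirm up front which analysis types will be available, the agent can check the server version directly; `pg_stat_io` only exists when `server_version_num` is at least 160000. A minimal check (run against any connected session):

```sql
-- Returns true on PostgreSQL 16+, where pg_stat_io is available.
SELECT current_setting('server_version_num')::int >= 160000 AS has_pg_stat_io;
```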
## Agent Decision Tree: Where to Start
```
Is the I/O problem...
|
+--> GENERAL (overall slow, high iowait)?
| --> Start with Step 1 (Buffer Pool) then Step 5 (Checkpoints)
|
+--> QUERY-SPECIFIC (one slow query with high blks_read)?
| --> Start with Step 2 (Table I/O) then Step 3 (Index I/O)
|
+--> TEMP-FILE RELATED (sort/hash spilling to disk)?
| --> Start with Step 4 (Temp Files)
|
+--> CHECKPOINT RELATED (log warnings about checkpoints)?
| --> Start with Step 5 (Checkpoints)
|
+--> UNKNOWN?
--> Start with Step 1 (Buffer Pool) for overall picture
```
## I/O Investigation Workflow
### Step 1: Buffer Pool Analysis
```
Tool: analyze_disk_io_patterns
Parameters:
analysis_type: "buffer_pool"
top_n: 20
```
**What this reveals:**
- Overall buffer cache hit ratio (shared buffers effectiveness)
- Buffer allocation rate and eviction patterns
- Backend writes vs checkpointer writes
**Interpret results:**
| Metric | Healthy | Warning | Critical |
|--------|---------|---------|----------|
| Cache hit ratio | > 99% | 95-99% | < 95% |
| Backend writes > 0 | Rare | Occasional | Frequent (checkpointer not keeping up) |
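The tool aggregates data from the cumulative statistics views. A roughly equivalent manual check (a sketch, not the tool's exact query) for the two metrics above:

```sql
-- Cluster-wide buffer cache hit ratio from pg_stat_database.
SELECT round(100.0 * sum(blks_hit)
             / nullif(sum(blks_hit) + sum(blks_read), 0), 2) AS cache_hit_pct
FROM pg_stat_database;

-- Backend writes vs checkpointer writes (pg_stat_bgwriter through PG16;
-- on PG17+ the backend write counters moved to pg_stat_io).
SELECT buffers_checkpoint, buffers_backend, buffers_alloc
FROM pg_stat_bgwriter;
```

A nonzero, growing `buffers_backend` means client backends are evicting dirty pages themselves, which is the "checkpointer not keeping up" signal in the table above.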
**IF cache hit ratio < 99%:**
```
Tool: review_settings
Parameters:
category: "memory"
```
Check if `shared_buffers` is undersized for the working set.
**IF backend writes > 0:**
Checkpointer/bgwriter is falling behind. Check `bgwriter_lru_maxpages` and checkpoint frequency.
### Step 2: Table I/O Hotspots
```
Tool: analyze_disk_io_patterns
Parameters:
analysis_type: "tables"
schema_name: "public"
top_n: 20
min_size_gb: 0.01
```
**What this reveals:**
- Per-table breakdown of `heap_blks_hit` vs `heap_blks_read`
- Tables causing the most physical disk reads
- Tables with poor cache hit ratios
**Agent reasoning:**
- Tables with high `heap_blks_read` relative to `heap_blks_hit`: Data is not fitting in shared_buffers
- Very large tables with mostly sequential access: Consider partitioning or BRIN indexes
- Tables with high `idx_blks_read`: Index is too large for buffer cache
**Cross-reference with table stats:**
For the top I/O-heavy tables:
```
Tool: get_table_stats
Parameters:
table_name: "<hot_table>"
schema_name: "public"
include_indexes: true
order_by: "seq_scans"
```
IF `seq_scan` count is very high AND the table is large, the table likely needs better indexes to avoid full scans.
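The per-table breakdown above comes from `pg_statio_user_tables`; a hand-rolled equivalent for spotting the same hotspots looks like this (illustrative, not the tool's internal query):

```sql
-- Top tables by physical heap reads, with per-table cache hit ratio.
SELECT schemaname, relname,
       heap_blks_read, heap_blks_hit,
       round(100.0 * heap_blks_hit
             / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 20;
```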
### Step 3: Index I/O Patterns
```
Tool: analyze_disk_io_patterns
Parameters:
analysis_type: "indexes"
top_n: 20
min_size_gb: 0.01
```
**What this reveals:**
- Per-index physical read rates
- Indexes that are too large for the buffer cache
- Indexes that are read-heavy but rarely hit in cache
**Agent reasoning:**
- Bloated indexes have more pages to read. Cross-check:
```
Tool: analyze_index_bloat
Parameters:
table_name: "<table_with_hot_index>"
schema_name: "public"
min_index_size_gb: 0.01
min_bloat_percent: 20
```
- Indexes on rarely-accessed columns waste buffer cache space. Cross-check with `find_unused_indexes`.
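For reference, the same index-level signal can be pulled manually from `pg_statio_user_indexes` (a sketch of the underlying data source, not the tool's exact query):

```sql
-- Indexes generating the most physical reads, with cache hit ratio.
SELECT schemaname, relname AS table_name, indexrelname AS index_name,
       idx_blks_read, idx_blks_hit,
       round(100.0 * idx_blks_hit
             / nullif(idx_blks_hit + idx_blks_read, 0), 1) AS hit_pct
FROM pg_statio_user_indexes
ORDER BY idx_blks_read DESC
LIMIT 20;
```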
### Step 4: Temp File Analysis
```
Tool: analyze_disk_io_patterns
Parameters:
analysis_type: "temp_files"
top_n: 20
```
**What this reveals:**
- Total temp files created and their aggregate size
- Whether sorts and hash joins are spilling to disk
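These aggregate counters live in `pg_stat_database`; a manual equivalent (illustrative sketch) for checking temp file volume per database:

```sql
-- Cumulative temp file activity per database since the last stats reset.
SELECT datname, temp_files,
       pg_size_pretty(temp_bytes) AS temp_volume
FROM pg_stat_database
WHERE temp_files > 0
ORDER BY temp_bytes DESC;
```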
**Correlate with slow queries:**
```
Tool: get_slow_queries
Parameters:
limit: 20
min_calls: 1
order_by: "mean_time"
```
Look for queries with `temp_blks_read > 0` or `temp_blks_written > 0` in the results. These queries are the ones spilling to disk.
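If the `pg_stat_statements` extension is installed, the same correlation can be done directly with a catalog query (a sketch; column set assumes a reasonably recent pg_stat_statements version):

```sql
-- Queries spilling to temp files, worst first. Requires pg_stat_statements.
SELECT left(query, 60) AS query_start,
       calls,
       temp_blks_read,
       temp_blks_written
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 10;
```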
**Then analyze the worst offender:**
```
Tool: analyze_query
Parameters:
query: "<query_with_temp_blks>"
analyze: true
buffers: true
settings: true
format: "text"
```
The `settings: true` parameter reveals if a session-level `SET work_mem` is in effect. The `buffers: true` shows temp buffer usage in the plan.
**Look for in the plan:**
- `Sort Method: external merge Disk: XXkB` -> `work_mem` too low for this sort
- `Hash Batches: 16` -> Hash table spilling, increase `work_mem`
- `Buffers: temp read=XX written=XX` -> Direct confirmation of temp file usage
**Fix:**
```sql
-- Session-level (test first):
SET work_mem = '256MB';
-- Then re-run the query to verify temp files are eliminated.
-- Global (if many queries affected):
ALTER SYSTEM SET work_mem = '128MB';
SELECT pg_reload_conf();
```
### Step 5: Checkpoint I/O Analysis
```
Tool: analyze_disk_io_patterns
Parameters:
analysis_type: "checkpoints"
top_n: 10
```
**What this reveals:**
- Timed checkpoints vs requested checkpoints ratio
- Checkpoint write time and sync time
- Buffers written per checkpoint
**Interpret:**
| Metric | Healthy | Problem |
|--------|---------|---------|
| Requested checkpoints | < 10% of total | > 20% (max_wal_size too small) |
| Checkpoint write time | < 30 seconds | > 60 seconds (disk too slow or too much dirty data) |
| Sync time | < 5 seconds | > 10 seconds (fsync bottleneck) |
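These ratios come from the checkpoint counters; a manual equivalent (a sketch valid through PG16 — on PG17+ the counters moved to `pg_stat_checkpointer` as `num_timed`, `num_requested`, `write_time`, and `sync_time`):

```sql
-- Checkpoint counters from pg_stat_bgwriter (PG16 and earlier).
SELECT checkpoints_timed,
       checkpoints_req,
       round(100.0 * checkpoints_req
             / nullif(checkpoints_timed + checkpoints_req, 0), 1)
         AS requested_pct,
       checkpoint_write_time,  -- cumulative, in milliseconds
       checkpoint_sync_time    -- cumulative, in milliseconds
FROM pg_stat_bgwriter;
```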
**IF requested checkpoints > 20%:**
```
Tool: review_settings
Parameters:
category: "checkpoint"
```
Recommend increasing `max_wal_size` (e.g., from 1GB to 4GB or 8GB).
### Step 6: pg_stat_io Analysis (PostgreSQL 16+ Only)
The `analyze_disk_io_patterns` tool with `analysis_type: "all"` automatically includes `pg_stat_io` data when running on PostgreSQL 16+. This provides:
- I/O statistics broken down by backend type (client backend, autovacuum, checkpointer, etc.)
- Read/write/extend/fsync counts per object type (relation, temp relation, WAL)
- Hit rates per context (normal, vacuum, bulkread, bulkwrite)
**Agent decision:** If the PostgreSQL version is < 16, the `pg_stat_io` section will show a message saying it's not available. Skip this section and rely on the other analysis types.
**What to look for in pg_stat_io:**
- High `reads` in `bulkread` context: Sequential scans pulling large data volumes
- High `extends` on temp relations: Confirm temp file spill issues (Step 4)
- Autovacuum backend with high reads: Vacuum is scanning cold data, confirm with `monitor_vacuum_progress`
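On PG16+, the raw view behind this section can also be queried directly; a minimal sketch for surfacing the read-heavy backend/context combinations described above:

```sql
-- PG16+ only: physical I/O by backend type, object, and context.
SELECT backend_type, object, context,
       reads, writes, extends, evictions
FROM pg_stat_io
WHERE reads > 0
ORDER BY reads DESC
LIMIT 15;
```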
## Output Format
### I/O Analysis Report
**Buffer Pool:**
- Cache hit ratio: XX.X%
- Status: Healthy / Warning / Critical
- Recommendation: (if needed)
**Table I/O Hotspots:**
| Table | Size | Reads | Hits | Hit Ratio | Issue |
|-------|------|-------|------|-----------|-------|
| orders | 12GB | 450K | 2.1M | 82.4% | Low hit ratio -- undersized shared_buffers or missing index |
**Temp File Impact:**
- Total temp files since reset: X
- Total temp bytes: X GB
- Affected queries: list with `work_mem` recommendations
**Checkpoint Performance:**
- Requested: X% / Timed: X%
- Avg write time: X sec
- Recommendation: (if needed)
**Action Items (prioritized):**
1. ...
2. ...
## Iterative Verification
After applying any I/O-related fix:
1. **After increasing `shared_buffers`** (requires restart): Wait for cache to warm up (30+ minutes under normal workload), then re-run `analysis_type: "buffer_pool"`.
2. **After increasing `work_mem`**: Re-run the offending query with `analyze_query` (buffers: true) to confirm no more temp files.
3. **After increasing `max_wal_size`**: Monitor checkpoint frequency over 24 hours, then re-run `analysis_type: "checkpoints"`.
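Because all of these counters are cumulative, a before/after comparison is cleaner if statistics are reset at the start of the verification window. A sketch (use with care — this zeroes counters that other monitoring tools may rely on):

```sql
-- Reset per-database and per-table statistics for the current database.
SELECT pg_stat_reset();
-- Reset the bgwriter/checkpoint counters separately.
SELECT pg_stat_reset_shared('bgwriter');
```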
## When to Stop and Ask the User
- **If cache hit ratio is fine (> 99%) but queries are still slow**: The problem is not I/O. Redirect to `pg-slow-query-diagnosis` skill.
- **If temp file spill is caused by a single massive query**: Ask the user if increasing `work_mem` globally is acceptable, or suggest session-level `SET` for that specific query.
- **If the server is on HDD and I/O is saturated**: Ask "Is migrating to SSD an option? Most I/O tuning has limited effect on saturated spinning disks."
## PostgreSQL Version Notes
| Feature | Version | Notes |
|---------|---------|-------|
| `pg_stat_io` view | PG16+ | Detailed per-backend I/O statistics |
| `wal_compression = lz4` | PG15+ | Reduces WAL I/O |
| `BUFFERS` in EXPLAIN | PG9.4+ | Universally available |
| `checkpoint_flush_after` | PG9.6+ | Limit OS page cache dirty page accumulation |
## Production Safety
- All `analyze_disk_io_patterns` calls are read-only. No data is modified.
- `analyze_query` with `analyze: true` executes the query. Use caution with write queries.
- `review_settings` is read-only.
- I/O statistics are cumulative since server start or last reset. Short uptime means less meaningful data.
Information
- Repository: isdaniel/pgtuner_mcp
- Author: isdaniel
- Last Sync: 5/10/2026
- Repo Updated: 4/29/2026
- Created: 3/26/2026