
omni-vu - Claude MCP Skill

Screen capture, AI vision analysis, and GUI automation for macOS. Use when you need to see what's on screen, analyze UI state, detect changes, or automate mouse/keyboard actions.



Documentation

SKILL.md

# omni.vu - Visual Understanding & Automation

## Overview

omni.vu gives you eyes and hands on the user's macOS screen. Use it to:
- **See** what the user sees (screen capture)
- **Understand** UI state with AI vision
- **Detect** changes and wait for events
- **Act** with mouse and keyboard automation

## When to Use This Skill

### Proactively Use omni.vu When:
1. **Debugging UI issues** - "The button isn't working" → capture and analyze
2. **Verifying changes** - After modifying UI code, check if it rendered correctly
3. **Waiting for operations** - Build finishing, deployment completing, tests running
4. **Understanding context** - User describes something on screen you can't see
5. **Automating repetitive tasks** - Clicking through UI flows, filling forms
6. **Documentation** - Capturing screenshots of features

### Do NOT Use When:
- Reading/writing files (use Read/Write tools)
- Running terminal commands (use Bash)
- Making API calls (use appropriate tools)
- User hasn't granted screen recording permission

## Tool Reference

### Capture Tools

#### `vu` - Full Screen Capture
```
vu(monitor=0, save_to_history=True)
```
Captures the entire screen. Returns a base64-encoded image.

**Use when**: You need to see everything on screen.

#### `vu_window` - Window Capture
```
vu_window(window_id=None, title="VS Code", include_frame=False)
```
Captures a specific window by ID or title (partial match).

**Use when**: You only need one application's content.

**Workflow**:
1. Call `vu_list_windows` to see available windows
2. Find the window_id or use title matching
3. Call `vu_window` with that ID/title
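
Step 2 of the workflow above can be sketched as a small helper that searches the `vu_list_windows` result for a partial title match. This is only an illustration: `find_window` is a hypothetical helper, the sample window data is made up, and matching case-insensitively is an assumption rather than documented behavior of `vu_window`.

```python
from typing import Optional


def find_window(windows: list[dict], title_part: str) -> Optional[dict]:
    """Return the first window whose title contains title_part, ignoring case."""
    needle = title_part.lower()
    for window in windows:
        if needle in window.get("title", "").lower():
            return window
    return None


# Sample data shaped like the documented vu_list_windows result (values are invented).
windows = [
    {"window_id": 11, "title": "README.md - Visual Studio Code", "owner": "Code"},
    {"window_id": 12, "title": "Inbox - Google Chrome", "owner": "Google Chrome"},
]

match = find_window(windows, "chrome")
print(match["window_id"] if match else "no matching window")
```

The returned `window_id` can then be passed to `vu_window` in place of a title.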

#### `vu_region` - Region Capture
```
vu_region(x=100, y=100, width=500, height=300)
```
Captures a specific rectangle of the screen.

**Use when**: You need a precise area (error message, specific component).

#### `vu_list_windows` - List Windows
```
vu_list_windows(filter_app="Chrome")
```
Lists all visible windows with metadata.

**Returns**: List of `{window_id, title, owner, x, y, width, height}`
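
The window metadata can also feed the automation tools. The sketch below computes the center of a window as a click target. It assumes that `x` and `y` are the window's top-left corner and that they share a coordinate space with `vu_click`; neither is confirmed here, and `window_center` is a hypothetical helper.

```python
def window_center(window: dict) -> tuple[int, int]:
    """Return the center point of a window from its x, y, width, and height."""
    center_x = window["x"] + window["width"] // 2
    center_y = window["y"] + window["height"] // 2
    return center_x, center_y


# Invented example entry in the documented result shape.
window = {"window_id": 7, "title": "Demo", "owner": "Demo",
          "x": 100, "y": 50, "width": 800, "height": 600}
print(window_center(window))
```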

#### `vu_list_monitors` - List Displays
```
vu_list_monitors()
```
Lists connected displays with resolution and scale factor.

### Vision Tools

#### `vu_describe` - AI Vision Analysis
```
vu_describe(
    prompt="What errors are visible?",
    provider="claude",  # or openai, gemini, ollama
    max_tokens=1024,
    capture_first=True
)
```
Captures the screen and analyzes it with the selected AI provider.

**Best prompts**:
- "Describe what you see on screen"
- "Are there any error messages visible?"
- "What is the state of the build/test output?"
- "Is the login form filled correctly?"
- "What color is the status indicator?"

**Providers**:
| Provider | Speed | Quality | Cost |
|----------|-------|---------|------|
| claude | Medium | Excellent | $$ |
| openai | Fast | Very Good | $$ |
| gemini | Fast | Good | $ |
| ollama | Slow | Varies | Free |

### Utility Tools

#### `vu_diff` - Change Detection
```
vu_diff(threshold=0.02, monitor=0)
```
Detects whether the screen has changed since the last capture.

**Returns**: `{changed: bool, diff_percentage: float}`

**Use for polling patterns**:
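```python
import time

# Wait for the build to finish.
# vu_diff and vu_describe stand for the tool calls documented in this reference.
while True:
    result = vu_diff(threshold=0.05)
    if result["changed"]:
        # Screen changed, check what happened
        analysis = vu_describe(prompt="Did the build succeed or fail?")
        break
    # Wait before checking again (the 2-second interval is arbitrary)
    time.sleep(2)
```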
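
A polling loop like the one above never exits if the screen stays the same. One possible refinement is a timeout, sketched below. `wait_for_change` is a hypothetical helper; it takes the diff call as a parameter so it can be tried without a running server, and it only assumes the documented `changed` key in the result.

```python
import time
from typing import Callable


def wait_for_change(
    diff_fn: Callable[[], dict],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll diff_fn until it reports a change; return False once the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if diff_fn()["changed"]:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for vu_diff: reports a change on the third call.
fake_results = iter([
    {"changed": False, "diff_percentage": 0.0},
    {"changed": False, "diff_percentage": 0.01},
    {"changed": True, "diff_percentage": 0.12},
])
print(wait_for_change(lambda: next(fake_results), timeout_s=5, interval_s=0))
```

When the helper returns `False`, the caller can report that nothing changed instead of waiting forever.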

#### `vu_history` - Capture History
```
vu_history(limit=10, capture_type="full_screen")
```
Gets recent capture metadata.

#### `vu_last` - Last Capture
```
vu_last()
```
Returns the most recent capture with image data.

**Use when**: You need to re-analyze without re-capturing.

#### `vu_status` - System Status
```
vu_status()
```
Returns system info: monitors, providers, safety settings.

### Automation Tools

#### `vu_click` - Mouse Click
```
vu_click(x=500, y=300, button="left", count=1)
```
Clicks at screen coordinates.

**Buttons**: `left`, `right`, `middle`
**Count**: 1 (single), 2 (double), 3 (triple)

**Safety**: Coordinates validated against screen bounds.
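
It can help to run a similar check on the caller's side before sending a click. The sketch below assumes a single display whose origin is (0, 0) and whose size is known, for example from `vu_list_monitors()`. `in_bounds` is a hypothetical helper, not the server's own validation.

```python
def in_bounds(x: int, y: int, width: int, height: int) -> bool:
    """Return True if (x, y) lies inside a display of the given size."""
    return 0 <= x < width and 0 <= y < height


# Example with an assumed 1440x900 display.
print(in_bounds(500, 300, 1440, 900))
print(in_bounds(1440, 300, 1440, 900))
```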

#### `vu_move` - Move Cursor
```
vu_move(x=500, y=300, duration_ms=0)
```
Moves cursor to position. Use `duration_ms` for animated movement.

#### `vu_drag` - Drag Operation
```
vu_drag(start_x=100, start_y=100, end_x=500, end_y=300, duration_ms=500)
```
Drags from start to end position.

**Safety**: Maximum 2000px drag distance.
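
A drag can be checked against this limit before it is attempted. The sketch below assumes the limit applies to the straight-line (Euclidean) distance between the start and end points; the source does not say how the distance is measured, and both function names are hypothetical.

```python
import math

MAX_DRAG_PX = 2000  # limit stated in this reference


def drag_distance(start_x: int, start_y: int, end_x: int, end_y: int) -> float:
    """Straight-line distance between the start and end points."""
    return math.hypot(end_x - start_x, end_y - start_y)


def drag_allowed(start_x: int, start_y: int, end_x: int, end_y: int) -> bool:
    """Return True if the drag stays within the assumed Euclidean limit."""
    return drag_distance(start_x, start_y, end_x, end_y) <= MAX_DRAG_PX


print(round(drag_distance(100, 100, 500, 300), 1))
print(drag_allowed(0, 0, 2000, 1))
```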

#### `vu_scroll` - Scroll
```
vu_scroll(direction="down", amount=3, x=None, y=None)
```
Scrolls at current or specified position.

**Directions**: `up`, `down`, `left`, `right`
**Amount**: Lines to scroll (1-20)

#### `vu_type` - Type Text
```
vu_type(text="Hello World", delay_between_ms=0)
```
Types text character by character.

**Note**: Click the target field first so it has keyboard focus.

#### `vu_hotkey` - Keyboard Shortcut
```
vu_hotkey(keys="cmd+s")
```
Executes keyboard shortcuts.

**Format**: `modifier+key` (e.g., `cmd+c`, `ctrl+shift+s`, `cmd+opt+i`)
**Modifiers**: `cmd`, `ctrl`, `alt`/`opt`, `shift`

**Blocked hotkeys** (for safety):
- `cmd+q` (quit)
- `cmd+shift+q` (logout)
- `cmd+opt+esc` (force quit)
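
A caller can screen a shortcut against this list before sending it. The sketch below normalizes a hotkey string and compares it with the blocked combinations. Treating `alt` and `opt` as the same key follows the modifier list above; ignoring key order and letter case is an assumption, and `parse_hotkey` and `is_blocked` are hypothetical helpers.

```python
ALIASES = {"alt": "opt"}  # the reference lists alt/opt as one modifier

BLOCKED = {
    frozenset({"cmd", "q"}),
    frozenset({"cmd", "shift", "q"}),
    frozenset({"cmd", "opt", "esc"}),
}


def parse_hotkey(keys: str) -> frozenset:
    """Split 'cmd+shift+s' into a normalized, order-independent set of keys."""
    parts = [part.strip().lower() for part in keys.split("+") if part.strip()]
    return frozenset(ALIASES.get(part, part) for part in parts)


def is_blocked(keys: str) -> bool:
    """Return True if the combination matches one of the blocked hotkeys."""
    return parse_hotkey(keys) in BLOCKED


print(is_blocked("cmd+s"))
print(is_blocked("Cmd+Alt+Esc"))
```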

## Common Workflows

### 1. Debug UI Issue
```
User: "The submit button doesn't do anything when I click it"

1. vu_describe(prompt="Describe the form and submit button state")
2. Analyze: Is button disabled? Is there validation error?
3. If needed: vu_window(title="Chrome") for focused capture
4. Check console: vu_describe(prompt="Are there any errors in the developer console?")
```

### 2. Verify Build/Deploy
```
User: "Run the build and let me know when it's done"

1. Run: npm run build (via Bash)
2. Loop: vu_diff(threshold=0.05)
3. When changed: vu_describe(prompt="What is the build status? Did it succeed or fail?")
4. Report result
```

### 3. Fill a Form
```
User: "Fill out the registration form with test data"

1. vu_describe(prompt="What form fields are visible?")
2. vu_click(x=field_x, y=field_y)  # Click first field
3. vu_type(text="testuser@example.com")
4. vu_hotkey(keys="tab")  # Move to next field
5. vu_type(text="Test User")
6. Continue for each field...
7. vu_click(x=submit_x, y=submit_y)  # Submit
8. vu_describe(prompt="Was the form submitted successfully?")
```
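
The form-filling steps above can be written down as data before anything is executed, which makes the sequence easy to review. This is a minimal sketch: `plan_form_fill` is a hypothetical helper that only builds a list of tool names and arguments, and the coordinates are placeholders that would normally come from a prior capture.

```python
def plan_form_fill(
    first_field: tuple[int, int],
    values: list[str],
    submit: tuple[int, int],
) -> list[tuple[str, dict]]:
    """Build the click / type / tab sequence for a form as (tool, arguments) pairs."""
    steps = [("vu_click", {"x": first_field[0], "y": first_field[1]})]
    for index, value in enumerate(values):
        if index > 0:
            # Move focus to the next field before typing
            steps.append(("vu_hotkey", {"keys": "tab"}))
        steps.append(("vu_type", {"text": value}))
    steps.append(("vu_click", {"x": submit[0], "y": submit[1]}))
    steps.append(("vu_describe", {"prompt": "Was the form submitted successfully?"}))
    return steps


plan = plan_form_fill((200, 150), ["testuser@example.com", "Test User"], (200, 400))
for tool, arguments in plan:
    print(tool, arguments)
```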

### 4. Navigate UI
```
User: "Open the settings panel in VS Code"

1. vu_list_windows(filter_app="Code")
2. vu_window(title="Code")  # Capture VS Code
3. vu_hotkey(keys="cmd+,")  # Open settings
4. vu_diff()  # Check whether the screen changed after the hotkey
5. vu_describe(prompt="Are the VS Code settings now visible?")
```

### 5. Monitor for Changes
```
User: "Watch the dashboard and tell me when the status changes"

1. vu()  # Initial capture
2. Loop every 5 seconds:
   - vu_diff(threshold=0.02)
   - If changed: vu_describe(prompt="What changed on the dashboard?")
3. Report changes to user
```

## Best Practices

### 1. Capture Before Acting
Always capture and analyze before clicking/typing:
```
❌ Bad: vu_click(x=500, y=300)  # Hope it's the right spot

✅ Good:
1. vu_describe(prompt="Where is the submit button?")
2. Parse coordinates from description
3. vu_click(x=parsed_x, y=parsed_y)
```
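
The "parse coordinates" step depends on how the vision model words its answer. The sketch below assumes the prompt asked for the position in the form `x=<int>, y=<int>`; that format is an assumption, not something `vu_describe` guarantees, and `parse_point` is a hypothetical helper.

```python
import re
from typing import Optional

# Matches text such as "x=512, y=640" (the format is an assumption about the model's reply).
_POINT = re.compile(r"x\s*=\s*(\d+)\s*,\s*y\s*=\s*(\d+)", re.IGNORECASE)


def parse_point(description: str) -> Optional[tuple[int, int]]:
    """Extract the first 'x=..., y=...' pair from a description, or None if absent."""
    found = _POINT.search(description)
    if found is None:
        return None
    return int(found.group(1)), int(found.group(2))


print(parse_point("The submit button is at x=512, y=640."))
print(parse_point("I cannot see a submit button."))
```

If no coordinates are found, it is safer to capture again or ask the user than to click at a guessed position.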

### 2. Use Appropriate Capture Scope
```
Full screen → General context, multi-app workflows
Window → Single app focus, cleaner output
Region → Specific element, faster/smaller
```

### 3. Specific Vision Prompts
```
❌ Vague: "What do you see?"

✅ Specific:
- "Is there an error message visible? If so, what does it say?"
- "What is the status of the test runner? Passing, failing, or running?"
- "List all form fields visible and their current values"
```

### 4. Rate Limit Automation
Don't spam clicks. Wait between actions:
```
vu_click(x=100, y=100)
# Wait for UI response
vu_diff(threshold=0.01)
vu_click(x=200, y=200)
```

### 5. Verify After Actions
Always verify automation succeeded:
```
vu_type(text="hello@example.com")
vu_describe(prompt="What text is now in the email field?")
```

## Troubleshooting

### "Permission denied" or blank captures
→ Grant Screen Recording permission: System Settings > Privacy & Security > Screen Recording

### Automation not working
→ Grant Accessibility permission: System Settings > Privacy & Security > Accessibility

### Vision analysis failing
→ Check API keys in environment variables
→ Try different provider: `vu_describe(provider="openai")`
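
Switching providers can also be automated. The sketch below tries each provider in turn and returns the first answer. `describe_with_fallback` is a hypothetical helper, the provider order is arbitrary, and `fake_describe` is a stand-in for the real `vu_describe` call so the example runs on its own.

```python
from typing import Callable, Sequence


def describe_with_fallback(
    describe: Callable[..., str],
    prompt: str,
    providers: Sequence[str] = ("claude", "openai", "gemini", "ollama"),
) -> tuple[str, str]:
    """Try each provider in order; return (provider, answer) for the first success."""
    errors = []
    for provider in providers:
        try:
            return provider, describe(prompt=prompt, provider=provider)
        except Exception as exc:  # keep going and remember why this provider failed
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def fake_describe(prompt: str, provider: str) -> str:
    """Stand-in for vu_describe: the first provider fails as if its key were missing."""
    if provider == "claude":
        raise ValueError("missing API key")
    return f"{provider} answered: no errors visible"


print(describe_with_fallback(fake_describe, "Are there any error messages visible?"))
```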

### Coordinates seem off
→ Retina display? Coordinates are in logical points, not pixels
→ Multi-monitor? Check `vu_list_monitors()` for display arrangement
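
If a position was measured in image pixels on a Retina display, it may need converting to logical points before it is used for a click. The sketch below assumes the capture is in physical pixels and that the scale factor comes from `vu_list_monitors()`; both are assumptions, and `pixels_to_points` is a hypothetical helper.

```python
def pixels_to_points(px_x: float, px_y: float, scale_factor: float) -> tuple[int, int]:
    """Convert physical pixel coordinates to logical points for a given scale factor."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return round(px_x / scale_factor), round(px_y / scale_factor)


# On a display with scale factor 2.0, pixel (1000, 600) maps to point (500, 300).
print(pixels_to_points(1000, 600, 2.0))
```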

### Hotkey blocked
→ Some dangerous hotkeys (e.g., `cmd+q`) are blocked for safety
→ Unblock in config if absolutely necessary

## Configuration

Environment variables:
```bash
OMNI_VU_ANTHROPIC_API_KEY=sk-ant-...    # For Claude vision
OMNI_VU_OPENAI_API_KEY=sk-...           # For GPT-4 vision
OMNI_VU_DEFAULT_PROVIDER=claude          # Default AI provider
OMNI_VU_SAFETY_LEVEL=medium              # low/medium/high
```

History stored at: `~/.omni.vu/captures/`
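
The environment variables above can be sanity-checked from Python before starting a session. This is a minimal sketch: `load_config` is a hypothetical helper, and using `claude` and `medium` as fallbacks when a variable is unset is an assumption based on the example values, not confirmed defaults.

```python
import os
from typing import Mapping

VALID_SAFETY_LEVELS = {"low", "medium", "high"}


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the omni.vu settings from an environment mapping and validate them."""
    level = env.get("OMNI_VU_SAFETY_LEVEL", "medium").lower()  # fallback is assumed
    if level not in VALID_SAFETY_LEVELS:
        raise ValueError(f"unknown safety level: {level!r}")
    return {
        "provider": env.get("OMNI_VU_DEFAULT_PROVIDER", "claude"),  # fallback is assumed
        "safety_level": level,
        # Report only whether keys are present; never print the keys themselves
        "has_anthropic_key": bool(env.get("OMNI_VU_ANTHROPIC_API_KEY")),
        "has_openai_key": bool(env.get("OMNI_VU_OPENAI_API_KEY")),
    }


print(load_config({"OMNI_VU_DEFAULT_PROVIDER": "gemini", "OMNI_VU_SAFETY_LEVEL": "HIGH"}))
```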


Information

Repository: eddiebe147/claude-settings
Author: eddiebe147
Last Sync: 2/4/2026
Repo Updated: 2/4/2026
Created: 1/13/2026
