General
omni-vu - Claude MCP Skill
Screen capture, AI vision analysis, and GUI automation for macOS. Use when you need to see what's on screen, analyze UI state, detect changes, or automate mouse/keyboard actions.
SEO Guide: Enhance your AI agent with the omni-vu tool. This Model Context Protocol (MCP) server allows Claude Desktop and other LLMs to screen capture, ai vision analysis, and gui automation for macos. use when you need to see what's on... Download and configure this skill to unlock new capabilities for your AI workflow.
Documentation
SKILL.md# omni.vu - Visual Understanding & Automation
## Overview
omni.vu gives you eyes and hands on the user's macOS screen. Use it to:
- **See** what the user sees (screen capture)
- **Understand** UI state with AI vision
- **Detect** changes and wait for events
- **Act** with mouse and keyboard automation
## When to Use This Skill
### Proactively Use omni.vu When:
1. **Debugging UI issues** - "The button isn't working" ā capture and analyze
2. **Verifying changes** - After modifying UI code, check if it rendered correctly
3. **Waiting for operations** - Build finishing, deployment completing, tests running
4. **Understanding context** - User describes something on screen you can't see
5. **Automating repetitive tasks** - Clicking through UI flows, filling forms
6. **Documentation** - Capturing screenshots of features
### Do NOT Use When:
- Reading/writing files (use Read/Write tools)
- Running terminal commands (use Bash)
- Making API calls (use appropriate tools)
- User hasn't granted screen recording permission
## Tool Reference
### Capture Tools
#### `vu` - Full Screen Capture
```
vu(monitor=0, save_to_history=True)
```
Captures the entire screen. Returns base64 image.
**Use when**: You need to see everything on screen.
#### `vu_window` - Window Capture
```
vu_window(window_id=None, title="VS Code", include_frame=False)
```
Captures a specific window by ID or title (partial match).
**Use when**: You only need one application's content.
**Workflow**:
1. Call `vu_list_windows` to see available windows
2. Find the window_id or use title matching
3. Call `vu_window` with that ID/title
#### `vu_region` - Region Capture
```
vu_region(x=100, y=100, width=500, height=300)
```
Captures a specific rectangle of the screen.
**Use when**: You need a precise area (error message, specific component).
#### `vu_list_windows` - List Windows
```
vu_list_windows(filter_app="Chrome")
```
Lists all visible windows with metadata.
**Returns**: List of `{window_id, title, owner, x, y, width, height}`
#### `vu_list_monitors` - List Displays
```
vu_list_monitors()
```
Lists connected displays with resolution and scale factor.
### Vision Tools
#### `vu_describe` - AI Vision Analysis
```
vu_describe(
prompt="What errors are visible?",
provider="claude", # or openai, gemini, ollama
max_tokens=1024,
capture_first=True
)
```
Captures screen and analyzes with AI.
**Best prompts**:
- "Describe what you see on screen"
- "Are there any error messages visible?"
- "What is the state of the build/test output?"
- "Is the login form filled correctly?"
- "What color is the status indicator?"
**Providers**:
| Provider | Speed | Quality | Cost |
|----------|-------|---------|------|
| claude | Medium | Excellent | $$ |
| openai | Fast | Very Good | $$ |
| gemini | Fast | Good | $ |
| ollama | Slow | Varies | Free |
### Utility Tools
#### `vu_diff` - Change Detection
```
vu_diff(threshold=0.02, monitor=0)
```
Detects if screen changed since last capture.
**Returns**: `{changed: bool, diff_percentage: float}`
**Use for polling patterns**:
```python
# Wait for build to finish
while True:
result = vu_diff(threshold=0.05)
if result["changed"]:
# Screen changed, check what happened
analysis = vu_describe(prompt="Did the build succeed or fail?")
break
# Wait before checking again
```
#### `vu_history` - Capture History
```
vu_history(limit=10, capture_type="full_screen")
```
Gets recent capture metadata.
#### `vu_last` - Last Capture
```
vu_last()
```
Returns the most recent capture with image data.
**Use when**: You need to re-analyze without re-capturing.
#### `vu_status` - System Status
```
vu_status()
```
Returns system info: monitors, providers, safety settings.
### Automation Tools
#### `vu_click` - Mouse Click
```
vu_click(x=500, y=300, button="left", count=1)
```
Clicks at screen coordinates.
**Buttons**: `left`, `right`, `middle`
**Count**: 1 (single), 2 (double), 3 (triple)
**Safety**: Coordinates validated against screen bounds.
#### `vu_move` - Move Cursor
```
vu_move(x=500, y=300, duration_ms=0)
```
Moves cursor to position. Use `duration_ms` for animated movement.
#### `vu_drag` - Drag Operation
```
vu_drag(start_x=100, start_y=100, end_x=500, end_y=300, duration_ms=500)
```
Drags from start to end position.
**Safety**: Maximum 2000px drag distance.
#### `vu_scroll` - Scroll
```
vu_scroll(direction="down", amount=3, x=None, y=None)
```
Scrolls at current or specified position.
**Directions**: `up`, `down`, `left`, `right`
**Amount**: Lines to scroll (1-20)
#### `vu_type` - Type Text
```
vu_type(text="Hello World", delay_between_ms=0)
```
Types text character by character.
**Note**: Click on target field first!
#### `vu_hotkey` - Keyboard Shortcut
```
vu_hotkey(keys="cmd+s")
```
Executes keyboard shortcuts.
**Format**: `modifier+key` (e.g., `cmd+c`, `ctrl+shift+s`, `cmd+opt+i`)
**Modifiers**: `cmd`, `ctrl`, `alt`/`opt`, `shift`
**Blocked hotkeys** (for safety):
- `cmd+q` (quit)
- `cmd+shift+q` (logout)
- `cmd+opt+esc` (force quit)
## Common Workflows
### 1. Debug UI Issue
```
User: "The submit button doesn't do anything when I click it"
1. vu_describe(prompt="Describe the form and submit button state")
2. Analyze: Is button disabled? Is there validation error?
3. If needed: vu_window(title="Chrome") for focused capture
4. Check console: vu_describe(prompt="Are there any errors in the developer console?")
```
### 2. Verify Build/Deploy
```
User: "Run the build and let me know when it's done"
1. Run: npm run build (via Bash)
2. Loop: vu_diff(threshold=0.05)
3. When changed: vu_describe(prompt="What is the build status? Did it succeed or fail?")
4. Report result
```
### 3. Fill a Form
```
User: "Fill out the registration form with test data"
1. vu_describe(prompt="What form fields are visible?")
2. vu_click(x=field_x, y=field_y) # Click first field
3. vu_type(text="testuser@example.com")
4. vu_hotkey(keys="tab") # Move to next field
5. vu_type(text="Test User")
6. Continue for each field...
7. vu_click(x=submit_x, y=submit_y) # Submit
8. vu_describe(prompt="Was the form submitted successfully?")
```
### 4. Navigate UI
```
User: "Open the settings panel in VS Code"
1. vu_list_windows(filter_app="Code")
2. vu_window(title="Code") # Capture VS Code
3. vu_hotkey(keys="cmd+,") # Open settings
4. vu_diff() # Wait for settings to open
5. vu_describe(prompt="Are the VS Code settings now visible?")
```
### 5. Monitor for Changes
```
User: "Watch the dashboard and tell me when the status changes"
1. vu() # Initial capture
2. Loop every 5 seconds:
- vu_diff(threshold=0.02)
- If changed: vu_describe(prompt="What changed on the dashboard?")
3. Report changes to user
```
## Best Practices
### 1. Capture Before Acting
Always capture and analyze before clicking/typing:
```
ā Bad: vu_click(x=500, y=300) # Hope it's the right spot
ā
Good:
1. vu_describe(prompt="Where is the submit button?")
2. Parse coordinates from description
3. vu_click(x=parsed_x, y=parsed_y)
```
### 2. Use Appropriate Capture Scope
```
Full screen ā General context, multi-app workflows
Window ā Single app focus, cleaner output
Region ā Specific element, faster/smaller
```
### 3. Specific Vision Prompts
```
ā Vague: "What do you see?"
ā
Specific:
- "Is there an error message visible? If so, what does it say?"
- "What is the status of the test runner? Passing, failing, or running?"
- "List all form fields visible and their current values"
```
### 4. Rate Limit Automation
Don't spam clicks. Wait between actions:
```
vu_click(x=100, y=100)
# Wait for UI response
vu_diff(threshold=0.01)
vu_click(x=200, y=200)
```
### 5. Verify After Actions
Always verify automation succeeded:
```
vu_type(text="hello@example.com")
vu_describe(prompt="What text is now in the email field?")
```
## Troubleshooting
### "Permission denied" or blank captures
ā Grant Screen Recording permission: System Settings > Privacy > Screen Recording
### Automation not working
ā Grant Accessibility permission: System Settings > Privacy > Accessibility
### Vision analysis failing
ā Check API keys in environment variables
ā Try different provider: `vu_describe(provider="openai")`
### Coordinates seem off
ā Retina display? Coordinates are in logical points, not pixels
ā Multi-monitor? Check `vu_list_monitors()` for display arrangement
### Hotkey blocked
ā Some dangerous hotkeys (cmd+q) are blocked for safety
ā Unblock in config if absolutely necessary
## Configuration
Environment variables:
```bash
OMNI_VU_ANTHROPIC_API_KEY=sk-ant-... # For Claude vision
OMNI_VU_OPENAI_API_KEY=sk-... # For GPT-4 vision
OMNI_VU_DEFAULT_PROVIDER=claude # Default AI provider
OMNI_VU_SAFETY_LEVEL=medium # low/medium/high
```
History stored at: `~/.omni.vu/captures/`Signals
Information
- Repository
- eddiebe147/claude-settings
- Author
- eddiebe147
- Last Sync
- 2/4/2026
- Repo Updated
- 2/4/2026
- Created
- 1/13/2026
Reviews (0)
No reviews yet. Be the first to review this skill!
Related Skills
mem0
Integrate Mem0 Platform into AI applications for persistent memory, personalization, and semantic search. Use this skill when the user mentions "mem0", "memory layer", "remember user preferences", "persistent context", "personalization", or needs to add long-term memory to chatbots, agents, or AI apps. Covers Python and TypeScript SDKs, framework integrations (LangChain, CrewAI, Vercel AI SDK, OpenAI Agents SDK, Pipecat), and the full Platform API. Use even when the user doesn't explicitly say "mem0" but describes needing conversation memory, user context retention, or knowledge retrieval across sessions.
upgrade-nodejs
Upgrading Bun's Self-Reported Node.js Version
cursorrules
CrewAI Development Rules
browser-use
Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots, or extract information from web pages.
Related Guides
Bear Notes Claude Skill: Your AI-Powered Note-Taking Assistant
Learn how to use the bear-notes Claude skill. Complete guide with installation instructions and examples.
Mastering tmux with Claude: A Complete Guide to the tmux Claude Skill
Learn how to use the tmux Claude skill. Complete guide with installation instructions and examples.
OpenAI Whisper API Claude Skill: Complete Guide to AI-Powered Audio Transcription
Learn how to use the openai-whisper-api Claude skill. Complete guide with installation instructions and examples.