ai-vision - Claude MCP Skill
Analyze images and videos with AI vision models. Detect objects with bounding boxes, compare multiple images, extract design tokens and visual hierarchy, and analyze video content using Google Gemini or Vertex AI. Supports CLI and MCP modes.
# AI Vision MCP
AI-powered image and video analysis CLI using Google Gemini and Vertex AI models. Analyze images, compare multiple images, detect objects, and analyze videos with advanced AI capabilities.
## Quick Start
### Installation
```bash
npm install -g ai-vision-mcp
```
Or use directly with npx:
```bash
npx ai-vision-mcp <command> [options]
```
### Setup
Set up your provider credentials:
**Google AI Studio (Recommended)**
```bash
export IMAGE_PROVIDER="google"
export VIDEO_PROVIDER="google"
export GEMINI_API_KEY="your-gemini-api-key"
```
Get your API key at [aistudio.google.com/app/api-keys](https://aistudio.google.com/app/api-keys)
**Vertex AI**
```bash
export IMAGE_PROVIDER="vertex_ai"
export VIDEO_PROVIDER="vertex_ai"
export VERTEX_CLIENT_EMAIL="your-service-account@project.iam.gserviceaccount.com"
export VERTEX_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
export VERTEX_PROJECT_ID="your-gcp-project-id"
export GCS_BUCKET_NAME="your-gcs-bucket"
```
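Because a missing variable only surfaces as an error at request time, it can help to confirm the required variables are actually exported first. This is a generic bash sketch using the variable names documented above; the `check_vars` helper itself is not part of the CLI:

```bash
# Sketch: warn about any unset Vertex AI variables before invoking ai-vision.
# The variable names match the setup above; the check is plain bash.
check_vars() {
  local v missing=0
  for v in IMAGE_PROVIDER VIDEO_PROVIDER VERTEX_CLIENT_EMAIL VERTEX_PRIVATE_KEY VERTEX_PROJECT_ID GCS_BUCKET_NAME; do
    if [ -z "${!v}" ]; then        # bash indirect expansion: value of the variable named in $v
      echo "missing: $v"
      missing=1
    fi
  done
  return "$missing"
}
check_vars || echo "export the variables above before running ai-vision"
```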
## Commands
### analyze-image
Analyze an image with AI vision models. Supports multiple analysis modes for different use cases.
```bash
ai-vision analyze-image <source> --prompt <text> [--mode <mode>] [options]
```
**Modes:**
- `general` (default) - General image analysis
- `palette` - Extract design tokens (colors, spacing, typography)
- `hierarchy` - Analyze visual hierarchy and eye flow
- `components` - Catalog UI components and design system maturity
**Examples:**
```bash
# General analysis
ai-vision analyze-image https://example.com/image.jpg --prompt "describe the scene"
# Design token extraction
ai-vision analyze-image screenshot.png --prompt "extract design tokens" --mode palette
# Visual hierarchy analysis
ai-vision analyze-image ui-mockup.png --prompt "analyze layout" --mode hierarchy
# Component inventory
ai-vision analyze-image design-system.png --prompt "list components" --mode components
# Output as JSON
ai-vision analyze-image image.jpg --prompt "analyze" --json
```
### compare-images
Compare 2-4 images side-by-side to identify differences, similarities, or changes.
```bash
ai-vision compare-images <source1> <source2> [source3] [source4] --prompt <text> [options]
```
**Examples:**
```bash
# Compare two versions
ai-vision compare-images before.jpg after.jpg --prompt "what changed?"
# Compare multiple designs
ai-vision compare-images v1.png v2.png v3.png --prompt "which is best?"
# Visual regression testing
ai-vision compare-images baseline.png current.png --prompt "find visual bugs" --json
```
### detect-objects
Detect and identify objects in an image with bounding boxes and confidence scores.
```bash
ai-vision detect-objects <source> --prompt <text> [--output <path>] [options]
```
**Examples:**
```bash
# Detect objects with bounding boxes
ai-vision detect-objects photo.jpg --prompt "find all cars"
# Save annotated image with bounding boxes drawn
ai-vision detect-objects scene.jpg --prompt "detect people" --output annotated.jpg
# Get JSON output
ai-vision detect-objects image.jpg --prompt "find text" --json
```
**Output Format:**
The response includes:
- `detections`: Array of detected objects with bounding boxes and confidence scores
- `summary`: Human-readable text with CSS selectors for web elements and percentage coordinates
- `metadata`: Detection model, provider, processing time, and coordinate information
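The `--json` output can be post-processed with standard tools such as `jq`. The mock payload below mirrors the documented top-level keys (`detections`, `summary`, `metadata`); the fields inside each detection (`label`, `confidence`) are assumptions about the exact shape, so check a real response before relying on them:

```bash
# Mock payload: top-level keys match the documented output format; the
# per-detection fields (label, confidence) are illustrative assumptions.
cat > result.json <<'EOF'
{
  "detections": [
    {"label": "car", "confidence": 0.91},
    {"label": "bicycle", "confidence": 0.55}
  ],
  "summary": "one car detected with high confidence",
  "metadata": {"provider": "google"}
}
EOF
# Keep only high-confidence labels. With real output, produce result.json via:
#   ai-vision detect-objects photo.jpg --prompt "find all cars" --json > result.json
jq -r '.detections[] | select(.confidence > 0.8) | .label' result.json
```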
### analyze-video
Analyze video content frame-by-frame or as a whole.
```bash
ai-vision analyze-video <source> --prompt <text> [options]
```
**Examples:**
```bash
# Analyze video
ai-vision analyze-video recording.mp4 --prompt "describe what happens"
# Analyze Playwright recording
ai-vision analyze-video playwright-video.webm --prompt "detect interaction bugs"
# Get JSON output
ai-vision analyze-video video.mp4 --prompt "summarize" --json
```
## Global Options
```
--prompt <text> Analysis prompt (required for most commands)
--json Output raw JSON instead of formatted text
--temperature <num> Temperature 0-2 (default: 0.7)
--top-p <num> Top P 0-1
--top-k <num> Top K 1-100
--max-tokens <num> Max output tokens
--help Show help
```
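Out-of-range values are rejected by the CLI, so a local check before invoking can save a round-trip. A small sketch for the 0-2 temperature range (the range comes from the table above; the `valid_temperature` helper is hypothetical):

```bash
# Hypothetical helper: check a --temperature value against the documented 0-2 range.
valid_temperature() {
  awk -v t="$1" 'BEGIN { exit !(t >= 0 && t <= 2) }'   # exit 0 when in range
}
valid_temperature 0.7 && echo "0.7 ok"
valid_temperature 3.5 || echo "3.5 out of range"
```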
## Input Sources
All commands accept multiple input formats:
- **URLs**: `https://example.com/image.jpg`
- **Local files**: `./path/to/image.jpg`
- **Base64 data**: `data:image/jpeg;base64,...`
- **GCS URIs** (Vertex AI): `gs://bucket/path/to/image.jpg`
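For the base64 form, a data URI can be built from a local file with standard tools. The file below is a stand-in, not a real JPEG:

```bash
# Sketch: wrap a local file as a data URI, matching the "Base64 data" input form.
printf 'fake-image-bytes' > sample.jpg   # stand-in content, not a real JPEG
uri="data:image/jpeg;base64,$(base64 < sample.jpg | tr -d '\n')"
echo "$uri"
```

The resulting `$uri` can then be passed wherever a `<source>` argument is expected.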
## Use Cases
### Design System Analysis
Extract design tokens and component inventory from design mockups:
```bash
ai-vision analyze-image design-system.png --mode palette --prompt "extract all colors and spacing values"
ai-vision analyze-image components.png --mode components --prompt "catalog all UI components"
```
### Visual Regression Testing
Compare baseline and current screenshots to detect unintended changes:
```bash
ai-vision compare-images baseline.png current.png --prompt "identify visual differences"
```
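In CI you usually need a pass/fail signal, but the model's answer is free-form text. One hedged approach is to grep the report for a keyword; the `check_regression` helper below is a heuristic sketch, and the phrase it matches is an assumption about how the model may word its answer, not an API guarantee:

```bash
# Heuristic sketch: turn a free-form comparison report into a CI verdict.
check_regression() {
  if printf '%s' "$1" | grep -qi 'no .*difference'; then
    echo "PASS"
  else
    echo "REVIEW"   # anything else gets flagged for a human look
  fi
}
# Typical usage (requires credentials):
#   report=$(ai-vision compare-images baseline.png current.png --prompt "identify visual differences")
#   check_regression "$report"
check_regression "No visible differences were found."
```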
### Content Moderation
Detect objects and analyze image content:
```bash
ai-vision detect-objects user-upload.jpg --prompt "find inappropriate content"
```
### Video Analysis
Analyze recorded interactions and detect bugs:
```bash
ai-vision analyze-video playwright-recording.webm --prompt "detect UI interaction bugs"
```
## Configuration
Configure default values via environment variables:
```bash
# Temperature settings
export TEMPERATURE=0.7
export TEMPERATURE_FOR_IMAGE=0.5
export TEMPERATURE_FOR_ANALYZE_IMAGE=0.3
# Token limits
export MAX_TOKENS=2048
export MAX_TOKENS_FOR_IMAGE=1024
export MAX_TOKENS_FOR_ANALYZE_IMAGE=512
# Sampling parameters
export TOP_P=0.9
export TOP_K=40
```
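These defaults can also live in a `.env` file and be exported in one step; a plain-shell sketch using the documented variable names (`set -a` auto-exports every variable assigned while sourcing):

```bash
# Sketch: keep the documented defaults in a .env file and export them together.
cat > .env <<'EOF'
TEMPERATURE=0.7
MAX_TOKENS=2048
TOP_P=0.9
TOP_K=40
EOF
set -a       # auto-export everything assigned below
. ./.env
set +a
echo "TEMPERATURE=$TEMPERATURE"
```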
## Integration
Use as an MCP server in Claude Desktop, Claude Code, or other MCP clients:
```json
{
"mcpServers": {
"ai-vision-mcp": {
"command": "npx",
"args": ["ai-vision-mcp"],
"env": {
"IMAGE_PROVIDER": "google",
"GEMINI_API_KEY": "your-api-key"
}
}
}
}
```
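If you use the Claude Code CLI, the same registration can typically be done from the terminal with `claude mcp add`; flag syntax has varied across versions, so verify with `claude mcp add --help` before relying on this form:

```bash
# Equivalent registration via the Claude Code CLI (verify flags for your version):
claude mcp add ai-vision-mcp \
  -e IMAGE_PROVIDER=google \
  -e GEMINI_API_KEY=your-api-key \
  -- npx ai-vision-mcp
```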
## Resources
- [GitHub Repository](https://github.com/tan-yong-sheng/ai-vision-mcp)
- [MCP Documentation](https://modelcontextprotocol.io)
- [Google Gemini API](https://ai.google.dev)
- [Vertex AI Documentation](https://cloud.google.com/vertex-ai)
## License
MIT