scaffold-connector - Claude MCP Skill
Build a new OpenMetadata connector from scratch — scaffold JSON Schema, Python boilerplate, and AI context using schema-first architecture with code generation across Python, Java, TypeScript, and auto-rendered UI forms.
SKILL.md

# OpenMetadata Connector Building Skill
## When to Activate
When a user asks to build, create, add, or scaffold a new connector, source, or integration for OpenMetadata.
## Core Insight
**One JSON Schema definition cascades through 6 layers**: Python Pydantic models, Java models, UI forms (RJSF auto-render), API validation, test fixtures, and documentation. Define the schema once — everything else is generated or guided.
## Workflow: 10 Phases
### Phase 0: ENVIRONMENT — Set Up Python Dev Environment
Before any `make` or `python` commands, set up the environment from the repo root:
```bash
python3.11 -m venv env
source env/bin/activate
make install_dev generate
```
Always activate before running commands: `source env/bin/activate`
### Phase 1: SCAFFOLD — Generate Boilerplate
Run the scaffold CLI to collect inputs and generate files:
```bash
source env/bin/activate
metadata scaffold-connector
```
Interactive mode collects: connector name, service type, connection type, auth types, capabilities, docs URL, SDK package, API endpoints, implementation notes, Docker image, container port.
Non-interactive mode:
```bash
metadata scaffold-connector \
--name my_db \
--service-type database \
--connection-type sqlalchemy \
--scheme "mydb+pymydb" \
--auth-types basic \
--capabilities metadata lineage usage profiler \
--docs-url "https://docs.example.com/api" \
--sdk-package "mydb-sdk" \
--docker-image "mydb/mydb:latest" \
--docker-port 5432
```
**Output**: JSON Schema + test connection JSON + Python files + `CONNECTOR_CONTEXT.md` as an AI working document. SQLAlchemy database connectors get concrete code templates; all others get skeleton files with pointers to reference connectors.
**CONNECTOR_CONTEXT.md handling**: The scaffold generates `CONNECTOR_CONTEXT.md` in the connector directory as a working document for any AI tool (Claude Code, Cursor, Codex, Copilot, Windsurf). It is **gitignored** — it stays local and is never committed to the repo. No cleanup needed.
### Phase 2: CLASSIFY — Understand the Source
The scaffold classifies along 3 dimensions. Verify the choices:
**Dimension 1 — Service Type** (determines directory + base class):
| Service Type | Base Class | Reference |
|---|---|---|
| `database` | `CommonDbSourceService` | `mysql/` |
| `dashboard` | `DashboardServiceSource` | `metabase/` |
| `pipeline` | `PipelineServiceSource` | `airflow/` |
| `messaging` | `MessagingServiceSource` | `kafka/` |
| `mlmodel` | `MlModelServiceSource` | `mlflow/` |
| `storage` | `StorageServiceSource` | `s3/` |
| `search` | `SearchServiceSource` | `elasticsearch/` |
| `api` | `ApiServiceSource` | `rest/` |
**Dimension 2 — Connection Type** (database only):
- `sqlalchemy` → `BaseConnection[Config, Engine]` + SQLAlchemy dialect
- `rest_api` → `get_connection()` + custom REST client (ref: `salesforce/`)
- `sdk_client` → `get_connection()` + vendor SDK wrapper
**Dimension 3 — Capabilities** (determines extra files):
`metadata` (always), `lineage`, `usage`, `profiler`, `stored_procedures`, `data_diff`
Read the source-type-specific standard at `${CLAUDE_SKILL_DIR}/standards/source_types/{service_type}.md` for detailed patterns.
### Phase 3: RESEARCH — API/SDK Discovery
Read the `CONNECTOR_CONTEXT.md` generated by the scaffold. Then research the source's API/SDK.
**If you can dispatch sub-agents** (Claude Code): Launch a `connector-researcher` agent:
```
Agent: openmetadata-skills:connector-researcher
Prompt: "Research {source_name} for an OpenMetadata {service_type} connector.
Find: API docs, auth methods, key endpoints, pagination, rate limits, SDK packages."
```
**If you cannot dispatch sub-agents**: Perform the research yourself using WebSearch and WebFetch.
### Phase 4: IMPLEMENT — Fill in the TODO Items
The scaffold generates files with `# TODO` markers. Read the relevant standards before implementing:
- `${CLAUDE_SKILL_DIR}/standards/connection.md` — Connection patterns
- `${CLAUDE_SKILL_DIR}/standards/patterns.md` — Error handling, pagination, auth
- `${CLAUDE_SKILL_DIR}/standards/performance.md` — Pagination, lookup optimization, anti-patterns
- `${CLAUDE_SKILL_DIR}/standards/memory.md` — Memory management, streaming, OOM prevention
- `${CLAUDE_SKILL_DIR}/standards/source_types/{service_type}.md` — Service-specific patterns
**SQLAlchemy database**: Templates are mostly complete. Customize `_get_client()` if needed.
**Non-SQLAlchemy**: Study the reference connector, then implement each skeleton file.
**Critical for JSON Schema**:
- Make auth fields (`username`, `password`, `token`) **required** when the service needs authentication by default. If omitting a field means an opaque 401 at runtime, make it required so the UI validates upfront.
- Include SSL/TLS config (`verifySSL` + `sslConfig` `$ref`) for any connector that communicates over HTTPS — enterprise deployments use internal CAs.
- **SSL must be wired end-to-end**: schema → `connection.py` (resolve with `get_verify_ssl_fn`) → `client.py` (`session.verify = verify_ssl`). Missing wiring triggers SonarQube Security Review failure.
- See `${CLAUDE_SKILL_DIR}/standards/schema.md` for the `$ref` patterns and required fields guidance.
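A minimal sketch of what these rules look like in a connection schema, expressed here as a Python dict mirroring the JSON Schema file the scaffold generates. The field names follow the guidance above; the `$ref` target path and the `MyDbConnection` title are illustrative assumptions, not exact repo paths.

```python
# Illustrative connection-schema fragment (Python dict form of the JSON Schema).
# The $ref path below is an assumption — check schema.md for the real targets.
connection_schema = {
    "title": "MyDbConnection",
    "type": "object",
    "properties": {
        "username": {"type": "string"},
        "password": {"type": "string", "format": "password"},
        "hostPort": {"type": "string"},
        "verifySSL": {"$ref": "security/ssl/verifySSLConfig.json#/definitions/verifySSL"},
        "sslConfig": {"$ref": "security/ssl/verifySSLConfig.json#/definitions/sslConfig"},
    },
    # Auth fields are required so the UI rejects an empty form upfront
    # instead of surfacing an opaque 401 at runtime.
    "required": ["username", "password", "hostPort"],
}
```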
**Critical for Pydantic API models (models.py)**:
- Always set `model_config = ConfigDict(populate_by_name=True)` when using `Field(alias=...)` — without this, constructing instances with Python attribute names raises `ValidationError`.
- See `${CLAUDE_SKILL_DIR}/standards/code_style.md` for the full pattern.
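A minimal sketch of the `models.py` pattern, assuming an API that returns camelCase keys while Python code uses snake_case attribute names. The `Dashboard` model and its fields are hypothetical.

```python
from pydantic import BaseModel, ConfigDict, Field

class Dashboard(BaseModel):
    # Without populate_by_name=True, constructing with attribute names
    # (dashboard_id=...) would raise ValidationError on aliased fields.
    model_config = ConfigDict(populate_by_name=True)

    dashboard_id: str = Field(alias="dashboardId")
    display_name: str = Field(alias="displayName")

# Both construction styles now work:
Dashboard(dashboardId="1", displayName="Sales")    # alias names (raw API payload)
Dashboard(dashboard_id="1", display_name="Sales")  # attribute names (tests, fixtures)
```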
**Critical for non-database connectors (client.py)**:
- Every list endpoint MUST implement pagination if the API supports it. Check the API docs.
- Missing pagination causes silent data loss — only the first page is ingested.
- Build dicts for repeated lookups (e.g., folder path → folder name) instead of iterating lists.
- See `${CLAUDE_SKILL_DIR}/standards/performance.md` for correct patterns and anti-patterns.
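The two client rules above can be sketched as follows. The endpoint, the `limit`/`offset` parameter names, and the folder record shape are assumptions for illustration, not a real API.

```python
from typing import Iterator

class MyDbClient:
    """Hypothetical client wrapper showing offset pagination."""

    PAGE_SIZE = 100

    def __init__(self, session, base_url: str):
        self.session = session
        self.base_url = base_url

    def list_dashboards(self) -> Iterator[dict]:
        """Yield every dashboard, following pagination to avoid silent data loss."""
        offset = 0
        while True:
            resp = self.session.get(
                f"{self.base_url}/dashboards",
                params={"limit": self.PAGE_SIZE, "offset": offset},
            )
            page = resp.json().get("data", [])
            yield from page
            if len(page) < self.PAGE_SIZE:  # short page means last page
                break
            offset += self.PAGE_SIZE

def build_folder_lookup(folders: list[dict]) -> dict[str, str]:
    """Build path -> name once in prepare(), instead of scanning the list per entity."""
    return {f["path"]: f["name"] for f in folders}
```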
**Critical for storage connectors and any connector that reads files**:
- Never `.read()` entire files without a size check — causes OOM on production instances.
- Use framework streaming readers (`metadata/readers/dataframe/`) for data files.
- `del` large objects after processing and call `gc.collect()`.
- See `${CLAUDE_SKILL_DIR}/standards/memory.md` for correct patterns.
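A minimal sketch of the size-check and chunked-read pattern. The 64 MiB threshold is an illustrative default, not a framework constant; real connectors should prefer the streaming readers under `metadata/readers/dataframe/` for data files.

```python
import gc
import os

MAX_IN_MEMORY_BYTES = 64 * 1024 * 1024  # illustrative threshold
CHUNK_SIZE = 1024 * 1024  # 1 MiB

def count_lines(path: str) -> int:
    """Count newlines without ever holding the whole file in memory."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            total += chunk.count(b"\n")
            del chunk  # drop the reference eagerly inside the loop
    gc.collect()
    return total

def read_small_file(path: str) -> bytes:
    """Slurp only after a size check, so one huge object can't OOM the pod."""
    size = os.path.getsize(path)
    if size > MAX_IN_MEMORY_BYTES:
        raise ValueError(f"{path} is {size} bytes; stream it instead of .read()")
    with open(path, "rb") as f:
        return f.read()
```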
**Critical for lineage**:
- Never use wildcard `table_name="*"` in search queries — this links every table in a database to each entity, producing incorrect lineage.
- If the source doesn't provide table-level info, skip lineage and document the limitation.
- See `${CLAUDE_SKILL_DIR}/standards/lineage.md` for correct patterns.
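The skip-don't-wildcard rule can be sketched as a guard. The `table` key, the `search_table` callable, and the payload shape are hypothetical; only the decision logic is the point.

```python
def yield_table_lineage(entity: dict, search_table):
    """Return a table-scoped search, or None when table-level info is missing."""
    table_name = entity.get("table")
    if not table_name or table_name == "*":
        # No table-level info: skip this edge and document the limitation.
        # Never fall back to table_name="*" — that would link every table.
        return None
    return search_table(table_name=table_name)
```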
### Phase 5: REGISTER — Integration Points
Read `${CLAUDE_SKILL_DIR}/standards/registration.md` for detailed instructions. Summary:
| Step | File | Change |
|------|------|--------|
| 1 | `openmetadata-spec/.../entity/services/{serviceType}Service.json` | Add to type enum + connection oneOf |
| 2 | `openmetadata-ui/.../utils/{ServiceType}ServiceUtils.tsx` | Import schema + add switch case |
| 3 | `openmetadata-ui/.../locale/languages/` | Add i18n display name keys |
### Phase 6: GENERATE & FORMAT — Run Code Generation and Formatting
This step is **mandatory** — always run it before committing. Ensure the Python environment is set up:
```bash
# Ensure environment is active and tools are installed
source env/bin/activate
pip install -e ".[dev]" 2>/dev/null || make install_dev
# Generate models from schemas
make generate # Python Pydantic models
mvn clean install -pl openmetadata-spec # Java models
cd openmetadata-ui/src/main/resources/ui && yarn parse-schema # UI schemas
# Format ALL code (mandatory before commit)
cd /path/to/repo/root
make py_format # black + isort + pycln
mvn spotless:apply # Format Java
```
**If `make py_format` fails**: The most common cause is missing dev dependencies. Run `make install_dev` first, then retry.
**Never skip formatting** — unformatted code will fail CI.
### Phase 7: VALIDATE — Run Static Analysis and Checklist
Run the static analyzer as a self-check before submitting:
```bash
python skills/connector-review/scripts/analyze_connector.py {service_type} {name}
```
Fix any issues it reports. Then verify the full checklist:
```
[ ] JSON Schema: validates, $ref resolves, supports* flags correct
[ ] JSON Schema: auth fields required when service mandates authentication
[ ] JSON Schema: SSL/TLS config included for HTTPS connectors
[ ] Code gen: make generate + mvn install + yarn parse-schema succeed
[ ] Connection: creates client, test_connection passes all steps
[ ] Source: create() validates config type, ServiceSpec is discoverable
[ ] Pydantic models: populate_by_name=True on all aliased models
[ ] Client: all list endpoints paginate (check API docs for pagination support)
[ ] Client: dict lookups in prepare(), not list iteration per entity
[ ] Lineage: no wildcard table_name="*" — skip if no table-level info available
[ ] Tests: unit + connection integration + metadata integration pass (no empty stubs)
[ ] Formatting: make py_format + mvn spotless:apply pass with no changes
[ ] Cleanup: CONNECTOR_CONTEXT.md is gitignored (verify it's not staged)
[ ] Cleanup: no leftover TODO scaffolding comments
```
### Phase 8: TEST LOCALLY — Deploy and Test in the UI
Build everything and bring up a full local OpenMetadata stack with Docker:
**Full build** (first time or after Java/UI changes):
```bash
./docker/run_local_docker.sh -m ui -d mysql -s false -i true -r true
```
**Fast rebuild** (ingestion-only changes, ~2-3 minutes):
```bash
./docker/run_local_docker.sh -m ui -d mysql -s true -i true -r false
```
Once services are up (~3-5 minutes):
1. Open **http://localhost:8585**
2. Go to **Settings → Services → {Your Service Type}**
3. Click **Add New Service** and select your connector
4. Configure connection details and click **Test Connection**
5. If test passes, run metadata ingestion to verify entities are created
Other service URLs:
- Airflow: http://localhost:8080 (admin / admin)
- Elasticsearch: http://localhost:9200
**Tear down**: `cd docker/development && docker compose down -v`
**Troubleshooting**:
- Connector not in dropdown → check service schema registration, rebuild without `-s true`
- Test connection fails → check `test_fn` keys match test connection JSON step names
- Container logs: `docker compose -f docker/development/docker-compose.yml logs ingestion`
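The `test_fn` mismatch above is the most common test-connection failure, so it is worth a self-check: the keys of the `test_fn` dict in `connection.py` must match the step names in the test connection JSON exactly. The step names and stub client below are illustrative.

```python
class _StubClient:
    """Stands in for the real client wrapper."""
    def get_user(self): ...
    def list_dashboards(self): ...

client = _StubClient()

# Step names as declared in the test connection definition JSON (illustrative).
test_connection_json_steps = ["CheckAccess", "GetDashboards"]

# Keys here must match those step names character for character.
test_fn = {
    "CheckAccess": lambda: client.get_user(),
    "GetDashboards": lambda: client.list_dashboards(),
}

missing = [s for s in test_connection_json_steps if s not in test_fn]
assert not missing, f"test_fn is missing steps: {missing}"
```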
### Phase 9: CREATE PR — Submit with Quality Summary
When creating a PR for the connector, include the review summary in the PR description so reviewers see the quality assessment upfront:
```bash
# Run the static analyzer
analysis=$(python skills/connector-review/scripts/analyze_connector.py {service_type} {name} --json)
# Create PR with quality summary in description
gh pr create --title "feat(ingestion): Add {Name} {service_type} connector" --body "$(cat <<'EOF'
## Summary
- New {service_type} connector for {Name}
- Capabilities: {list capabilities}
## Test plan
- [ ] Unit tests pass (`pytest ingestion/tests/unit/topology/{service_type}/test_{name}.py`)
- [ ] Integration tests pass
- [ ] Local Docker test: connector appears in UI, test connection passes
## Connector Quality Review
**Verdict**: {VERDICT} | **Score**: {SCORE}/10
| Category | Score |
|----------|-------|
| Schema & Registration | X/10 |
| Connection & Auth | X/10 |
| Source, Topology & Performance | X/10 |
| Test Quality | X/10 |
| Code Quality & Style | X/10 |
**Blockers**: 0 | **Warnings**: {count} | **Suggestions**: {count}
<details>
<summary>Static analysis output</summary>
{paste analyze_connector.py output here}
</details>
🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
```
The quality summary gives maintainers confidence about the connector's state without needing to review every file manually.
## Standards Reference
All standards are in `${CLAUDE_SKILL_DIR}/standards/`:
| Standard | Content |
|----------|---------|
| `main.md` | Architecture overview, connector anatomy, service types |
| `patterns.md` | Error handling, logging, pagination, auth, filters |
| `testing.md` | Unit test patterns, integration tests, pytest style |
| `code_style.md` | Python style, JSON Schema conventions, naming |
| `schema.md` | Connection schema patterns, $ref usage, test connection JSON |
| `connection.md` | BaseConnection vs function patterns, SSL, client wrapper |
| `service_spec.md` | DefaultDatabaseSpec vs BaseSpec |
| `registration.md` | Service enum, UI utils, i18n |
| `performance.md` | Pagination, batching, rate limiting |
| `memory.md` | Memory management, streaming, OOM prevention |
| `lineage.md` | Lineage extraction methods, dialect mapping, query logs |
| `sql.md` | SQLAlchemy patterns, URL building, auth, multi-DB |
| `source_types/*.md` | Service-type-specific patterns |
## References
Architecture guides in `${CLAUDE_SKILL_DIR}/references/`:
| Reference | Content |
|-----------|---------|
| `architecture-decision-tree.md` | Service type, connection type, base class selection |
| `connection-type-guide.md` | SQLAlchemy vs REST API vs SDK client |
| `capability-mapping.md` | Capabilities by service type, schema flags, generated files |
## Information
- Repository: open-metadata/OpenMetadata
- Author: open-metadata
- Last Sync: 3/12/2026
- Repo Updated: 3/12/2026
- Created: 3/9/2026