scaffold-connector - Claude MCP Skill

Build a new OpenMetadata connector from scratch: scaffold the JSON Schema, Python boilerplate, and AI context using a schema-first architecture, with code generation across Python, Java, and TypeScript plus auto-rendered UI forms.

SKILL.md
# OpenMetadata Connector Building Skill

## When to Activate

When a user asks to build, create, add, or scaffold a new connector, source, or integration for OpenMetadata.

## Core Insight

**One JSON Schema definition cascades through 6 layers**: Python Pydantic models, Java models, UI forms (RJSF auto-render), API validation, test fixtures, and documentation. Define the schema once — everything else is generated or guided.

## Workflow: Phases 0-9

### Phase 0: ENVIRONMENT — Set Up Python Dev Environment

Before any `make` or `python` commands, set up the environment from the repo root:

```bash
python3.11 -m venv env
source env/bin/activate
make install_dev generate
```

Always activate before running commands: `source env/bin/activate`

### Phase 1: SCAFFOLD — Generate Boilerplate

Run the scaffold CLI to collect inputs and generate files:

```bash
source env/bin/activate
metadata scaffold-connector
```

Interactive mode collects: connector name, service type, connection type, auth types, capabilities, docs URL, SDK package, API endpoints, implementation notes, Docker image, container port.

Non-interactive mode:
```bash
metadata scaffold-connector \
  --name my_db \
  --service-type database \
  --connection-type sqlalchemy \
  --scheme "mydb+pymydb" \
  --auth-types basic \
  --capabilities metadata lineage usage profiler \
  --docs-url "https://docs.example.com/api" \
  --sdk-package "mydb-sdk" \
  --docker-image "mydb/mydb:latest" \
  --docker-port 5432
```

**Output**: JSON Schema + test connection JSON + Python files + `CONNECTOR_CONTEXT.md` as an AI working document. SQLAlchemy database connectors get concrete code templates; all others get skeleton files with pointers to reference connectors.

**CONNECTOR_CONTEXT.md handling**: The scaffold generates `CONNECTOR_CONTEXT.md` in the connector directory as a working document for any AI tool (Claude Code, Cursor, Codex, Copilot, Windsurf). It is **gitignored** — it stays local and is never committed to the repo. No cleanup needed.

### Phase 2: CLASSIFY — Understand the Source

The scaffold classifies along 3 dimensions. Verify the choices:

**Dimension 1 — Service Type** (determines directory + base class):

| Service Type | Base Class | Reference |
|---|---|---|
| `database` | `CommonDbSourceService` | `mysql/` |
| `dashboard` | `DashboardServiceSource` | `metabase/` |
| `pipeline` | `PipelineServiceSource` | `airflow/` |
| `messaging` | `MessagingServiceSource` | `kafka/` |
| `mlmodel` | `MlModelServiceSource` | `mlflow/` |
| `storage` | `StorageServiceSource` | `s3/` |
| `search` | `SearchServiceSource` | `elasticsearch/` |
| `api` | `ApiServiceSource` | `rest/` |

**Dimension 2 — Connection Type** (database only):
- `sqlalchemy` → `BaseConnection[Config, Engine]` + SQLAlchemy dialect
- `rest_api` → `get_connection()` + custom REST client (ref: `salesforce/`)
- `sdk_client` → `get_connection()` + vendor SDK wrapper

**Dimension 3 — Capabilities** (determines extra files):
`metadata` (always), `lineage`, `usage`, `profiler`, `stored_procedures`, `data_diff`

Read the source-type-specific standard at `${CLAUDE_SKILL_DIR}/standards/source_types/{service_type}.md` for detailed patterns.

### Phase 3: RESEARCH — API/SDK Discovery

Read the `CONNECTOR_CONTEXT.md` generated by the scaffold. Then research the source's API/SDK.

**If you can dispatch sub-agents** (Claude Code): Launch a `connector-researcher` agent:
```
Agent: openmetadata-skills:connector-researcher
Prompt: "Research {source_name} for an OpenMetadata {service_type} connector.
Find: API docs, auth methods, key endpoints, pagination, rate limits, SDK packages."
```

**If you cannot dispatch sub-agents**: Perform the research yourself using WebSearch and WebFetch.

### Phase 4: IMPLEMENT — Fill in the TODO Items

The scaffold generates files with `# TODO` markers. Read the relevant standards before implementing:
- `${CLAUDE_SKILL_DIR}/standards/connection.md` — Connection patterns
- `${CLAUDE_SKILL_DIR}/standards/patterns.md` — Error handling, pagination, auth
- `${CLAUDE_SKILL_DIR}/standards/performance.md` — Pagination, lookup optimization, anti-patterns
- `${CLAUDE_SKILL_DIR}/standards/memory.md` — Memory management, streaming, OOM prevention
- `${CLAUDE_SKILL_DIR}/standards/source_types/{service_type}.md` — Service-specific patterns

**SQLAlchemy database**: Templates are mostly complete. Customize `_get_client()` if needed.
**Non-SQLAlchemy**: Study the reference connector, then implement each skeleton file.
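The Phase 7 checklist requires `create()` to validate the config type. A minimal sketch of that fail-fast check (the classes below are hypothetical stand-ins, not the real framework types):

```python
class InvalidSourceException(Exception):
    """Raised when a source is created with the wrong connection type."""


class MyDbConnection:
    """Stand-in for the Pydantic connection model generated from the schema."""


class MyDbSource:
    def __init__(self, connection: "MyDbConnection"):
        self.connection = connection

    @classmethod
    def create(cls, connection: object) -> "MyDbSource":
        # Fail fast with a clear message instead of an opaque
        # AttributeError deep inside the ingestion run.
        if not isinstance(connection, MyDbConnection):
            raise InvalidSourceException(
                f"Expected MyDbConnection, got {type(connection).__name__}"
            )
        return cls(connection)
```

The real `create()` also unpacks the workflow config; the point here is only the early `isinstance` guard.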

**Critical for JSON Schema**:
- If the service requires authentication by default, make the auth fields (`username`, `password`, `token`) **required**. If omitting a field means an opaque 401 at runtime, make it required so the UI validates upfront.
- Include SSL/TLS config (`verifySSL` + `sslConfig` `$ref`) for any connector that communicates over HTTPS — enterprise deployments use internal CAs.
- **SSL must be wired end-to-end**: schema → `connection.py` (resolve with `get_verify_ssl_fn`) → `client.py` (`session.verify = verify_ssl`). Missing wiring triggers SonarQube Security Review failure.
- See `${CLAUDE_SKILL_DIR}/standards/schema.md` for the `$ref` patterns and required fields guidance.

**Critical for Pydantic API models (models.py)**:
- Always set `model_config = ConfigDict(populate_by_name=True)` when using `Field(alias=...)` — without this, constructing instances with Python attribute names raises `ValidationError`.
- See `${CLAUDE_SKILL_DIR}/standards/code_style.md` for the full pattern.
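A minimal demonstration of why the flag matters (model and field names are illustrative):

```python
from pydantic import BaseModel, ConfigDict, Field


class Dashboard(BaseModel):
    # Without populate_by_name=True, Dashboard(dashboard_id=...) raises
    # ValidationError because only the alias "id" is accepted on input.
    model_config = ConfigDict(populate_by_name=True)

    dashboard_id: str = Field(alias="id")
    display_name: str = Field(alias="displayName")


# Both construction styles now work:
from_api = Dashboard.model_validate({"id": "42", "displayName": "Sales"})
from_code = Dashboard(dashboard_id="42", display_name="Sales")
```

The first style is what you get when parsing API payloads; the second is what unit tests and fixtures typically use, which is where the missing flag usually surfaces.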

**Critical for non-database connectors (client.py)**:
- Every list endpoint MUST implement pagination if the API supports it. Check the API docs.
- Missing pagination causes silent data loss — only the first page is ingested.
- Build dicts for repeated lookups (e.g., folder path → folder name) instead of iterating lists.
- See `${CLAUDE_SKILL_DIR}/standards/performance.md` for correct patterns and anti-patterns.
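Both points can be sketched together. The `fetch_page` callable is a hypothetical client method and the offset/limit shape is generic; check the real API's pagination scheme before copying it.

```python
from typing import Any, Callable, Iterator


def paginate(
    fetch_page: Callable[[int, int], list[dict[str, Any]]],
    page_size: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield every item across all pages; stopping only on a short or
    empty page is what prevents first-page-only silent data loss."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


# Build lookup dicts once (e.g. in prepare()) rather than scanning a
# list for every entity: O(1) per lookup instead of O(n).
folders = [{"id": "f1", "name": "Finance"}, {"id": "f2", "name": "Ops"}]
folder_name_by_id = {f["id"]: f["name"] for f in folders}
```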

**Critical for storage connectors and any connector that reads files**:
- Never `.read()` entire files without a size check — causes OOM on production instances.
- Use framework streaming readers (`metadata/readers/dataframe/`) for data files.
- `del` large objects after processing and call `gc.collect()`.
- See `${CLAUDE_SKILL_DIR}/standards/memory.md` for correct patterns.

**Critical for lineage**:
- Never use wildcard `table_name="*"` in search queries — this links every table in a database to each entity, producing incorrect lineage.
- If the source doesn't provide table-level info, skip lineage and document the limitation.
- See `${CLAUDE_SKILL_DIR}/standards/lineage.md` for correct patterns.

### Phase 5: REGISTER — Integration Points

Read `${CLAUDE_SKILL_DIR}/standards/registration.md` for detailed instructions. Summary:

| Step | File | Change |
|------|------|--------|
| 1 | `openmetadata-spec/.../entity/services/{serviceType}Service.json` | Add to type enum + connection oneOf |
| 2 | `openmetadata-ui/.../utils/{ServiceType}ServiceUtils.tsx` | Import schema + add switch case |
| 3 | `openmetadata-ui/.../locale/languages/` | Add i18n display name keys |

### Phase 6: GENERATE & FORMAT — Run Code Generation and Formatting

This step is **mandatory** — always run it before committing. Ensure the Python environment is set up:

```bash
# Ensure environment is active and tools are installed
source env/bin/activate
pip install -e ".[dev]" 2>/dev/null || make install_dev

# Generate models from schemas
make generate                                # Python Pydantic models
mvn clean install -pl openmetadata-spec      # Java models
cd openmetadata-ui/src/main/resources/ui && yarn parse-schema  # UI schemas

# Format ALL code (mandatory before commit)
cd /path/to/repo/root
make py_format                               # black + isort + pycln
mvn spotless:apply                           # Format Java
```

**If `make py_format` fails**: The most common cause is missing dev dependencies. Run `make install_dev` first, then retry.

**Never skip formatting** — unformatted code will fail CI.

### Phase 7: VALIDATE — Run Static Analysis and Checklist

Run the static analyzer as a self-check before submitting:
```bash
python skills/connector-review/scripts/analyze_connector.py {service_type} {name}
```

Fix any issues it reports. Then verify the full checklist:

```
[ ] JSON Schema: validates, $ref resolves, supports* flags correct
[ ] JSON Schema: auth fields required when service mandates authentication
[ ] JSON Schema: SSL/TLS config included for HTTPS connectors
[ ] Code gen: make generate + mvn install + yarn parse-schema succeed
[ ] Connection: creates client, test_connection passes all steps
[ ] Source: create() validates config type, ServiceSpec is discoverable
[ ] Pydantic models: populate_by_name=True on all aliased models
[ ] Client: all list endpoints paginate (check API docs for pagination support)
[ ] Client: dict lookups in prepare(), not list iteration per entity
[ ] Lineage: no wildcard table_name="*" — skip if no table-level info available
[ ] Tests: unit + connection integration + metadata integration pass (no empty stubs)
[ ] Formatting: make py_format + mvn spotless:apply pass with no changes
[ ] Cleanup: CONNECTOR_CONTEXT.md is gitignored (verify it's not staged)
[ ] Cleanup: no leftover TODO scaffolding comments
```

### Phase 8: TEST LOCALLY — Deploy and Test in the UI

Build everything and bring up a full local OpenMetadata stack with Docker:

**Full build** (first time or after Java/UI changes):
```bash
./docker/run_local_docker.sh -m ui -d mysql -s false -i true -r true
```

**Fast rebuild** (ingestion-only changes, ~2-3 minutes):
```bash
./docker/run_local_docker.sh -m ui -d mysql -s true -i true -r false
```

Once services are up (~3-5 minutes):
1. Open **http://localhost:8585**
2. Go to **Settings → Services → {Your Service Type}**
3. Click **Add New Service** and select your connector
4. Configure connection details and click **Test Connection**
5. If test passes, run metadata ingestion to verify entities are created

Other service URLs:
- Airflow: http://localhost:8080 (admin / admin)
- Elasticsearch: http://localhost:9200

**Tear down**: `cd docker/development && docker compose down -v`

**Troubleshooting**:
- Connector not in dropdown → check service schema registration, rebuild without `-s true`
- Test connection fails → check `test_fn` keys match test connection JSON step names
- Container logs: `docker compose -f docker/development/docker-compose.yml logs ingestion`

### Phase 9: CREATE PR — Submit with Quality Summary

When creating a PR for the connector, include the review summary in the PR description so reviewers see the quality assessment upfront:

```bash
# Run the static analyzer
analysis=$(python skills/connector-review/scripts/analyze_connector.py {service_type} {name} --json)

# Create PR with quality summary in description
gh pr create --title "feat(ingestion): Add {Name} {service_type} connector" --body "$(cat <<'EOF'
## Summary
- New {service_type} connector for {Name}
- Capabilities: {list capabilities}

## Test plan
- [ ] Unit tests pass (`pytest ingestion/tests/unit/topology/{service_type}/test_{name}.py`)
- [ ] Integration tests pass
- [ ] Local Docker test: connector appears in UI, test connection passes

## Connector Quality Review

**Verdict**: {VERDICT} | **Score**: {SCORE}/10

| Category | Score |
|----------|-------|
| Schema & Registration | X/10 |
| Connection & Auth | X/10 |
| Source, Topology & Performance | X/10 |
| Test Quality | X/10 |
| Code Quality & Style | X/10 |

**Blockers**: 0 | **Warnings**: {count} | **Suggestions**: {count}

<details>
<summary>Static analysis output</summary>

{paste analyze_connector.py output here}

</details>

🤖 Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
```

The quality summary gives maintainers confidence about the connector's state without needing to review every file manually.

## Standards Reference

All standards are in `${CLAUDE_SKILL_DIR}/standards/`:

| Standard | Content |
|----------|---------|
| `main.md` | Architecture overview, connector anatomy, service types |
| `patterns.md` | Error handling, logging, pagination, auth, filters |
| `testing.md` | Unit test patterns, integration tests, pytest style |
| `code_style.md` | Python style, JSON Schema conventions, naming |
| `schema.md` | Connection schema patterns, $ref usage, test connection JSON |
| `connection.md` | BaseConnection vs function patterns, SSL, client wrapper |
| `service_spec.md` | DefaultDatabaseSpec vs BaseSpec |
| `registration.md` | Service enum, UI utils, i18n |
| `performance.md` | Pagination, batching, rate limiting |
| `memory.md` | Memory management, streaming, OOM prevention |
| `lineage.md` | Lineage extraction methods, dialect mapping, query logs |
| `sql.md` | SQLAlchemy patterns, URL building, auth, multi-DB |
| `source_types/*.md` | Service-type-specific patterns |

## References

Architecture guides in `${CLAUDE_SKILL_DIR}/references/`:

| Reference | Content |
|-----------|---------|
| `architecture-decision-tree.md` | Service type, connection type, base class selection |
| `connection-type-guide.md` | SQLAlchemy vs REST API vs SDK client |
| `capability-mapping.md` | Capabilities by service type, schema flags, generated files |


Repository: open-metadata/OpenMetadata (author: open-metadata)
