Media
Geek-skills-mineru-pdf-parser - Claude MCP Skill
PDF解析工具,将复杂PDF文档转换为LLM友好的Markdown/JSON格式。适用于:(1) 将PDF转换为Markdown或JSON格式,(2) 提取PDF中的文本、表格、公式、图像,(3) 处理学术论文、技术文档、商业报告的PDF解析,(4) 为RAG应用准备高质量文档数据,(5) 批量处理PDF文件。触发关键词包括:"PDF解析"、"PDF转Markdown"、"PDF转JSON"、"提取PDF表格"、"提取PDF公式"、"MinerU"、"文档解析"、"PDF extraction"、"convert PDF"、"parse PDF"等。
SEO Guide: Enhance your AI agent with the Geek-skills-mineru-pdf-parser tool. This Model Context Protocol (MCP) server allows Claude Desktop and other LLMs to pdf解析工具,将复杂pdf文档转换为llm友好的markdown/json格式。适用于:(1) 将pdf转换为markdown或json格式,(2) 提取pdf中的文本、表格、公式、图像,(3) 处... Download and configure this skill to unlock new capabilities for your AI workflow.
Documentation
SKILL.md# MinerU PDF Parser
将复杂PDF文档转换为机器可读的Markdown/JSON格式,适用于LLM和RAG应用。
## 安装
```bash
# 推荐使用uv安装
pip install uv
uv pip install -U "mineru[all]"
# 下载模型(首次使用)
mineru-models-download
```
## 快速使用
### 命令行
```bash
# 解析单个PDF
mineru -p input.pdf -o output_dir
# 批量解析
mineru -p pdf_folder/ -o output_dir
# 指定解析模式
mineru -p input.pdf -o output_dir --backend vlm # VLM模式(高精度)
mineru -p input.pdf -o output_dir --backend pipeline # Pipeline模式(快速)
mineru -p input.pdf -o output_dir --backend hybrid # 混合模式(平衡)
```
### Python API
```python
from mineru import MinerU
mineru = MinerU()
result = mineru.parse("document.pdf")
# 获取输出
markdown = result.to_markdown()
json_data = result.to_json()
```
详细API见 [references/api_reference.md](references/api_reference.md)
## 解析模式选择
| 模式 | 特点 | 适用场景 |
|------|------|----------|
| `pipeline` | 快速、资源少 | 简单文档、纯文本PDF |
| `vlm` | 高精度、复杂布局 | 学术论文、公式表格文档 |
| `hybrid` | 平衡速度精度 | 通用场景 |
## 输出文件
- `{filename}.md` - Markdown正文
- `{filename}_content_list.json` - 结构化JSON
- `images/` - 提取的图像
- `{filename}_middle.json` - 中间结果(调试)
格式详情见 [references/output_formats.md](references/output_formats.md)
## 最佳实践
### 学术论文
```bash
mineru -p paper.pdf -o output --backend vlm
```
### 批量处理
```python
from mineru import MinerU
import os
mineru = MinerU(backend="hybrid")
for pdf in os.listdir("pdfs/"):
if pdf.endswith(".pdf"):
result = mineru.parse(f"pdfs/{pdf}")
result.save(f"output/{pdf[:-4]}/")
```
### RAG数据准备
```python
sections = result.get_sections()
for section in sections:
vector_db.add(section.title, section.content)
```
### 启用GPU加速
修改 `~/.mineru.json` 中 `device-mode` 为 `cuda`。
更多见 [references/best_practices.md](references/best_practices.md)
## 脚本
使用 `scripts/mineru_parse.py` 进行解析,支持错误处理和日志记录。Signals
Information
- Repository
- staruhub/ClaudeSkills
- Author
- staruhub
- Last Sync
- 5/10/2026
- Repo Updated
- 5/9/2026
- Created
- 3/26/2026
Reviews (0)
No reviews yet. Be the first to review this skill!
Related Skills
upgrade-nodejs
Upgrading Bun's Self-Reported Node.js Version
cursorrules
CrewAI Development Rules
cn-check
Install and run the Continue CLI (`cn`) to execute AI agent checks on local code changes. Use when asked to "run checks", "lint with AI", "review my changes with cn", or set up Continue CI locally.
CLAUDE
CLAUDE.md
Related Guides
Bear Notes Claude Skill: Your AI-Powered Note-Taking Assistant
Learn how to use the bear-notes Claude skill. Complete guide with installation instructions and examples.
Mastering tmux with Claude: A Complete Guide to the tmux Claude Skill
Learn how to use the tmux Claude skill. Complete guide with installation instructions and examples.
OpenAI Whisper API Claude Skill: Complete Guide to AI-Powered Audio Transcription
Learn how to use the openai-whisper-api Claude skill. Complete guide with installation instructions and examples.