Compare commits
17 Commits
v4.8.14-fi
...
v4.8.15-al
| SHA1 |
|---|
| 15dc7b220e |
| c64d629a6a |
| 1a294c1fd3 |
| a4f6128a89 |
| 021ec0595d |
| 6ceee7cb5e |
| 90d7d2a164 |
| 0c308fcf8b |
| 1aebe5f185 |
| b188544386 |
| 7faa427e84 |
| cb56d1e53e |
| f28b7e41a8 |
| c506442993 |
| 1cef206c13 |
| d0e8c9c62e |
| a4a8b7909c |
BIN
.github/imgs/image.png
Size: 21 KiB → 15 KiB
README.md
@@ -27,9 +27,6 @@ FastGPT is a knowledge-base Q&A system built on large language models (LLMs), providing
<a href="/#-%E7%9B%B8%E5%85%B3%E9%A1%B9%E7%9B%AE">
<img height="21" src="https://img.shields.io/badge/相关项目-7d09f1?style=flat-square" alt="project">
</a>
<a href="https://github.com/labring/FastGPT/blob/main/LICENSE">
<img height="21" src="https://img.shields.io/badge/License-Apache--2.0-ffffff?style=flat-square&labelColor=d4eaf7&color=7d09f1" alt="license">
</a>
</p>

https://github.com/labring/FastGPT/assets/15308462/7d3a38df-eb0e-4388-9250-2409bd33f6d4
@@ -118,7 +115,7 @@ https://github.com/labring/FastGPT/assets/15308462/7d3a38df-eb0e-4388-9250-2409b

## 🏘️ Community Groups

Scan the QR code to join the Feishu topic group (newly opened; the WeChat groups are being phased out):
Scan the QR code to join the Feishu topic group:


@@ -126,6 +123,11 @@ https://github.com/labring/FastGPT/assets/15308462/7d3a38df-eb0e-4388-9250-2409b
<img src="https://img.shields.io/badge/-返回顶部-7d09f1.svg" alt="#" align="right">
</a>

## 🏘️ Join Us

We are looking for like-minded people to help accelerate FastGPT's development. For open positions, see [FastGPT 2025 Recruitment](https://fael3z0zfze.feishu.cn/wiki/P7FOwEmPziVcaYkvVaacnVX1nvg).


## 💪 Related Projects

- [Laf: integrate third-party applications in 3 minutes](https://github.com/labring/laf)
@@ -171,7 +173,7 @@ https://github.com/labring/FastGPT/assets/15308462/7d3a38df-eb0e-4388-9250-2409b
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://next.ossinsight.io/widgets/official/compose-org-active-contributors/thumbnail.png?activity=active&period=past_28_days&owner_id=102226726&repo_ids=605673387&image_size=2x3&color_scheme=dark">
<img alt="Active participants of labring - past 28 days" src="https://next.ossinsight.io/widgets/official/compose-org-active-contributors/thumbnail.png?activity=active&period=past_28_days&owner_id=102226726&repo_ids=605673387&image_size=2x3&color_scheme=light">
</picture>
</picture>****
</td>
<td rowspan="2">
<picture>
Before Width: | Height: | Size: 381 KiB After Width: | Height: | Size: 323 KiB |
|
Before Width: | Height: | Size: 44 KiB After Width: | Height: | Size: 44 KiB |
|
Before Width: | Height: | Size: 78 KiB After Width: | Height: | Size: 70 KiB |
BIN
docSite/assets/imgs/api-dataset-1.png
Normal file
|
After Width: | Height: | Size: 56 KiB |
|
Before Width: | Height: | Size: 101 KiB After Width: | Height: | Size: 96 KiB |
|
Before Width: | Height: | Size: 222 KiB After Width: | Height: | Size: 206 KiB |
|
Before Width: | Height: | Size: 146 KiB After Width: | Height: | Size: 136 KiB |
|
Before Width: | Height: | Size: 91 KiB After Width: | Height: | Size: 88 KiB |
|
Before Width: | Height: | Size: 54 KiB After Width: | Height: | Size: 43 KiB |
|
Before Width: | Height: | Size: 122 KiB After Width: | Height: | Size: 101 KiB |
|
Before Width: | Height: | Size: 62 KiB After Width: | Height: | Size: 53 KiB |
|
Before Width: | Height: | Size: 38 KiB After Width: | Height: | Size: 25 KiB |
|
Before Width: | Height: | Size: 26 KiB After Width: | Height: | Size: 19 KiB |
|
Before Width: | Height: | Size: 54 KiB After Width: | Height: | Size: 36 KiB |
|
Before Width: | Height: | Size: 71 KiB After Width: | Height: | Size: 67 KiB |
|
Before Width: | Height: | Size: 164 KiB After Width: | Height: | Size: 144 KiB |
|
Before Width: | Height: | Size: 152 KiB After Width: | Height: | Size: 138 KiB |
|
Before Width: | Height: | Size: 338 KiB After Width: | Height: | Size: 309 KiB |
|
Before Width: | Height: | Size: 16 KiB After Width: | Height: | Size: 11 KiB |
|
Before Width: | Height: | Size: 37 KiB After Width: | Height: | Size: 30 KiB |
|
Before Width: | Height: | Size: 52 KiB After Width: | Height: | Size: 42 KiB |
|
Before Width: | Height: | Size: 148 KiB After Width: | Height: | Size: 136 KiB |
|
Before Width: | Height: | Size: 174 KiB After Width: | Height: | Size: 158 KiB |
|
Before Width: | Height: | Size: 51 KiB After Width: | Height: | Size: 48 KiB |
|
Before Width: | Height: | Size: 84 KiB After Width: | Height: | Size: 79 KiB |
|
Before Width: | Height: | Size: 207 KiB After Width: | Height: | Size: 190 KiB |
|
Before Width: | Height: | Size: 50 KiB After Width: | Height: | Size: 39 KiB |
BIN
docSite/assets/imgs/htmlRendering1.png
New file, size: 50 KiB
BIN
docSite/assets/imgs/htmlRendering2.png
New file, size: 91 KiB
BIN
docSite/assets/imgs/htmlRendering3.png
New file, size: 58 KiB
BIN
docSite/assets/imgs/image-10.png
New file, size: 56 KiB
BIN
docSite/assets/imgs/image-11.png
New file, size: 670 KiB
BIN
docSite/assets/imgs/image-12.png
New file, size: 122 KiB
BIN
docSite/assets/imgs/image-13.png
New file, size: 118 KiB
BIN
docSite/assets/imgs/image-14.png
New file, size: 511 KiB
BIN
docSite/assets/imgs/image-15.png
New file, size: 220 KiB
BIN
docSite/assets/imgs/image-16.png
New file, size: 72 KiB
BIN
docSite/assets/imgs/image-17.png
New file, size: 139 KiB
BIN
docSite/assets/imgs/image-18.png
New file, size: 106 KiB
BIN
docSite/assets/imgs/image-19.png
New file, size: 108 KiB
BIN
docSite/assets/imgs/image-20.png
New file, size: 156 KiB
BIN
docSite/assets/imgs/image-21.png
New file, size: 166 KiB
BIN
docSite/assets/imgs/image-22.png
New file, size: 85 KiB
BIN
docSite/assets/imgs/image-23.png
New file, size: 113 KiB
BIN
docSite/assets/imgs/image-24.png
New file, size: 72 KiB
Size: 372 KiB → 263 KiB
Size: 471 KiB → 329 KiB
Size: 416 KiB → 288 KiB
Size: 117 KiB → 80 KiB
Size: 83 KiB → 55 KiB
Size: 2.1 KiB → 1.2 KiB
Size: 26 KiB → 17 KiB
Size: 124 KiB → 107 KiB
Size: 179 KiB → 151 KiB
Size: 210 KiB → 206 KiB
Size: 183 KiB → 161 KiB
Size: 156 KiB → 147 KiB
Size: 189 KiB → 170 KiB
Size: 146 KiB → 138 KiB
Size: 63 KiB → 54 KiB
Size: 128 KiB → 110 KiB
Size: 48 KiB → 41 KiB
Size: 154 KiB → 132 KiB
Size: 37 KiB → 31 KiB
Size: 178 KiB → 170 KiB
Size: 283 KiB → 233 KiB
Size: 261 KiB → 242 KiB
Size: 84 KiB → 65 KiB
Size: 238 KiB → 206 KiB
Size: 176 KiB → 152 KiB
Size: 257 KiB → 235 KiB
Size: 216 KiB → 197 KiB
Size: 126 KiB → 115 KiB
Size: 88 KiB → 84 KiB
Size: 284 KiB → 282 KiB
Size: 442 KiB → 412 KiB
Size: 465 KiB → 449 KiB
Size: 316 KiB → 288 KiB
Size: 434 KiB → 403 KiB
Size: 66 KiB → 65 KiB
Size: 341 KiB → 312 KiB
Size: 175 KiB → 153 KiB
Size: 463 KiB → 436 KiB
Size: 591 KiB → 559 KiB
Size: 238 KiB → 93 KiB
Size: 69 KiB → 54 KiB
Size: 106 KiB → 86 KiB
Size: 60 KiB → 53 KiB
Size: 162 KiB → 154 KiB
Size: 180 KiB → 171 KiB
Size: 20 KiB → 15 KiB
Size: 179 KiB → 168 KiB
Size: 190 KiB → 178 KiB
Size: 175 KiB → 169 KiB
Size: 186 KiB → 179 KiB
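@@ -43,7 +43,7 @@ weight: 708
"usedInExtractFields": true, // Whether the model is used for content extraction (make sure at least one model sets this to true)
"usedInToolCall": true, // Whether the model is used for tool calls (make sure at least one model sets this to true)
"usedInQueryExtension": true, // Whether the model is used for query optimization (make sure at least one model sets this to true)
"toolChoice": true, // Whether tool choice is supported (used by classification, content extraction, and tool calls. Currently only GPT supports it)
"toolChoice": true, // Whether tool choice is supported (used by classification, content extraction, and tool calls.)
"functionCall": false, // Whether function calling is supported (used by classification, content extraction, and tool calls. toolChoice takes priority; if it is false, functionCall is used; if that is also false, prompt mode is used)
"customCQPrompt": "", // Custom text-classification prompt (for models that support neither tool calls nor function calls)
"customExtractPrompt": "", // Custom content-extraction prompt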
@@ -95,9 +95,7 @@ weight: 708
"customExtractPrompt": "",
"defaultSystemChatPrompt": "",
"defaultConfig": {
"temperature": 1,
"max_tokens": null,
"stream": false
"temperature": 1
}
},
{
@@ -122,9 +120,7 @@ weight: 708
"customExtractPrompt": "",
"defaultSystemChatPrompt": "",
"defaultConfig": {
"temperature": 1,
"max_tokens": null,
"stream": false
"temperature": 1
}
}
],
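The `functionCall` comment above describes a fallback order for how classification, content extraction, and tool calls talk to a model: `toolChoice` first, then `functionCall`, then plain prompting. A minimal TypeScript sketch of that priority, assuming a simplified model entry that carries only the two flags; the `LLMModelConfig` type and `selectCallMode` function are hypothetical and are not FastGPT's actual code:

```typescript
// Hypothetical, simplified view of one LLM model entry from the config file.
interface LLMModelConfig {
  model: string;
  toolChoice: boolean;   // model supports tool choice
  functionCall: boolean; // model supports function calling
}

type CallMode = "toolChoice" | "functionCall" | "prompt";

// Prefer toolChoice, fall back to functionCall, and finally to prompt mode,
// following the priority stated in the config comments.
function selectCallMode(model: LLMModelConfig): CallMode {
  if (model.toolChoice) return "toolChoice";
  if (model.functionCall) return "functionCall";
  return "prompt";
}

// A model that only supports function calling falls back to "functionCall".
const mode = selectCallMode({ model: "example-model", toolChoice: false, functionCall: true });
console.log(mode); // functionCall
```

Keeping the decision in one pure function makes the fallback order easy to test independently of any model provider.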

@@ -0,0 +1,66 @@
---
title: 'Integrating Marker for PDF Document Parsing'
description: 'Use Marker to parse PDF documents, with image extraction and layout recognition'
icon: 'api'
draft: false
toc: true
weight: 909
---

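## Background

PDF is a relatively complex file format. FastGPT's built-in PDF parser relies on the pdfjs library, which parses a PDF's logical structure and cannot reliably interpret complex PDF files. As a result, parsing quality is poor for PDFs that contain images, tables, formulas, or other non-plain-text content.

There are several ways to parse PDFs. One of them is [Marker](https://github.com/VikParuchuri/marker), which uses the Surya models to parse documents visually and can reliably extract images, tables, formulas, and other complex content. To make it easy to connect Marker to FastGPT, we built a demo extension for custom parsing.

In FastGPT 4.8.15, you can replace FastGPT's built-in parser with your own document-parsing service by adding environment variables. This feature is still at the demo stage; its configuration format and interaction rules will change in later versions.
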
## Tutorial

### 1. Install Marker

Follow the [Marker installation guide](https://github.com/labring/FastGPT/tree/main/python/pdf-marker) to install the Marker model. The wrapped API is already adapted to FastGPT's custom parsing service.

Here is the quick Docker installation method:

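```bash
# Pull the prebuilt Marker image
docker pull crpi-h3snc261q1dosroc.cn-hangzhou.personal.cr.aliyuncs.com/marker11/marker_images:latest
# Run it detached with access to all GPUs (requires the NVIDIA Container Toolkit on the host),
# publishing container port 7231 on host port 7231
docker run --gpus all -itd -p 7231:7231 --name model_pdf_v1 crpi-h3snc261q1dosroc.cn-hangzhou.personal.cr.aliyuncs.com/marker11/marker_images:latest
```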

### 2. Add FastGPT environment variables

```
CUSTOM_READ_FILE_URL=http://xxxx.com/v1/parse/file
CUSTOM_READ_FILE_EXTENSION=pdf
```

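* CUSTOM_READ_FILE_URL - The address of the custom parsing service. Change the host to the address where your parsing service can be reached; do not change the path.
* CUSTOM_READ_FILE_EXTENSION - The file extensions handled by the custom parsing service. Separate multiple file types with commas.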
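A minimal sketch of how the two variables could be interpreted: split `CUSTOM_READ_FILE_EXTENSION` on commas, match an uploaded file's extension against the result, and check that `CUSTOM_READ_FILE_URL` still ends in the required `/v1/parse/file` path. The helper names `parseExtensions`, `useCustomParser`, and `isValidParseUrl`, and the choices to trim whitespace, ignore case, and drop a leading dot, are assumptions for illustration only; this is not FastGPT's actual implementation:

```typescript
// Hypothetical helpers illustrating CUSTOM_READ_FILE_URL / CUSTOM_READ_FILE_EXTENSION.

const REQUIRED_PATH = "/v1/parse/file"; // only the host may change, not the path

// Parse a comma-separated list such as "pdf" or "pdf, docx".
// Assumption: entries are trimmed, lowercased, and a leading "." is dropped.
function parseExtensions(raw: string | undefined): Set<string> {
  const result = new Set<string>();
  for (const part of (raw ?? "").split(",")) {
    const ext = part.trim().toLowerCase().replace(/^\./, "");
    if (ext) result.add(ext);
  }
  return result;
}

// Decide whether a file would be routed to the custom parsing service.
function useCustomParser(filename: string, extensions: Set<string>): boolean {
  const dot = filename.lastIndexOf(".");
  if (dot < 0) return false; // no extension at all
  return extensions.has(filename.slice(dot + 1).toLowerCase());
}

// Check that the configured URL is absolute and keeps the required path.
function isValidParseUrl(raw: string): boolean {
  try {
    return new URL(raw).pathname === REQUIRED_PATH;
  } catch {
    return false; // not a valid absolute URL
  }
}

const exts = parseExtensions("pdf, DOCX");
console.log(useCustomParser("paper.PDF", exts)); // true
console.log(useCustomParser("notes.txt", exts)); // false
console.log(isValidParseUrl("http://xxxx.com/v1/parse/file")); // true
```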

### 3. Test it

Upload a PDF file to a knowledge base and confirm the upload. You should see the following entries in the logs (LOG_LEVEL must be set to `info` or `debug`):

```
[Info] 2024-12-05 15:04:42 Parsing files from an external service
[Info] 2024-12-05 15:07:08 Custom file parsing is complete, time: 1316ms
```

The PDFs parsed by Marker will now include image links:



## Results

Using Tsinghua's [ChatDev Communicative Agents for Software Develop.pdf](https://arxiv.org/abs/2307.07924) as an example, here is what Marker's parsing produces:

| | | |
| --- | --- | --- |
|  |  |  |
|  |  |  |

The top row shows the chunked results and the bottom row shows the original PDF. Images, formulas, and tables are all extracted, and the results are pretty solid.

Note, however, that [Marker](https://github.com/VikParuchuri/marker) is licensed under the `GPL-3.0` license; make sure you comply with it when using Marker.
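@@ -145,7 +145,7 @@ curl --location --request POST 'https://<oneapi_url>/v1/chat/completions' \
"usedInExtractFields": true, // Whether the model is used for content extraction (make sure at least one model sets this to true)
"usedInToolCall": true, // Whether the model is used for tool calls (make sure at least one model sets this to true)
"usedInQueryExtension": true, // Whether the model is used for query optimization (make sure at least one model sets this to true)
"toolChoice": true, // Whether tool choice is supported (used by classification, content extraction, and tool calls. Currently only GPT supports it)
"toolChoice": true, // Whether tool choice is supported (used by classification, content extraction, and tool calls.)
"functionCall": false, // Whether function calling is supported (used by classification, content extraction, and tool calls. toolChoice takes priority; if it is false, functionCall is used; if that is also false, prompt mode is used)
"customCQPrompt": "", // Custom text-classification prompt (for models that support neither tool calls nor function calls)
"customExtractPrompt": "", // Custom content-extraction prompt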
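The `usedIn*` comments require that, for each purpose, at least one model in the list has the flag set to `true`. A minimal sketch of a startup check for that rule, assuming a simplified model entry with only the three flags; the `LLMUsageFlags` type and `findMissingUsages` function are hypothetical and not part of FastGPT:

```typescript
// Hypothetical, simplified LLM model entry holding only the usage flags.
interface LLMUsageFlags {
  model: string;
  usedInExtractFields: boolean;  // content extraction
  usedInToolCall: boolean;       // tool calls
  usedInQueryExtension: boolean; // query optimization
}

type UsageFlag = "usedInExtractFields" | "usedInToolCall" | "usedInQueryExtension";

const USAGE_FLAGS: UsageFlag[] = ["usedInExtractFields", "usedInToolCall", "usedInQueryExtension"];

// Return every usage flag that no model enables; an empty array means the rule holds.
function findMissingUsages(models: LLMUsageFlags[]): UsageFlag[] {
  return USAGE_FLAGS.filter((flag) => !models.some((m) => m[flag]));
}

const models: LLMUsageFlags[] = [
  { model: "model-a", usedInExtractFields: true, usedInToolCall: false, usedInQueryExtension: true },
  { model: "model-b", usedInExtractFields: false, usedInToolCall: false, usedInQueryExtension: false },
];
console.log(findMissingUsages(models)); // [ 'usedInToolCall' ]
```

Reporting every missing flag at once, rather than failing on the first, lets an operator fix the whole config in a single pass.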