Merge branch 'labring:main' into local
BIN
docSite/assets/imgs/Ollama-aiproxy1.png
Normal file
|
After Width: | Height: | Size: 68 KiB |
BIN
docSite/assets/imgs/Ollama-aiproxy2.png
Normal file
|
After Width: | Height: | Size: 9.0 KiB |
BIN
docSite/assets/imgs/Ollama-aiproxy3.png
Normal file
|
After Width: | Height: | Size: 179 KiB |
BIN
docSite/assets/imgs/Ollama-direct1.png
Normal file
|
After Width: | Height: | Size: 72 KiB |
BIN
docSite/assets/imgs/Ollama-models1.png
Normal file
|
After Width: | Height: | Size: 20 KiB |
BIN
docSite/assets/imgs/Ollama-models2.png
Normal file
|
After Width: | Height: | Size: 138 KiB |
BIN
docSite/assets/imgs/Ollama-models3.png
Normal file
|
After Width: | Height: | Size: 122 KiB |
BIN
docSite/assets/imgs/Ollama-models4.png
Normal file
|
After Width: | Height: | Size: 124 KiB |
BIN
docSite/assets/imgs/Ollama-oneapi1.png
Normal file
|
After Width: | Height: | Size: 94 KiB |
BIN
docSite/assets/imgs/Ollama-oneapi2.png
Normal file
|
After Width: | Height: | Size: 57 KiB |
BIN
docSite/assets/imgs/Ollama-oneapi3 .png
Normal file
|
After Width: | Height: | Size: 76 KiB |
BIN
docSite/assets/imgs/Ollama-pull.png
Normal file
|
After Width: | Height: | Size: 26 KiB |
184
docSite/content/zh-cn/docs/development/custom-models/ollama.md
Normal file
@@ -0,0 +1,184 @@
|
||||
---
|
||||
title: '使用 Ollama 接入本地模型 '
|
||||
description: ' 采用 Ollama 部署自己的模型'
|
||||
icon: 'api'
|
||||
draft: false
|
||||
toc: true
|
||||
weight: 950
|
||||
---
|
||||
|
||||
[Ollama](https://ollama.com/) 是一个开源的AI大模型部署工具,专注于简化大语言模型的部署和使用,支持一键下载和运行各种大模型。
|
||||
|
||||
## 安装 Ollama
|
||||
|
||||
Ollama 本身支持多种安装方式,但是推荐使用 Docker 拉取镜像部署。如果是个人设备上安装了 Ollama 后续需要解决如何让 Docker 中 FastGPT 容器访问宿主机 Ollama的问题,较为麻烦。
|
||||
|
||||
### Docker 安装(推荐)
|
||||
|
||||
你可以使用 Ollama 官方的 Docker 镜像来一键安装和启动 Ollama 服务(确保你的机器上已经安装了 Docker),命令如下:
|
||||
|
||||
```bash
|
||||
docker pull ollama/ollama
|
||||
docker run --rm -d --name ollama -p 11434:11434 ollama/ollama
|
||||
```
|
||||
|
||||
如果你的 FastGPT 是在 Docker 中进行部署的,建议在拉取 Ollama 镜像时保证和 FastGPT 镜像处于同一网络,否则可能出现 FastGPT 无法访问的问题,命令如下:
|
||||
|
||||
```bash
|
||||
docker run --rm -d --name ollama --network (你的 Fastgpt 容器所在网络) -p 11434:11434 ollama/ollama
|
||||
```
|
||||
|
||||
### 主机安装
|
||||
|
||||
如果你不想使用 Docker ,也可以采用主机安装,以下是主机安装的一些方式。
|
||||
|
||||
#### MacOS
|
||||
|
||||
如果你使用的是 macOS,且系统中已经安装了 Homebrew 包管理器,可通过以下命令来安装 Ollama:
|
||||
|
||||
```bash
|
||||
brew install ollama
|
||||
ollama serve #安装完成后,使用该命令启动服务
|
||||
```
|
||||
|
||||
#### Linux
|
||||
|
||||
在 Linux 系统上,你可以借助包管理器来安装 Ollama。以 Ubuntu 为例,在终端执行以下命令:
|
||||
|
||||
```bash
|
||||
curl https://ollama.com/install.sh | sh #此命令会从官方网站下载并执行安装脚本。
|
||||
ollama serve #安装完成后,同样启动服务
|
||||
```
|
||||
|
||||
#### Windows
|
||||
|
||||
在 Windows 系统中,你可以从 Ollama 官方网站 下载 Windows 版本的安装程序。下载完成后,运行安装程序,按照安装向导的提示完成安装。安装完成后,在命令提示符或 PowerShell 中启动服务:
|
||||
|
||||
```bash
|
||||
ollama serve #安装完成并启动服务后,你可以在浏览器中访问 http://localhost:11434 来验证 Ollama 是否安装成功。
|
||||
```
|
||||
|
||||
#### 补充说明
|
||||
|
||||
如果你是采用的主机应用 Ollama 而不是镜像,需要确保你的 Ollama 可以监听0.0.0.0。
|
||||
|
||||
##### 1. Linxu 系统
|
||||
|
||||
如果 Ollama 作为 systemd 服务运行,打开终端,编辑 Ollama 的 systemd 服务文件,使用命令sudo systemctl edit ollama.service,在[Service]部分添加Environment="OLLAMA_HOST=0.0.0.0"。保存并退出编辑器,然后执行sudo systemctl daemon - reload和sudo systemctl restart ollama使配置生效。
|
||||
|
||||
##### 2. MacOS 系统
|
||||
|
||||
打开终端,使用launchctl setenv ollama_host "0.0.0.0"命令设置环境变量,然后重启 Ollama 应用程序以使更改生效。
|
||||
|
||||
##### 3. Windows 系统
|
||||
|
||||
通过 “开始” 菜单或搜索栏打开 “编辑系统环境变量”,在 “系统属性” 窗口中点击 “环境变量”,在 “系统变量” 部分点击 “新建”,创建一个名为OLLAMA_HOST的变量,变量值设置为0.0.0.0,点击 “确定” 保存更改,最后从 “开始” 菜单重启 Ollama 应用程序。
|
||||
|
||||
### Ollama 拉取模型镜像
|
||||
|
||||
在安装后 Ollama 后,本地是没有模型镜像的,需要自己去拉取 Ollama 中的模型镜像。命令如下:
|
||||
|
||||
```bash
|
||||
# Docker 部署需要先进容器,命令为: docker exec -it < Ollama 容器名 > /bin/sh
|
||||
ollama pull <模型名>
|
||||
```
|
||||
|
||||

|
||||
|
||||
|
||||
### 测试通信
|
||||
|
||||
在安装完成后,需要进行检测测试,首先进入 FastGPT 所在的容器,尝试访问自己的 Ollama ,命令如下:
|
||||
|
||||
```bash
|
||||
docker exec -it < FastGPT 所在的容器名 > /bin/sh
|
||||
curl http://XXX.XXX.XXX.XXX:11434 #容器部署地址为“http://<容器名>:<端口>”,主机安装地址为"http://<主机IP>:<端口>",主机IP不可为localhost
|
||||
```
|
||||
|
||||
看到访问显示自己的 Ollama 服务以及启动,说明可以正常通信。
|
||||
|
||||
## 将 Ollama 接入 FastGPT
|
||||
|
||||
### 1. 查看 Ollama 所拥有的模型
|
||||
|
||||
首先采用下述命令查看 Ollama 中所拥有的模型,
|
||||
|
||||
```bash
|
||||
# Docker 部署 Ollama,需要此命令 docker exec -it < Ollama 容器名 > /bin/sh
|
||||
ollama ls
|
||||
```
|
||||
|
||||

|
||||
|
||||
### 2. AI Proxy 接入
|
||||
|
||||
如果你采用的是 FastGPT 中的默认配置文件部署[这里](/docs/development/docker.md),即默认采用 AI Proxy 进行启动。
|
||||
|
||||

|
||||
|
||||
以及在确保你的 FastGPT 可以直接访问 Ollama 容器的情况下,无法访问,参考上文[点此跳转](#安装-ollama)的安装过程,检测是不是主机不能监测0.0.0.0,或者容器不在同一个网络。
|
||||
|
||||

|
||||
|
||||
在 FastGPT 中点击账号->模型提供商->模型配置->新增模型,添加自己的模型即可,添加模型时需要保证模型ID和 OneAPI 中的模型名称一致。详细参考[这里](/docs/development/modelConfig/intro.md)
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
运行 FastGPT ,在页面中选择账号->模型提供商->模型渠道->新增渠道。之后,在渠道选择中选择 Ollama ,然后加入自己拉取的模型,填入代理地址,如果是容器中安装 Ollama ,代理地址为http://地址:端口,补充:容器部署地址为“http://<容器名>:<端口>”,主机安装地址为"http://<主机IP>:<端口>",主机IP不可为localhost
|
||||
|
||||

|
||||
|
||||
在工作台中创建一个应用,选择自己之前添加的模型,此处模型名称为自己当时设置的别名。注:同一个模型无法多次添加,系统会采取最新添加时设置的别名。
|
||||
|
||||

|
||||
|
||||
### 3. OneAPI 接入
|
||||
|
||||
如果你想使用 OneAPI ,首先需要拉取 OneAPI 镜像,然后将其在 FastGPT 容器的网络中运行。具体命令如下:
|
||||
|
||||
```bash
|
||||
# 拉取 oneAPI 镜像
|
||||
docker pull intel/oneapi-hpckit
|
||||
|
||||
# 运行容器并指定自定义网络和容器名
|
||||
docker run -it --network < FastGPT 网络 > --name 容器名 intel/oneapi-hpckit /bin/bash
|
||||
```
|
||||
|
||||
进入 OneAPI 页面,添加新的渠道,类型选择 Ollama ,在模型中填入自己 Ollama 中的模型,需要保证添加的模型名称和 Ollama 中一致,再在下方填入自己的 Ollama 代理地址,默认http://地址:端口,不需要填写/v1。添加成功后在 OneAPI 进行渠道测试,测试成功则说明添加成功。此处演示采用的是 Docker 部署 Ollama 的效果,主机 Ollama需要修改代理地址为http://<主机IP>:<端口>
|
||||
|
||||

|
||||
|
||||
渠道添加成功后,点击令牌,点击添加令牌,填写名称,修改配置。
|
||||
|
||||

|
||||
|
||||
修改部署 FastGPT 的 docker-compose.yml 文件,在其中将 AI Proxy 的使用注释,在 OPENAI_BASE_URL 中加入自己的 OneAPI 开放地址,默认是http://地址:端口/v1,v1必须填写。KEY 中填写自己在 OneAPI 的令牌。
|
||||
|
||||

|
||||
|
||||
[直接跳转5](#5-模型添加和使用)添加模型,并使用。
|
||||
|
||||
### 4. 直接接入
|
||||
|
||||
如果你既不想使用 AI Proxy,也不想使用 OneAPI,也可以选择直接接入,修改部署 FastGPT 的 docker-compose.yml 文件,在其中将 AI Proxy 的使用注释,采用和 OneAPI 的类似配置。注释掉 AIProxy 相关代码,在OPENAI_BASE_URL中加入自己的 Ollama 开放地址,默认是http://地址:端口/v1,强调:v1必须填写。在KEY中随便填入,因为 Ollama 默认没有鉴权,如果开启鉴权,请自行填写。其他操作和在 OneAPI 中加入 Ollama 一致,只需在 FastGPT 中加入自己的模型即可使用。此处演示采用的是 Docker 部署 Ollama 的效果,主机 Ollama需要修改代理地址为http://<主机IP>:<端口>
|
||||
|
||||

|
||||
|
||||
完成后[点击这里](#5-模型添加和使用)进行模型添加并使用。
|
||||
|
||||
### 5. 模型添加和使用
|
||||
|
||||
在 FastGPT 中点击账号->模型提供商->模型配置->新增模型,添加自己的模型即可,添加模型时需要保证模型ID和 OneAPI 中的模型名称一致。
|
||||
|
||||

|
||||
|
||||

|
||||
|
||||
在工作台中创建一个应用,选择自己之前添加的模型,此处模型名称为自己当时设置的别名。注:同一个模型无法多次添加,系统会采取最新添加时设置的别名。
|
||||
|
||||

|
||||
|
||||
### 6. 补充
|
||||
上述接入 Ollama 的代理地址中,主机安装 Ollama 的地址为“http://<主机IP>:<端口>”,容器部署 Ollama 地址为“http://<容器名>:<端口>”
|
||||
@@ -1,5 +1,5 @@
|
||||
protobuf
|
||||
transformers==4.30.2
|
||||
transformers==4.48.0
|
||||
cpm_kernels
|
||||
torch>=2.0
|
||||
gradio
|
||||
|
||||
@@ -6,6 +6,6 @@ sentence_transformers==2.2.2
|
||||
sse_starlette==1.6.5
|
||||
starlette==0.27.0
|
||||
tiktoken==0.4.0
|
||||
torch==2.0.1
|
||||
transformers==4.31.0
|
||||
torch==2.4.0
|
||||
transformers==4.48.0
|
||||
uvicorn==0.23.2
|
||||
|
||||
85
plugins/model/pdf-mineru/README.md
Normal file
@@ -0,0 +1,85 @@
|
||||
# Readme
|
||||
|
||||
# 项目介绍
|
||||
---
|
||||
本项目参照官方插件**pdf-marker,**基于MinertU实现了一个高效的 **PDF 转 Markdown 接口服务**,通过高性能的接口设计,快速将 PDF 文档转换为 Markdown 格式文本。
|
||||
|
||||
- **简洁性:**项目无需修改代码,仅需调整文件路径即可使用,简单易用
|
||||
- **易用性:**通过提供简洁的 API,开发者只需发送 HTTP 请求即可完成 PDF 转换
|
||||
- **灵活性:**支持本地部署,便于快速上手和灵活集成
|
||||
|
||||
# 配置推荐
|
||||
|
||||
配置及速率请参照[MinerU项目](https://github.com/opendatalab/MinerU/blob/master/README_zh-CN.md)官方介绍。
|
||||
|
||||
# 本地开发
|
||||
|
||||
## 基本流程
|
||||
|
||||
1、安装基本环境,主要参照官方文档[使用CPU及GPU](https://github.com/opendatalab/MinerU/blob/master/README_zh-CN.md#%E4%BD%BF%E7%94%A8GPU)运行MinerU的方式进行。具体如下,首先使用anaconda安装基础运行环境
|
||||
|
||||
```bash
|
||||
conda create -n mineru python=3.10
|
||||
conda activate mineru
|
||||
pip install -U "magic-pdf[full]" --extra-index-url https://wheels.myhloli.com -i https://mirrors.aliyun.com/pypi/simple
|
||||
```
|
||||
|
||||
2、[下载模型权重文件](https://github.com/opendatalab/MinerU/blob/master/docs/how_to_download_models_zh_cn.md)
|
||||
|
||||
```bash
|
||||
pip install modelscope
|
||||
wget https://gcore.jsdelivr.net/gh/opendatalab/MinerU@master/scripts/download_models.py -O download_models.py
|
||||
python download_models.py
|
||||
```
|
||||
|
||||
python脚本会自动下载模型文件并配置好配置文件中的模型目录
|
||||
|
||||
配置文件可以在用户目录中找到,文件名为`magic-pdf.json`
|
||||
|
||||
> windows的用户目录为 "C:\\Users\\用户名", linux用户目录为 "/home/用户名", macOS用户目录为 "/Users/用户名"
|
||||
|
||||
3、如果您的显卡显存大于等于 **8GB** ,可以进行以下流程,测试CUDA解析加速效果。默认为cpu模式,使用显卡的话需修改【用户目录】中配置文件magic-pdf.json中"device-mode"的值。
|
||||
|
||||
```bash
|
||||
{
|
||||
"device-mode":"cuda"
|
||||
}
|
||||
```
|
||||
|
||||
4、如需使用GPU加速,需额外再安装依赖。
|
||||
|
||||
```bash
|
||||
pip install --force-reinstall torch==2.3.1 torchvision==0.18.1 "numpy<2.0.0" --index-url https://download.pytorch.org/whl/cu118
|
||||
```
|
||||
|
||||
```bash
|
||||
pip install paddlepaddle-gpu==2.6.1
|
||||
```
|
||||
|
||||
5、克隆一个FastGPT的项目文件
|
||||
|
||||
```
|
||||
git clone https://github.com/labring/FastGPT.git
|
||||
```
|
||||
|
||||
6、将主目录设置为 plugins/model 下的pdf-mineru文件夹
|
||||
|
||||
```
|
||||
cd /plugins/model/pdf-mineru/
|
||||
```
|
||||
|
||||
7、执行文件pdf_parser_mineru.py,启动服务
|
||||
|
||||
```bash
|
||||
python pdf_parser_mineru.py
|
||||
```
|
||||
|
||||
# 访问示例
|
||||
|
||||
仿照了**pdf-marker**的方式。
|
||||
|
||||
```bash
|
||||
curl --location --request POST "http://localhost:7231/v1/parse/file" \
|
||||
--header "Authorization: Bearer your_access_token" \
|
||||
--form "file=@./file/chinese_test.pdf"
|
||||
```
|
||||
282
plugins/model/pdf-mineru/main.py
Normal file
@@ -0,0 +1,282 @@
|
||||
import json
|
||||
import os
|
||||
from base64 import b64encode
|
||||
from glob import glob
|
||||
from io import StringIO
|
||||
from typing import Tuple, Union
|
||||
|
||||
import uvicorn
|
||||
from fastapi import FastAPI, UploadFile, File
|
||||
from fastapi.responses import JSONResponse
|
||||
from loguru import logger
|
||||
from tempfile import TemporaryDirectory
|
||||
from pathlib import Path
|
||||
import fitz # PyMuPDF
|
||||
import asyncio
|
||||
from concurrent.futures import ProcessPoolExecutor
|
||||
import torch
|
||||
import multiprocessing as mp
|
||||
from contextlib import asynccontextmanager
|
||||
import time
|
||||
|
||||
import magic_pdf.model as model_config
|
||||
from magic_pdf.config.enums import SupportedPdfParseMethod
|
||||
from magic_pdf.data.data_reader_writer import DataWriter, FileBasedDataWriter
|
||||
from magic_pdf.data.dataset import PymuDocDataset
|
||||
from magic_pdf.model.doc_analyze_by_custom_model import doc_analyze
|
||||
from magic_pdf.operators.models import InferenceResult
|
||||
from magic_pdf.operators.pipes import PipeResult
|
||||
|
||||
model_config.__use_inside_model__ = True
|
||||
|
||||
app = FastAPI()
|
||||
|
||||
process_variables = {}
|
||||
my_pool = None
|
||||
|
||||
class MemoryDataWriter(DataWriter):
|
||||
def __init__(self):
|
||||
self.buffer = StringIO()
|
||||
|
||||
def write(self, path: str, data: bytes) -> None:
|
||||
if isinstance(data, str):
|
||||
self.buffer.write(data)
|
||||
else:
|
||||
self.buffer.write(data.decode("utf-8"))
|
||||
|
||||
def write_string(self, path: str, data: str) -> None:
|
||||
self.buffer.write(data)
|
||||
|
||||
def get_value(self) -> str:
|
||||
return self.buffer.getvalue() # 修复:使用 getvalue() 而不是 get_value()
|
||||
|
||||
def close(self):
|
||||
self.buffer.close()
|
||||
|
||||
def worker_init(counter, lock):
|
||||
num_gpus = torch.cuda.device_count()
|
||||
processes_per_gpu = int(os.environ.get('PROCESSES_PER_GPU', 1))
|
||||
with lock:
|
||||
worker_id = counter.value
|
||||
counter.value += 1
|
||||
if num_gpus == 0:
|
||||
device = 'cpu'
|
||||
else:
|
||||
device_id = worker_id // processes_per_gpu
|
||||
if device_id >= num_gpus:
|
||||
raise ValueError(f"Worker ID {worker_id} exceeds available GPUs ({num_gpus}).")
|
||||
device = f'cuda:{device_id}'
|
||||
config = {
|
||||
"parse_method": "auto",
|
||||
"ADDITIONAL_KEY": "VALUE"
|
||||
}
|
||||
converter = init_converter(config, device_id)
|
||||
pid = os.getpid()
|
||||
process_variables[pid] = converter
|
||||
print(f"Worker {worker_id}: Models loaded successfully on {device}!")
|
||||
|
||||
def init_converter(config, device_id):
|
||||
os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
|
||||
return config
|
||||
|
||||
def img_to_base64(img_path: str) -> str:
|
||||
with open(img_path, "rb") as img_file:
|
||||
return b64encode(img_file.read()).decode('utf-8')
|
||||
|
||||
def embed_images_as_base64(md_content: str, image_dir: str) -> str:
|
||||
lines = md_content.split('\n')
|
||||
new_lines = []
|
||||
for line in lines:
|
||||
if line.startswith("" in line:
|
||||
start_idx = line.index("](") + 2
|
||||
end_idx = line.index(")", start_idx)
|
||||
img_rel_path = line[start_idx:end_idx]
|
||||
img_name = os.path.basename(img_rel_path)
|
||||
img_path = os.path.join(image_dir, img_name)
|
||||
logger.info(f"Checking image: {img_path}")
|
||||
if os.path.exists(img_path):
|
||||
img_base64 = img_to_base64(img_path)
|
||||
new_line = f""
|
||||
new_lines.append(new_line)
|
||||
else:
|
||||
logger.warning(f"Image not found: {img_path}")
|
||||
new_lines.append(line)
|
||||
else:
|
||||
new_lines.append(line)
|
||||
return '\n'.join(new_lines)
|
||||
|
||||
def process_pdf(pdf_path, output_dir):
|
||||
try:
|
||||
pid = os.getpid()
|
||||
config = process_variables.get(pid, "No variable")
|
||||
parse_method = config["parse_method"]
|
||||
|
||||
with open(str(pdf_path), "rb") as f:
|
||||
pdf_bytes = f.read()
|
||||
|
||||
output_path = Path(output_dir) / f"{Path(pdf_path).stem}_output"
|
||||
os.makedirs(str(output_path), exist_ok=True)
|
||||
image_dir = os.path.join(str(output_path), "images")
|
||||
os.makedirs(image_dir, exist_ok=True)
|
||||
image_writer = FileBasedDataWriter(str(output_path))
|
||||
|
||||
# 处理 PDF
|
||||
infer_result, pipe_result = process_pdf_content(pdf_bytes, parse_method, image_writer)
|
||||
|
||||
md_content_writer = MemoryDataWriter()
|
||||
pipe_result.dump_md(md_content_writer, "", "images")
|
||||
md_content = md_content_writer.get_value()
|
||||
md_content_writer.close()
|
||||
|
||||
# 获取保存的图片路径
|
||||
image_paths = glob(os.path.join(image_dir, "*.jpg"))
|
||||
logger.info(f"Saved images by magic_pdf: {image_paths}")
|
||||
|
||||
# 如果 magic_pdf 未保存足够图片,使用 fitz 提取
|
||||
if not image_paths or len(image_paths) < 3: # 假设至少 3 张图片
|
||||
logger.warning("Insufficient images saved by magic_pdf, falling back to fitz extraction")
|
||||
image_map = {}
|
||||
original_names = []
|
||||
# 收集 Markdown 中的所有图片文件名
|
||||
for line in md_content.split('\n'):
|
||||
if line.startswith("" in line:
|
||||
start_idx = line.index("](") + 2
|
||||
end_idx = line.index(")", start_idx)
|
||||
img_rel_path = line[start_idx:end_idx]
|
||||
original_names.append(os.path.basename(img_rel_path))
|
||||
|
||||
# 提取图片并映射
|
||||
with fitz.open(pdf_path) as doc:
|
||||
img_counter = 0
|
||||
for page_num, page in enumerate(doc):
|
||||
for img_index, img in enumerate(page.get_images(full=True)):
|
||||
xref = img[0]
|
||||
base = doc.extract_image(xref)
|
||||
if img_counter < len(original_names):
|
||||
img_name = original_names[img_counter] # 使用 Markdown 中的原始文件名
|
||||
else:
|
||||
img_name = f"page_{page_num}_img_{img_index}.jpg"
|
||||
img_path = os.path.join(image_dir, img_name)
|
||||
with open(img_path, "wb") as f:
|
||||
f.write(base["image"])
|
||||
if img_counter < len(original_names):
|
||||
image_map[original_names[img_counter]] = img_name
|
||||
img_counter += 1
|
||||
|
||||
image_paths = glob(os.path.join(image_dir, "*.jpg"))
|
||||
logger.info(f"Images extracted by fitz: {image_paths}")
|
||||
|
||||
# 更新 Markdown(仅在必要时替换)
|
||||
for original_name, new_name in image_map.items():
|
||||
if original_name != new_name:
|
||||
md_content = md_content.replace(f"images/{original_name}", f"images/{new_name}")
|
||||
|
||||
return {
|
||||
"status": "success",
|
||||
"text": md_content,
|
||||
"output_path": str(output_path),
|
||||
"images": image_paths
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"Error processing PDF: {str(e)}")
|
||||
return {
|
||||
"status": "error",
|
||||
"message": str(e),
|
||||
"file": str(pdf_path)
|
||||
}
|
||||
|
||||
def process_pdf_content(pdf_bytes, parse_method, image_writer):
|
||||
ds = PymuDocDataset(pdf_bytes)
|
||||
infer_result: InferenceResult = None
|
||||
pipe_result: PipeResult = None
|
||||
|
||||
if parse_method == "ocr":
|
||||
infer_result = ds.apply(doc_analyze, ocr=True)
|
||||
pipe_result = infer_result.pipe_ocr_mode(image_writer)
|
||||
elif parse_method == "txt":
|
||||
infer_result = ds.apply(doc_analyze, ocr=False)
|
||||
pipe_result = infer_result.pipe_txt_mode(image_writer)
|
||||
else: # auto
|
||||
if ds.classify() == SupportedPdfParseMethod.OCR:
|
||||
infer_result = ds.apply(doc_analyze, ocr=True)
|
||||
pipe_result = infer_result.pipe_ocr_mode(image_writer)
|
||||
else:
|
||||
infer_result = ds.apply(doc_analyze, ocr=False)
|
||||
pipe_result = infer_result.pipe_txt_mode(image_writer)
|
||||
|
||||
return infer_result, pipe_result
|
||||
|
||||
@asynccontextmanager
|
||||
async def lifespan(app: FastAPI):
|
||||
try:
|
||||
mp.set_start_method('spawn')
|
||||
except RuntimeError:
|
||||
raise RuntimeError("Set start method to spawn twice. This may be a temporary issue with the script. Please try running it again.")
|
||||
global my_pool
|
||||
manager = mp.Manager()
|
||||
worker_counter = manager.Value('i', 0)
|
||||
worker_lock = manager.Lock()
|
||||
gpu_count = torch.cuda.device_count()
|
||||
my_pool = ProcessPoolExecutor(max_workers=gpu_count * int(os.environ.get('PROCESSES_PER_GPU', 1)),
|
||||
initializer=worker_init, initargs=(worker_counter, worker_lock))
|
||||
yield
|
||||
if my_pool:
|
||||
my_pool.shutdown(wait=True)
|
||||
print("Application shutdown, cleaning up...")
|
||||
|
||||
app.router.lifespan_context = lifespan
|
||||
|
||||
@app.post("/v2/parse/file")
|
||||
async def process_pdfs(file: UploadFile = File(...)):
|
||||
s_time = time.time()
|
||||
with TemporaryDirectory() as temp_dir:
|
||||
temp_path = Path(temp_dir) / file.filename
|
||||
with open(str(temp_path), "wb") as buffer:
|
||||
buffer.write(await file.read())
|
||||
|
||||
# 验证 PDF 文件
|
||||
try:
|
||||
with fitz.open(str(temp_path)) as pdf_document:
|
||||
total_pages = pdf_document.page_count
|
||||
except fitz.fitz.FileDataError:
|
||||
return JSONResponse(content={"success": False, "message": "", "error": "Invalid PDF file"}, status_code=400)
|
||||
except Exception as e:
|
||||
logger.error(f"Error opening PDF: {str(e)}")
|
||||
return JSONResponse(content={"success": False, "message": "", "error": f"Internal server error: {str(e)}"}, status_code=500)
|
||||
|
||||
try:
|
||||
loop = asyncio.get_running_loop()
|
||||
results = await loop.run_in_executor(
|
||||
my_pool,
|
||||
process_pdf,
|
||||
str(temp_path),
|
||||
str(temp_dir)
|
||||
)
|
||||
|
||||
if results.get("status") == "error":
|
||||
return JSONResponse(content={
|
||||
"success": False,
|
||||
"message": "",
|
||||
"error": results.get("message")
|
||||
}, status_code=500)
|
||||
|
||||
# 嵌入 Base64
|
||||
image_dir = os.path.join(results.get("output_path"), "images")
|
||||
md_content_with_base64 = embed_images_as_base64(results.get("text"), image_dir)
|
||||
|
||||
return {
|
||||
"success": True,
|
||||
"message": "",
|
||||
"markdown": md_content_with_base64,
|
||||
"pages": total_pages
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"Error in process_pdfs: {str(e)}")
|
||||
return JSONResponse(content={
|
||||
"success": False,
|
||||
"message": "",
|
||||
"error": f"Internal server error: {str(e)}"
|
||||
}, status_code=500)
|
||||
|
||||
if __name__ == "__main__":
|
||||
uvicorn.run(app, host="0.0.0.0", port=7231)
|
||||
1
plugins/model/pdf-mistral/.env
Normal file
@@ -0,0 +1 @@
|
||||
MISTRAL_API_KEY=
|
||||
143
plugins/model/pdf-mistral/README.md
Normal file
@@ -0,0 +1,143 @@
|
||||
# PDF-Mistral 插件
|
||||
|
||||
此插件使用 Mistral 的 OCR API 将 PDF 文件转换为 Markdown 文本。它可以从 PDF 文档中提取文本内容和图像,并将它们作为带有嵌入式 base64 图像的 Markdown 返回。
|
||||
|
||||
## 功能特点
|
||||
|
||||
- 使用 Mistral OCR API 提取 PDF 文本
|
||||
- Markdown 中的 base64 图像嵌入
|
||||
- 完善的错误处理
|
||||
- 支持多页 PDF
|
||||
|
||||
## 设置
|
||||
|
||||
### 前提条件
|
||||
|
||||
- Python 3.8+
|
||||
- Mistral API 密钥([在此获取](https://mistral.ai/))
|
||||
|
||||
### 安装
|
||||
|
||||
1. 安装所需的软件包:
|
||||
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
2. 通过创建/编辑 `.env` 文件设置环境变量:
|
||||
|
||||
```bash
|
||||
# 在 .env 文件中
|
||||
MISTRAL_API_KEY=你的-mistral-api-密钥
|
||||
```
|
||||
|
||||
## 使用方法
|
||||
|
||||
### 启动服务器
|
||||
|
||||
使用以下命令运行服务器:
|
||||
|
||||
```bash
|
||||
python api_mp.py
|
||||
```
|
||||
|
||||
或者直接使用 uvicorn:
|
||||
|
||||
```bash
|
||||
uvicorn api_mp:app --host 0.0.0.0 --port 7231
|
||||
```
|
||||
|
||||
然后配置到FastGPT配置文件即可
|
||||
```json
|
||||
{
|
||||
xxx
|
||||
"systemEnv": {
|
||||
xxx
|
||||
"customPdfParse": {
|
||||
"url": "http://localhost:7231/v1/parse/file", // 自定义 PDF 解析服务地址
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### API 端点
|
||||
|
||||
#### 解析 PDF 文件
|
||||
|
||||
**端点**:`POST /v1/parse/file`
|
||||
|
||||
**请求**:
|
||||
- 包含文件字段的多部分表单数据
|
||||
|
||||
**响应**:
|
||||
```json
|
||||
{
|
||||
"pages": 5, // PDF 中的页数
|
||||
"markdown": "...", // 带有嵌入式 base64 图像的 Markdown 内容
|
||||
"duration": 10.5 // 处理时间(秒)
|
||||
}
|
||||
```
|
||||
|
||||
**错误响应**:
|
||||
```json
|
||||
{
|
||||
"pages": 0,
|
||||
"markdown": "",
|
||||
"error": "错误信息"
|
||||
}
|
||||
```
|
||||
|
||||
### 使用示例
|
||||
|
||||
使用 curl:
|
||||
|
||||
```bash
|
||||
curl -X POST -F "file=@path/to/your/document.pdf" http://localhost:7231/v1/parse/file
|
||||
```
|
||||
|
||||
使用 JavaScript/Axios:
|
||||
|
||||
```javascript
|
||||
const formData = new FormData();
|
||||
formData.append('file', pdfFile);
|
||||
|
||||
const response = await axios.post('http://localhost:7231/v1/parse/file', formData, {
|
||||
headers: {
|
||||
'Content-Type': 'multipart/form-data'
|
||||
}
|
||||
});
|
||||
|
||||
if (response.data.error) {
|
||||
console.error('错误:', response.data.error);
|
||||
} else {
|
||||
console.log('页数:', response.data.pages);
|
||||
console.log('Markdown:', response.data.markdown);
|
||||
}
|
||||
```
|
||||
|
||||
## 限制
|
||||
|
||||
- PDF 文件必须可读且没有密码保护
|
||||
- 最大文件大小取决于 Mistral API 限制(目前最大52.4M)
|
||||
- Mistral API 有页面限制(最多最大1000页)
|
||||
|
||||
## 故障排除
|
||||
|
||||
### 常见错误
|
||||
|
||||
1. **"MISTRAL_API_KEY environment variable not set"(未设置 MISTRAL_API_KEY 环境变量)**
|
||||
- 确保您已将 Mistral API 密钥添加到 `.env` 文件中
|
||||
- 确保 `.env` 文件与脚本在同一目录中
|
||||
|
||||
2. **"Failed to process PDF file"(无法处理 PDF 文件)**
|
||||
- PDF 可能已损坏或受密码保护
|
||||
- 尝试使用其他 PDF 文件
|
||||
|
||||
3. **Mistral API 错误**
|
||||
- 检查您的 Mistral API 密钥是否有效
|
||||
- 确保您在 Mistral API 速率限制范围内
|
||||
- 验证 PDF 是否在大小/页数限制范围内
|
||||
|
||||
## 许可证
|
||||
|
||||
MIT 许可证
|
||||
230
plugins/model/pdf-mistral/api_mp.py
Executable file
@@ -0,0 +1,230 @@
|
||||
import time
|
||||
import base64
|
||||
import fitz
|
||||
import re
|
||||
import json
|
||||
from contextlib import asynccontextmanager
|
||||
from loguru import logger
|
||||
from fastapi import HTTPException, FastAPI, UploadFile, File
|
||||
from fastapi.responses import JSONResponse
|
||||
from mistralai import Mistral
|
||||
import os
|
||||
import shutil
|
||||
from dotenv import load_dotenv
|
||||
|
||||
# Load environment variables from .env file
|
||||
load_dotenv()
|
||||
|
||||
app = FastAPI()
|
||||
temp_dir = "./temp"
|
||||
|
||||
# Initialize Mistral client with API key from environment variable
|
||||
mistral_api_key = os.environ.get("MISTRAL_API_KEY", "")
|
||||
if not mistral_api_key:
|
||||
logger.warning("MISTRAL_API_KEY environment variable not set. PDF processing will fail.")
|
||||
|
||||
mistral_client = Mistral(api_key=mistral_api_key) if mistral_api_key else None
|
||||
|
||||
@asynccontextmanager
|
||||
async def lifespan(app: FastAPI):
|
||||
# Create temp directory if it doesn't exist
|
||||
global temp_dir
|
||||
if not os.path.exists(temp_dir):
|
||||
os.makedirs(temp_dir)
|
||||
print("Application startup, creating temp directory...")
|
||||
yield
|
||||
if temp_dir and os.path.exists(temp_dir):
|
||||
shutil.rmtree(temp_dir)
|
||||
print("Application shutdown, cleaning up...")
|
||||
|
||||
app.router.lifespan_context = lifespan
|
||||
|
||||
@app.post("/v1/parse/file")
|
||||
async def read_file(
|
||||
file: UploadFile = File(...)):
|
||||
temp_file_path = None
|
||||
try:
|
||||
start_time = time.time()
|
||||
global temp_dir
|
||||
os.makedirs(temp_dir, exist_ok=True)
|
||||
temp_file_path = os.path.join(temp_dir, file.filename)
|
||||
with open(temp_file_path, "wb") as temp_file:
|
||||
file_content = await file.read()
|
||||
temp_file.write(file_content)
|
||||
|
||||
# Get page count using PyMuPDF
|
||||
try:
|
||||
pdf_document = fitz.open(temp_file_path)
|
||||
total_pages = pdf_document.page_count
|
||||
pdf_document.close()
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to open PDF file: {str(e)}")
|
||||
return {
|
||||
"pages": 0,
|
||||
"markdown": "",
|
||||
"error": f"Failed to process PDF file: {str(e)}"
|
||||
}
|
||||
|
||||
if mistral_client is None:
|
||||
return {
|
||||
"pages": 0,
|
||||
"markdown": "",
|
||||
"error": "MISTRAL_API_KEY environment variable not set."
|
||||
}
|
||||
|
||||
# Step 1: Upload the file to Mistral's servers
|
||||
logger.info(f"Uploading file {file.filename} to Mistral servers")
|
||||
with open(temp_file_path, "rb") as f:
|
||||
try:
|
||||
uploaded_file = mistral_client.files.upload(
|
||||
file={
|
||||
"file_name": file.filename,
|
||||
"content": f,
|
||||
},
|
||||
purpose="ocr"
|
||||
)
|
||||
except Exception as e:
|
||||
error_msg = str(e)
|
||||
# Try to parse Mistral API error format
|
||||
try:
|
||||
error_data = json.loads(error_msg)
|
||||
if error_data.get("object") == "error":
|
||||
error_msg = error_data.get("message", error_msg)
|
||||
except:
|
||||
pass
|
||||
|
||||
return {
|
||||
"pages": 0,
|
||||
"markdown": "",
|
||||
"error": f"Mistral API upload error: {error_msg}"
|
||||
}
|
||||
|
||||
# Step 2: Get a signed URL for the uploaded file
|
||||
logger.info(f"Getting signed URL for file ID: {uploaded_file.id}")
|
||||
try:
|
||||
signed_url = mistral_client.files.get_signed_url(file_id=uploaded_file.id)
|
||||
except Exception as e:
|
||||
error_msg = str(e)
|
||||
# Try to parse Mistral API error format
|
||||
try:
|
||||
error_data = json.loads(error_msg)
|
||||
if error_data.get("object") == "error":
|
||||
error_msg = error_data.get("message", error_msg)
|
||||
except:
|
||||
pass
|
||||
|
||||
return {
|
||||
"pages": 0,
|
||||
"markdown": "",
|
||||
"error": f"Mistral API signed URL error: {error_msg}"
|
||||
}
|
||||
|
||||
# Step 3: Process the file using the signed URL
|
||||
logger.info("Processing file with OCR API")
|
||||
try:
|
||||
ocr_response = mistral_client.ocr.process(
|
||||
model="mistral-ocr-latest",
|
||||
document={
|
||||
"type": "document_url",
|
||||
"document_url": signed_url.url,
|
||||
},
|
||||
include_image_base64=True
|
||||
)
|
||||
except Exception as e:
|
||||
error_msg = str(e)
|
||||
# Try to parse Mistral API error format
|
||||
try:
|
||||
error_data = json.loads(error_msg)
|
||||
if error_data.get("object") == "error":
|
||||
error_msg = error_data.get("message", error_msg)
|
||||
except:
|
||||
pass
|
||||
|
||||
return {
|
||||
"pages": 0,
|
||||
"markdown": "",
|
||||
"error": f"Mistral OCR processing error: {error_msg}"
|
||||
}
|
||||
|
||||
# Combine all pages' markdown content
|
||||
markdown_content = "\n".join(page.markdown for page in ocr_response.pages)
|
||||
|
||||
# Create a dictionary to map image filenames to their base64 data
|
||||
image_map = {}
|
||||
for page in ocr_response.pages:
|
||||
for img in page.images:
|
||||
# Extract the image filename from the image id
|
||||
img_id = img.id
|
||||
img_base64 = img.image_base64
|
||||
|
||||
# Print a sample of the first image base64 data for debugging
|
||||
if len(image_map) == 0 and img_base64:
|
||||
print("Sample image base64 prefix:", img_base64[:50] if len(img_base64) > 50 else img_base64)
|
||||
print("Does base64 already include prefix?", img_base64.startswith("data:image/"))
|
||||
|
||||
# Ensure the base64 data is in the correct format for the upstream system
|
||||
# If it doesn't already have the prefix, add it
|
||||
if not img_base64.startswith("data:image/"):
|
||||
# Assume it's a PNG if we can't determine the type
|
||||
img_base64 = f"data:image/png;base64,{img_base64}"
|
||||
|
||||
# Add both potential formats to the map
|
||||
image_map[f"{img_id}.jpeg"] = img_base64
|
||||
image_map[f"{img_id}.png"] = img_base64
|
||||
image_map[img_id] = img_base64
|
||||
|
||||
# Use regex to find all image references in the markdown content
|
||||
# This will match patterns like 
|
||||
image_pattern = r'!\[(.*?)\]\((.*?)\)'
|
||||
|
||||
def replace_image_with_base64(match):
|
||||
alt_text = match.group(1)
|
||||
img_filename = match.group(2)
|
||||
|
||||
# Extract just the filename without path
|
||||
img_filename_only = os.path.basename(img_filename)
|
||||
|
||||
# Check if we have base64 data for this image
|
||||
if img_filename_only in image_map:
|
||||
return f""
|
||||
else:
|
||||
# If we don't have base64 data, keep the original reference
|
||||
logger.warning(f"No base64 data found for image: {img_filename_only}")
|
||||
return match.group(0)
|
||||
|
||||
# Replace all image references with base64 data
|
||||
markdown_content = re.sub(image_pattern, replace_image_with_base64, markdown_content)
|
||||
|
||||
# Clean up the uploaded file from Mistral's servers
|
||||
try:
|
||||
logger.info(f"Deleting uploaded file from Mistral servers: {uploaded_file.id}")
|
||||
mistral_client.files.delete(file_id=uploaded_file.id)
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to delete uploaded file: {e}")
|
||||
|
||||
end_time = time.time()
|
||||
duration = end_time - start_time
|
||||
print(file.filename + " Total time:", duration)
|
||||
|
||||
# Return with format matching client expectations
|
||||
return {
|
||||
"pages": total_pages,
|
||||
"markdown": markdown_content,
|
||||
"duration": duration # Keep this for logging purposes
|
||||
}
|
||||
|
||||
except Exception as e:
|
||||
logger.exception(e)
|
||||
return {
|
||||
"pages": 0,
|
||||
"markdown": "",
|
||||
"error": f"Internal server error: {str(e)}"
|
||||
}
|
||||
|
||||
finally:
|
||||
if temp_file_path and os.path.exists(temp_file_path):
|
||||
os.remove(temp_file_path)
|
||||
|
||||
if __name__ == "__main__":
|
||||
import uvicorn
|
||||
uvicorn.run(app, host="0.0.0.0", port=7231)
|
||||
8
plugins/model/pdf-mistral/requirements.txt
Normal file
@@ -0,0 +1,8 @@
|
||||
fastapi==0.115.5
|
||||
uvicorn==0.32.1
|
||||
mistralai>=1.5.0
|
||||
PyMuPDF==1.24.14
|
||||
python-multipart==0.0.18
|
||||
python-dotenv==1.0.1
|
||||
loguru==0.7.2
|
||||
requests==2.32.3
|
||||
@@ -18,7 +18,7 @@ const QuoteList = React.memo(function QuoteList({
|
||||
rawSearch: SearchDataResponseItemType[];
|
||||
}) {
|
||||
const theme = useTheme();
|
||||
const { chatId, appId, outLinkAuthData } = useChatStore();
|
||||
const { appId, outLinkAuthData } = useChatStore();
|
||||
|
||||
const RawSourceBoxProps = useContextSelector(ChatBoxContext, (v) => ({
|
||||
chatItemDataId,
|
||||
@@ -39,10 +39,11 @@ const QuoteList = React.memo(function QuoteList({
|
||||
collectionIdList: [...new Set(rawSearch.map((item) => item.collectionId))],
|
||||
chatItemDataId,
|
||||
appId,
|
||||
chatId,
|
||||
chatId: RawSourceBoxProps.chatId,
|
||||
...outLinkAuthData
|
||||
}),
|
||||
{
|
||||
refreshDeps: [rawSearch, RawSourceBoxProps.chatId],
|
||||
manual: false
|
||||
}
|
||||
);
|
||||
|
||||
@@ -3,7 +3,7 @@ import { ChatHistoryItemResType, ChatItemType } from '@fastgpt/global/core/chat/
|
||||
import { SearchDataResponseItemType } from '@fastgpt/global/core/dataset/type';
|
||||
import { FlowNodeTypeEnum } from '@fastgpt/global/core/workflow/node/constant';
|
||||
|
||||
const isLLMNode = (item: ChatHistoryItemResType) =>
|
||||
export const isLLMNode = (item: ChatHistoryItemResType) =>
|
||||
item.moduleType === FlowNodeTypeEnum.chatNode || item.moduleType === FlowNodeTypeEnum.tools;
|
||||
|
||||
export function transformPreviewHistories(
|
||||
|
||||
191
test/cases/global/core/chat/utils.test.ts
Normal file
@@ -0,0 +1,191 @@
|
||||
import { describe, expect, it } from 'vitest';
|
||||
import { ChatRoleEnum } from '@fastgpt/global/core/chat/constants';
|
||||
import { FlowNodeTypeEnum } from '@fastgpt/global/core/workflow/node/constant';
|
||||
import { ChatHistoryItemResType, ChatItemType } from '@fastgpt/global/core/chat/type';
|
||||
import {
|
||||
transformPreviewHistories,
|
||||
addStatisticalDataToHistoryItem
|
||||
} from '@/global/core/chat/utils';
|
||||
|
||||
describe('transformPreviewHistories', () => {
|
||||
it('should transform histories correctly with responseDetail=true', () => {
|
||||
const histories: ChatItemType[] = [
|
||||
{
|
||||
obj: ChatRoleEnum.AI,
|
||||
value: 'test response',
|
||||
responseData: [
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.chatNode,
|
||||
runningTime: 1.5
|
||||
}
|
||||
]
|
||||
}
|
||||
];
|
||||
|
||||
const result = transformPreviewHistories(histories, true);
|
||||
|
||||
expect(result[0]).toEqual({
|
||||
obj: ChatRoleEnum.AI,
|
||||
value: 'test response',
|
||||
responseData: undefined,
|
||||
llmModuleAccount: 1,
|
||||
totalQuoteList: [],
|
||||
totalRunningTime: 1.5,
|
||||
historyPreviewLength: undefined
|
||||
});
|
||||
});
|
||||
|
||||
it('should transform histories correctly with responseDetail=false', () => {
|
||||
const histories: ChatItemType[] = [
|
||||
{
|
||||
obj: ChatRoleEnum.AI,
|
||||
value: 'test response',
|
||||
responseData: [
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.chatNode,
|
||||
runningTime: 1.5
|
||||
}
|
||||
]
|
||||
}
|
||||
];
|
||||
|
||||
const result = transformPreviewHistories(histories, false);
|
||||
|
||||
expect(result[0]).toEqual({
|
||||
obj: ChatRoleEnum.AI,
|
||||
value: 'test response',
|
||||
responseData: undefined,
|
||||
llmModuleAccount: 1,
|
||||
totalQuoteList: undefined,
|
||||
totalRunningTime: 1.5,
|
||||
historyPreviewLength: undefined
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
describe('addStatisticalDataToHistoryItem', () => {
|
||||
it('should return original item if obj is not AI', () => {
|
||||
const item: ChatItemType = {
|
||||
obj: ChatRoleEnum.Human,
|
||||
value: 'test'
|
||||
};
|
||||
|
||||
expect(addStatisticalDataToHistoryItem(item)).toBe(item);
|
||||
});
|
||||
|
||||
it('should return original item if totalQuoteList is already defined', () => {
|
||||
const item: ChatItemType = {
|
||||
obj: ChatRoleEnum.AI,
|
||||
value: 'test',
|
||||
totalQuoteList: []
|
||||
};
|
||||
|
||||
expect(addStatisticalDataToHistoryItem(item)).toBe(item);
|
||||
});
|
||||
|
||||
it('should return original item if responseData is undefined', () => {
|
||||
const item: ChatItemType = {
|
||||
obj: ChatRoleEnum.AI,
|
||||
value: 'test'
|
||||
};
|
||||
|
||||
expect(addStatisticalDataToHistoryItem(item)).toBe(item);
|
||||
});
|
||||
|
||||
it('should calculate statistics correctly', () => {
|
||||
const item: ChatItemType = {
|
||||
obj: ChatRoleEnum.AI,
|
||||
value: 'test',
|
||||
responseData: [
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.chatNode,
|
||||
runningTime: 1.5,
|
||||
historyPreview: ['preview1']
|
||||
},
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.datasetSearchNode,
|
||||
quoteList: [{ id: '1', q: 'test', a: 'answer' }],
|
||||
runningTime: 0.5
|
||||
},
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.tools,
|
||||
runningTime: 1,
|
||||
toolDetail: [
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.chatNode,
|
||||
runningTime: 0.5
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
};
|
||||
|
||||
const result = addStatisticalDataToHistoryItem(item);
|
||||
|
||||
expect(result).toEqual({
|
||||
...item,
|
||||
llmModuleAccount: 3,
|
||||
totalQuoteList: [{ id: '1', q: 'test', a: 'answer' }],
|
||||
totalRunningTime: 3,
|
||||
historyPreviewLength: 1
|
||||
});
|
||||
});
|
||||
|
||||
it('should handle empty arrays and undefined values', () => {
|
||||
const item: ChatItemType = {
|
||||
obj: ChatRoleEnum.AI,
|
||||
value: 'test',
|
||||
responseData: [
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.chatNode,
|
||||
runningTime: 0
|
||||
}
|
||||
]
|
||||
};
|
||||
|
||||
const result = addStatisticalDataToHistoryItem(item);
|
||||
|
||||
expect(result).toEqual({
|
||||
...item,
|
||||
llmModuleAccount: 1,
|
||||
totalQuoteList: [],
|
||||
totalRunningTime: 0,
|
||||
historyPreviewLength: undefined
|
||||
});
|
||||
});
|
||||
|
||||
it('should handle nested plugin and loop details', () => {
|
||||
const item: ChatItemType = {
|
||||
obj: ChatRoleEnum.AI,
|
||||
value: 'test',
|
||||
responseData: [
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.chatNode,
|
||||
runningTime: 1,
|
||||
pluginDetail: [
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.chatNode,
|
||||
runningTime: 0.5
|
||||
}
|
||||
],
|
||||
loopDetail: [
|
||||
{
|
||||
moduleType: FlowNodeTypeEnum.tools,
|
||||
runningTime: 0.3
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
};
|
||||
|
||||
const result = addStatisticalDataToHistoryItem(item);
|
||||
|
||||
expect(result).toEqual({
|
||||
...item,
|
||||
llmModuleAccount: 3,
|
||||
totalQuoteList: [],
|
||||
totalRunningTime: 1,
|
||||
historyPreviewLength: undefined
|
||||
});
|
||||
});
|
||||
});
|
||||
59
test/cases/service/support/wallet/usage/utils.test.ts
Normal file
@@ -0,0 +1,59 @@
|
||||
import { describe, it, expect } from 'vitest';
|
||||
import { authType2UsageSource } from '@/service/support/wallet/usage/utils';
|
||||
import { AuthUserTypeEnum } from '@fastgpt/global/support/permission/constant';
|
||||
import { UsageSourceEnum } from '@fastgpt/global/support/wallet/usage/constants';
|
||||
|
||||
describe('authType2UsageSource', () => {
|
||||
it('should return source if provided', () => {
|
||||
const result = authType2UsageSource({
|
||||
authType: AuthUserTypeEnum.apikey,
|
||||
shareId: 'share123',
|
||||
source: UsageSourceEnum.api
|
||||
});
|
||||
expect(result).toBe(UsageSourceEnum.api);
|
||||
});
|
||||
|
||||
it('should return shareLink if shareId is provided', () => {
|
||||
const result = authType2UsageSource({
|
||||
authType: AuthUserTypeEnum.apikey,
|
||||
shareId: 'share123'
|
||||
});
|
||||
expect(result).toBe(UsageSourceEnum.shareLink);
|
||||
});
|
||||
|
||||
it('should return api if authType is apikey', () => {
|
||||
const result = authType2UsageSource({
|
||||
authType: AuthUserTypeEnum.apikey
|
||||
});
|
||||
expect(result).toBe(UsageSourceEnum.api);
|
||||
});
|
||||
|
||||
it('should return fastgpt as default', () => {
|
||||
const result = authType2UsageSource({});
|
||||
expect(result).toBe(UsageSourceEnum.fastgpt);
|
||||
});
|
||||
|
||||
it('should return fastgpt for non-apikey authType', () => {
|
||||
const result = authType2UsageSource({
|
||||
authType: AuthUserTypeEnum.owner
|
||||
});
|
||||
expect(result).toBe(UsageSourceEnum.fastgpt);
|
||||
});
|
||||
|
||||
it('should prioritize source over shareId and authType', () => {
|
||||
const result = authType2UsageSource({
|
||||
source: UsageSourceEnum.api,
|
||||
shareId: 'share123',
|
||||
authType: AuthUserTypeEnum.apikey
|
||||
});
|
||||
expect(result).toBe(UsageSourceEnum.api);
|
||||
});
|
||||
|
||||
it('should prioritize shareId over authType', () => {
|
||||
const result = authType2UsageSource({
|
||||
shareId: 'share123',
|
||||
authType: AuthUserTypeEnum.apikey
|
||||
});
|
||||
expect(result).toBe(UsageSourceEnum.shareLink);
|
||||
});
|
||||
});
|
||||
237
test/cases/web/core/workflow/utils.test.ts
Normal file
@@ -0,0 +1,237 @@
|
||||
import { vi, describe, it, expect } from 'vitest';
|
||||
import type { FlowNodeTemplateType } from '@fastgpt/global/core/workflow/type/node';
|
||||
import type { StoreNodeItemType } from '@fastgpt/global/core/workflow/type/node';
|
||||
import type { Node, Edge } from 'reactflow';
|
||||
import {
|
||||
FlowNodeTypeEnum,
|
||||
FlowNodeInputTypeEnum,
|
||||
FlowNodeOutputTypeEnum,
|
||||
EDGE_TYPE
|
||||
} from '@fastgpt/global/core/workflow/node/constant';
|
||||
import { WorkflowIOValueTypeEnum } from '@fastgpt/global/core/workflow/constants';
|
||||
import { NodeInputKeyEnum, NodeOutputKeyEnum } from '@fastgpt/global/core/workflow/constants';
|
||||
import {
|
||||
nodeTemplate2FlowNode,
|
||||
storeNode2FlowNode,
|
||||
storeEdgesRenderEdge,
|
||||
computedNodeInputReference,
|
||||
getRefData,
|
||||
filterWorkflowNodeOutputsByType,
|
||||
checkWorkflowNodeAndConnection,
|
||||
getLatestNodeTemplate
|
||||
} from '@/web/core/workflow/utils';
|
||||
|
||||
describe('workflow utils', () => {
|
||||
describe('nodeTemplate2FlowNode', () => {
|
||||
it('should convert template to flow node', () => {
|
||||
const template: FlowNodeTemplateType = {
|
||||
name: 'Test Node',
|
||||
flowNodeType: FlowNodeTypeEnum.userInput,
|
||||
inputs: [],
|
||||
outputs: []
|
||||
};
|
||||
|
||||
const result = nodeTemplate2FlowNode({
|
||||
template,
|
||||
position: { x: 100, y: 100 },
|
||||
selected: true,
|
||||
parentNodeId: 'parent1',
|
||||
t: (key) => key
|
||||
});
|
||||
|
||||
expect(result).toMatchObject({
|
||||
type: FlowNodeTypeEnum.userInput,
|
||||
position: { x: 100, y: 100 },
|
||||
selected: true,
|
||||
data: {
|
||||
name: 'Test Node',
|
||||
flowNodeType: FlowNodeTypeEnum.userInput,
|
||||
parentNodeId: 'parent1'
|
||||
}
|
||||
});
|
||||
expect(result.id).toBeDefined();
|
||||
});
|
||||
});
|
||||
|
||||
describe('storeNode2FlowNode', () => {
|
||||
it('should convert store node to flow node', () => {
|
||||
const storeNode: StoreNodeItemType = {
|
||||
nodeId: 'node1',
|
||||
flowNodeType: FlowNodeTypeEnum.userInput,
|
||||
position: { x: 100, y: 100 },
|
||||
inputs: [],
|
||||
outputs: [],
|
||||
name: 'Test Node',
|
||||
version: '1.0'
|
||||
};
|
||||
|
||||
const result = storeNode2FlowNode({
|
||||
item: storeNode,
|
||||
selected: true,
|
||||
t: (key) => key
|
||||
});
|
||||
|
||||
expect(result).toMatchObject({
|
||||
id: 'node1',
|
||||
type: FlowNodeTypeEnum.userInput,
|
||||
position: { x: 100, y: 100 },
|
||||
selected: true
|
||||
});
|
||||
});
|
||||
|
||||
it('should handle dynamic inputs and outputs', () => {
|
||||
const storeNode: StoreNodeItemType = {
|
||||
nodeId: 'node1',
|
||||
flowNodeType: FlowNodeTypeEnum.userInput,
|
||||
position: { x: 0, y: 0 },
|
||||
inputs: [
|
||||
{
|
||||
key: 'dynamicInput',
|
||||
renderTypeList: [FlowNodeInputTypeEnum.addInputParam]
|
||||
}
|
||||
],
|
||||
outputs: [
|
||||
{
|
||||
key: 'dynamicOutput',
|
||||
type: FlowNodeOutputTypeEnum.dynamic
|
||||
}
|
||||
],
|
||||
name: 'Test Node',
|
||||
version: '1.0'
|
||||
};
|
||||
|
||||
const result = storeNode2FlowNode({
|
||||
item: storeNode,
|
||||
t: (key) => key
|
||||
});
|
||||
|
||||
expect(result.data.inputs).toHaveLength(1);
|
||||
expect(result.data.outputs).toHaveLength(1);
|
||||
});
|
||||
});
|
||||
|
||||
describe('filterWorkflowNodeOutputsByType', () => {
|
||||
it('should filter outputs by type', () => {
|
||||
const outputs = [
|
||||
{ id: '1', valueType: WorkflowIOValueTypeEnum.string },
|
||||
{ id: '2', valueType: WorkflowIOValueTypeEnum.number },
|
||||
{ id: '3', valueType: WorkflowIOValueTypeEnum.boolean }
|
||||
];
|
||||
|
||||
const result = filterWorkflowNodeOutputsByType(outputs, WorkflowIOValueTypeEnum.string);
|
||||
|
||||
expect(result).toHaveLength(1);
|
||||
expect(result[0].id).toBe('1');
|
||||
});
|
||||
|
||||
it('should return all outputs for any type', () => {
|
||||
const outputs = [
|
||||
{ id: '1', valueType: WorkflowIOValueTypeEnum.string },
|
||||
{ id: '2', valueType: WorkflowIOValueTypeEnum.number }
|
||||
];
|
||||
|
||||
const result = filterWorkflowNodeOutputsByType(outputs, WorkflowIOValueTypeEnum.any);
|
||||
|
||||
expect(result).toHaveLength(2);
|
||||
});
|
||||
|
||||
it('should handle array types correctly', () => {
|
||||
const outputs = [
|
||||
{ id: '1', valueType: WorkflowIOValueTypeEnum.string },
|
||||
{ id: '2', valueType: WorkflowIOValueTypeEnum.arrayString }
|
||||
];
|
||||
|
||||
const result = filterWorkflowNodeOutputsByType(outputs, WorkflowIOValueTypeEnum.arrayString);
|
||||
expect(result).toHaveLength(2);
|
||||
});
|
||||
});
|
||||
|
||||
describe('checkWorkflowNodeAndConnection', () => {
|
||||
it('should validate nodes and connections', () => {
|
||||
const nodes: Node[] = [
|
||||
{
|
||||
id: 'node1',
|
||||
type: FlowNodeTypeEnum.userInput,
|
||||
data: {
|
||||
nodeId: 'node1',
|
||||
flowNodeType: FlowNodeTypeEnum.userInput,
|
||||
inputs: [
|
||||
{
|
||||
key: NodeInputKeyEnum.userInput,
|
||||
required: true,
|
||||
value: undefined,
|
||||
renderTypeList: [FlowNodeInputTypeEnum.input]
|
||||
}
|
||||
],
|
||||
outputs: []
|
||||
},
|
||||
position: { x: 0, y: 0 }
|
||||
}
|
||||
];
|
||||
|
||||
const edges: Edge[] = [
|
||||
{
|
||||
id: 'edge1',
|
||||
source: 'node1',
|
||||
target: 'node2',
|
||||
type: EDGE_TYPE
|
||||
}
|
||||
];
|
||||
|
||||
const result = checkWorkflowNodeAndConnection({ nodes, edges });
|
||||
expect(result).toEqual(['node1']);
|
||||
});
|
||||
|
||||
it('should handle empty nodes and edges', () => {
|
||||
const result = checkWorkflowNodeAndConnection({ nodes: [], edges: [] });
|
||||
expect(result).toBeUndefined();
|
||||
});
|
||||
});
|
||||
|
||||
describe('getLatestNodeTemplate', () => {
|
||||
it('should update node to latest template version', () => {
|
||||
const node = {
|
||||
nodeId: 'node1',
|
||||
flowNodeType: FlowNodeTypeEnum.userInput,
|
||||
inputs: [{ key: 'input1', value: 'test' }],
|
||||
outputs: [{ key: 'output1', value: 'test' }],
|
||||
name: 'Old Name',
|
||||
intro: 'Old Intro'
|
||||
};
|
||||
|
||||
const template = {
|
||||
flowNodeType: FlowNodeTypeEnum.userInput,
|
||||
inputs: [{ key: 'input1' }, { key: 'input2' }],
|
||||
outputs: [{ key: 'output1' }, { key: 'output2' }]
|
||||
};
|
||||
|
||||
const result = getLatestNodeTemplate(node, template);
|
||||
|
||||
expect(result.inputs).toHaveLength(2);
|
||||
expect(result.outputs).toHaveLength(2);
|
||||
expect(result.name).toBe('Old Name');
|
||||
});
|
||||
|
||||
it('should preserve existing values when updating template', () => {
|
||||
const node = {
|
||||
nodeId: 'node1',
|
||||
flowNodeType: FlowNodeTypeEnum.userInput,
|
||||
inputs: [{ key: 'input1', value: 'existingValue' }],
|
||||
outputs: [{ key: 'output1', value: 'existingOutput' }],
|
||||
name: 'Node Name',
|
||||
intro: 'Node Intro'
|
||||
};
|
||||
|
||||
const template = {
|
||||
flowNodeType: FlowNodeTypeEnum.userInput,
|
||||
inputs: [{ key: 'input1', value: 'newValue' }],
|
||||
outputs: [{ key: 'output1', value: 'newOutput' }]
|
||||
};
|
||||
|
||||
const result = getLatestNodeTemplate(node, template);
|
||||
|
||||
expect(result.inputs[0].value).toBe('existingValue');
|
||||
expect(result.outputs[0].value).toBe('existingOutput');
|
||||
});
|
||||
});
|
||||
});
|
||||