
sprivacy 1644b50375 merge lfy branch 1 year ago
extracted_images 76f6e51fc3 version 0.1 1 year ago
LLMAgent.py 9365b80daa 08/08/2024 16:12:52 1 year ago
README.md db72ca3de4 08/16/2024 10:49:47 1 year ago
document_.py 6ece768b5c update document_.py 1 year ago
extract_financial_report.py 48b9a3e9b3 merge zzh branch 1 year ago
extract_price.py 48b9a3e9b3 merge zzh branch 1 year ago
get_info.py 0152861ab0 08/08/2024 17:19:21 1 year ago
instance_locate.py 48b9a3e9b3 merge zzh branch 1 year ago
lmu.py 59d20e64a0 add BMP image saving; add summary generation and keyword detection 1 year ago
matcher.py 65241d1460 add paragraph locating for evaluation factors 1 year ago
ocr_api.py 48b9a3e9b3 merge zzh branch 1 year ago
project_loc.py db72ca3de4 08/16/2024 10:49:47 1 year ago
requirements.txt 7e37b38fec extract content with outlines 1 year ago
responser.py db72ca3de4 08/16/2024 10:49:47 1 year ago
scan_dir.py 48b9a3e9b3 merge zzh branch 1 year ago
text_extractor.py 48b9a3e9b3 merge zzh branch 1 year ago
tools.py 1644b50375 merge lfy branch 1 year ago

README.md

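Main modules
1. tools: outline parsing
2. get_info: PDF information extraction
3. matcher: paragraph locating
4. project_loc: table locating for project track records
5. responser: output formatting
6. lmu: summary generation
7. LLMAgent: large language model invocation
8. document_: tender document parsing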

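Extracting content from borderless tables in PDFs
1. Install camelot-py from the git source
	git clone https://www.github.com/camelot-dev/camelot
	In pyproject.toml, set pdfminer-six = "^20231228"
	Install: from inside the camelot directory, run pip install -e .
2. Install ghostscript in WSL Debian [the program itself]
	apt install ghostscript
3. Install the ghostscript Python package
	pip install ghostscript==0.7.0 [the Python bindings for the program]
4. Code change [when running with CV, width and height do not need to be set; the defaults are fine]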
	tables_pro = camelot.read_pdf(
	    self.file_path,
	    # flavor='stream',
	    pages=str(page_number+1),
	    # edge_tol=200,
	    # row_tol=50,
	)
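
The call in step 4 can be wrapped as in the sketch below, which is only an illustration and not the repository's actual code. The helper names `build_read_pdf_kwargs` and `read_page_tables` are hypothetical. It assumes `page_number` is a 0-based index (camelot's `pages` argument is a 1-based string) and that `edge_tol` and `row_tol` are only passed together with `flavor='stream'`, since they are stream-parser options in camelot.

```python
def build_read_pdf_kwargs(page_number, flavor=None, edge_tol=None, row_tol=None):
    """Build keyword arguments for camelot.read_pdf (hypothetical helper).

    page_number is assumed to be 0-based; camelot expects a 1-based string.
    With no options given, camelot's defaults are used, as step 4 recommends.
    """
    kwargs = {"pages": str(page_number + 1)}
    if flavor is not None:
        kwargs["flavor"] = flavor
    tolerances = {"edge_tol": edge_tol, "row_tol": row_tol}
    for name, value in tolerances.items():
        if value is None:
            continue
        # Assumption: tolerances are only meaningful for the stream parser.
        if flavor != "stream":
            raise ValueError(f"{name} requires flavor='stream'")
        kwargs[name] = value
    return kwargs


def read_page_tables(file_path, page_number, **options):
    """Read the tables on one page of a PDF (hypothetical wrapper)."""
    import camelot  # imported lazily so the helper above works without camelot

    return camelot.read_pdf(file_path, **build_read_pdf_kwargs(page_number, **options))
```

Calling `read_page_tables(path, 0)` corresponds to the default call shown in step 4; passing `flavor="stream", edge_tol=200, row_tol=50` corresponds to enabling the commented-out options.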