Search - Tencent Cloud Developer Community - Tencent Cloud

From the column "流川疯: The Art of Writing Programs"
NLP Resources You Are Sure to Need in Your Projects [Categorized Edition]
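Unsupervised single-document keyword extraction (github); DocSearch: a free documentation search engine (github); fdfgen: automatically creates PDF documents and fills in their fields (link); pdfx: automatically extracts cited references and downloads the corresponding PDF files (link); invoice2data: extracts information from invoice PDFs (invoice2data); PDF document information extraction (github); PDFMiner: can obtain the exact position of text on a page, along with other information such as fonts or lines.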
Edited on 2022-09-20
From the column "Python七号"
A Playground for NLP Laborers
tabula-py[329]: converts tables in a PDF directly into a pandas DataFrame; code is available in both Java and Python versions. pdfx[330]: automatically extracts cited references and downloads the corresponding PDF files. invoice2data
tabula-py: https://github.com/chezou/tabula-py
[330] pdfx: https://github.com/metachris/pdfx
[331] invoice2data: https://github.com/invoice-x/invoice2data
[332] camelot: https://github.com/atlanhq/camelot
[333] pdfplumber
Published on 2021-10-08