
Q: Extracting all tables from a PDF in Python

Stack Overflow user

Asked 2018-09-07 17:06:55

Answers: 2, Views: 2.4K, Followers: 0, Votes: 3

I have a PDF and want to extract all the tables from it. When I run the code below, I get an empty list.

import pdftables

filepath = 'File_Set_-2_feasibility_Study/140u-td005_-en-p.pdf'
with open(filepath, 'rb') as fh:
    table = pdftables.get_tables(fh)
print(table)

python

pdf

pdftables

2 Answers

Stack Overflow user

Answered 2018-09-07 17:14:21

I assume the PDF has multiple pages? This should work:

from pdftables.pdf_document import PDFDocument
from pdftables.pdftables import page_to_tables

filepath = ...
page_number = ...
with open(filepath, 'rb') as file_object:
    pdf_doc = PDFDocument.from_fileobj(file_object)
    pdf_page = pdf_doc.get_page(page_number)
    tables = page_to_tables(pdf_page)
    print(tables)

You can also iterate over multiple pages:

for page_number, page in enumerate(pdf_doc.get_pages()):
    tables = page_to_tables(page)
    print(tables)
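
If you want to keep the tables from every page instead of just printing them, one option is to collect them in a dict keyed by page index. This is only a rough sketch: `collect_tables` is a hypothetical helper (not part of pdftables), and it assumes the extractor returns a list-like value that is empty when nothing is found. Stand-in data is used so it runs without a PDF.

```python
def collect_tables(pages, extract_tables):
    """Return {page_index: tables} for every page that yields a non-empty result.

    `pages` is any iterable of page objects and `extract_tables` is a callable
    that maps one page to its tables. Both names are hypothetical; the
    list-like return value is an assumption, not documented library behaviour.
    """
    results = {}
    for page_index, page in enumerate(pages):
        tables = extract_tables(page)
        if tables:  # skip pages where nothing was detected
            results[page_index] = tables
    return results


# Stand-in data: each fake "page" is simply the list of tables it contains.
fake_pages = [[], [["a", "b"]], [["c"], ["d"]]]
found = collect_tables(fake_pages, lambda page: page)
print(found)
```

With the code from this answer, the call would presumably be `collect_tables(pdf_doc.get_pages(), page_to_tables)`, made where `pdf_doc` is in scope. Pages missing from the result are the ones where no table was detected, which may help narrow down why the original call came back empty.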

Votes: 2

Stack Overflow user

Answered 2021-05-18 17:31:21

Install the library below to use pdftables; it worked for me:

> pip install pdftables.six

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/52219133
