
Question: PDFPage does not exist in the Python PDFMiner library

Stack Overflow user

Asked on 2017-06-20 02:10:21

Answers: 2 | Views: 5.2K | Followers: 0 | Votes: 1

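So I installed pdfminer3k for Python 3.6 via pip. I tried several examples that open a PDF file and convert it to text, and they all need to import PDFPage, which doesn't exist in my installation. Is there any way around this? I tried copying a PDFPage.py I found online into the directory where Python looks for pdfminer, but then I got: "ImportError: cannot import name PDFObjectNotFound".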

Thanks!

Tags: python, importerror, pdfminer, pdfpage

2 Answers

Stack Overflow user

Posted on 2017-06-20 02:41:12

Ah. I guess PDFPage just isn't available for Python 3.6. The example from "How to read pdf file using pdfminer3k?" solved my problem!
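Whether `PDFPage` can be imported depends on which pdfminer distribution is installed: pdfminer.six ships a `pdfminer.pdfpage` module, while pdfminer3k uses the older API in which pages come from `PDFDocument.get_pages()` (as in the second answer). Before copying an example, a quick check like the minimal sketch below can tell you which style your environment supports. The function name `detect_pdfminer_api` and its return labels are hypothetical, chosen only for this illustration; the check itself uses the standard `importlib.util.find_spec`.

```python
import importlib.util


def detect_pdfminer_api():
    """Report which text-extraction API the installed pdfminer package offers.

    Hypothetical helper. Returns:
        None       if no package named "pdfminer" is importable,
        "pdfpage"  if the pdfminer.pdfpage module exists (e.g. pdfminer.six),
        "legacy"   otherwise (e.g. pdfminer3k, where pages come from
                   PDFDocument.get_pages() rather than PDFPage.get_pages()).
    """
    # Check the top-level package first: find_spec() on a dotted name
    # imports the parent package and raises ModuleNotFoundError if it is missing.
    if importlib.util.find_spec("pdfminer") is None:
        return None
    # Locate the submodule without importing the submodule itself.
    if importlib.util.find_spec("pdfminer.pdfpage") is not None:
        return "pdfpage"
    return "legacy"


if __name__ == "__main__":
    print(detect_pdfminer_api())
```

If it reports `"legacy"`, examples written for pdfminer.six will not work as-is; either follow the pdfminer3k-style code in the other answer or switch the installed package to pdfminer.six.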

Votes: 1

Stack Overflow user

Posted on 2020-11-24 17:40:18

```python
import io
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfparser import PDFParser, PDFDocument

def extract_text_from_pdf(pdf_path):
    '''
    Iterator: extract the plain text from PDF files with pdfminer3k.

    pdf_path: path to the PDF file to be extracted
    return: iterator of strings of extracted text (one per page)
    '''
    # pdfminer.six version can be found at:
    # https://www.blog.pythonlibrary.org/2018/05/03/exporting-data-from-pdfs-with-python/
    with open(pdf_path, 'rb') as fp:
        # pdfminer3k API: link parser and document to each other, then initialize
        parser = PDFParser(fp)
        doc = PDFDocument()
        parser.set_document(doc)
        doc.set_parser(parser)
        doc.initialize('')  # empty password
        for page in doc.get_pages():  # pdfminer.six: PDFPage.get_pages(fh, caching=True, check_extractable=True):
            rsrcmgr = PDFResourceManager()
            fake_file_handle = io.StringIO()
            device = TextConverter(rsrcmgr, fake_file_handle, laparams=LAParams())
            interpreter = PDFPageInterpreter(rsrcmgr, device)
            interpreter.process_page(page)

            text = fake_file_handle.getvalue()

            # close open handles before yielding, so they are released
            # even if the caller stops iterating early
            device.close()
            fake_file_handle.close()

            yield text

fPath = 'example.pdf'  # hypothetical path: replace with your own PDF file
maxPages = 1
for i, t in enumerate(extract_text_from_pdf(fPath)):
    if i < maxPages:
        print(f"Page {i}:\n{t}")
    else:
        print(f"Page {i} skipped!")
```
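In the loop above, pages beyond `maxPages` are still parsed by the generator before being reported as skipped, because the loop advances it once per page. If only the first few pages are needed, a minimal sketch is to cut the iterator off with `itertools.islice`, which stops pulling items once the limit is reached. The helper `first_pages` is hypothetical, and the `fake_pages` generator is only a stand-in for the PDF extractor so the behaviour can be observed without a file.

```python
from itertools import islice


def first_pages(page_texts, max_pages):
    """Return the text of at most max_pages pages from a per-page iterator.

    islice() stops asking the iterator for items once max_pages have been
    taken, so a lazy generator never produces the remaining pages.
    """
    return list(islice(page_texts, max_pages))


if __name__ == "__main__":
    parsed = []  # records which pages the stand-in generator actually produced

    def fake_pages(n):
        # Stand-in for a per-page text extractor: yields one string per "page"
        for i in range(n):
            parsed.append(i)
            yield f"text of page {i}"

    for i, t in enumerate(first_pages(fake_pages(5), 2)):
        print(f"Page {i}:\n{t}")
    print("pages parsed:", parsed)  # prints: pages parsed: [0, 1]
```

With the real extractor, wrapping the generator in `contextlib.closing(...)` calls its `close()` method on exit, so the `with open(...)` block inside it is exited promptly instead of whenever the abandoned generator is garbage-collected.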

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/44637231
