I'm using this code:
from pdfminer.layout import LAParams, LTTextBox
from pdfminer.pdfpage import PDFPage
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.converter import PDFPageAggregator

# Set up pdfminer: resource manager, layout parameters, aggregator device, interpreter
fp = open('yourpdf.pdf', 'rb')
rsrcmgr = PDFResourceManager()
laparams = LAParams()
device = PDFPageAggregator(rsrcmgr, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)
pages = PDFPage.get_pages(fp)

# Run layout analysis on each page and print the text of every child object
for page in pages:
    print('Processing next page...')
    interpreter.process_page(page)
    layout = device.get_result()
    for obj in layout:
        for lobj in obj:
            text = lobj.get_text()
            print(' text: %s' % text)

I'm following this documentation: http://www.unixuser.org/~euske/python/pdfminer/programming.html#layout
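I don't understand why I get TypeError: 'LTCurve' object is not iterable. Any ideas? I suspect it's because the LTCurve and the LTTextLine I'm trying to reach sit at the same (parallel) level. How can I fix this? Thanks!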
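The suspicion in the question matches how pdfminer builds its layout: the page returned by device.get_result() is a tree in which container objects such as LTTextBox can be iterated, while leaf objects such as LTCurve cannot, and both can appear as direct children of the page. The sketch below reproduces the error without a PDF. Component, Container, Curve, TextBox and TextLine are hypothetical stand-ins that only imitate this behaviour; they are not pdfminer's classes.

```python
# Hypothetical stand-in classes (NOT pdfminer's real ones) that mimic its
# layout tree: container objects are iterable, leaf objects are not.
class Component:
    """A leaf, like LTCurve or LTRect: it defines no __iter__."""


class Container(Component):
    """A container, like LTPage or LTTextBox: iterating yields its children."""

    def __init__(self, children=()):
        self._children = list(children)

    def __iter__(self):
        return iter(self._children)


class Curve(Component):
    pass


class TextBox(Container):
    pass


class TextLine(Container):
    def __init__(self, text):
        super().__init__()
        self._text = text

    def get_text(self):
        return self._text


def naive_lines(layout):
    # Same shape as the question's loop: assumes every child is iterable.
    return [lobj.get_text() for obj in layout for lobj in obj]


# A page holding a text box and a curve side by side at the same level.
page = Container([TextBox([TextLine("hello\n")]), Curve()])

# Top-level children that cannot be iterated over.
leaves = [type(obj).__name__ for obj in page if not isinstance(obj, Container)]
print(leaves)  # ['Curve']

try:
    naive_lines(page)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)  # 'Curve' object is not iterable
```

The inner loop works for the text box and only fails once it reaches the curve, which is why some text may print before the exception.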
Posted 2019-08-28 17:54:14
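Solved! I had to use 'if isinstance(obj, LTTextBox)' (because the LTTextLine objects are contained in an LTTextBox) to mark where the data should start, and then pick out the data I need from the LTTextLine objects. Code: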
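# LTTextLine must be imported alongside LTTextBox
from pdfminer.layout import LTTextBox, LTTextLine

for page in pages:
    print('Processing next page...')
    interpreter.process_page(page)
    layout = device.get_result()
    for obj in layout:
        # Only descend into text boxes; objects such as LTCurve are skipped
        if isinstance(obj, LTTextBox):
            for lobj in obj:
                if isinstance(lobj, LTTextLine):
                    text = lobj.get_text()
                    print(' text: %s' % text)

https://stackoverflow.com/questions/57689442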
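The loop in the answer assumes text lines sit exactly two levels below the page. A more general variant, sketched here as an alternative rather than as the answer's method, walks the layout tree recursively: it descends into any container and collects the text of every text line at whatever depth it appears, skipping leaves such as curves. To keep it runnable without a PDF, the container and line classes are passed in as parameters; Box, Line and Curve are hypothetical stand-ins for pdfminer's container, LTTextLine and LTCurve classes.

```python
from typing import Iterator


def iter_text_lines(node, container_type, line_type) -> Iterator[str]:
    """Yield get_text() for every line_type object anywhere under node.

    Objects that are neither lines nor containers (curves, rectangles,
    images, ...) are skipped, so they never trigger a TypeError.
    """
    for child in node:
        # Check for a line first: a text line is itself a container of
        # characters, and descending into it would lose the whole-line text.
        if isinstance(child, line_type):
            yield child.get_text()
        elif isinstance(child, container_type):
            yield from iter_text_lines(child, container_type, line_type)


# Hypothetical stand-ins used only to exercise the function without a PDF.
class Box(list):
    """Plays the role of a container such as LTPage, LTTextBox or LTFigure."""


class Line(Box):
    """Plays the role of LTTextLine: a container that also has get_text()."""

    def __init__(self, text):
        super().__init__()
        self.text = text

    def get_text(self):
        return self.text


class Curve:
    """Plays the role of a leaf such as LTCurve: not iterable."""


page = Box([
    Box([Line("first\n"), Line("second\n")]),
    Curve(),
    Box([Box([Line("nested\n")])]),
])

lines = list(iter_text_lines(page, Box, Line))
print(lines)
```

With pdfminer.six, assuming LTContainer and LTTextLine are imported from pdfminer.layout, the equivalent call would be iter_text_lines(device.get_result(), LTContainer, LTTextLine). Passing the classes in keeps the walker independent of which pdfminer fork is installed.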