Q: Using pyPdf on a list of PDFs

Stack Overflow user

Asked 2013-07-24 03:03:57

Answers: 1 | Views: 722 | Followers: 0 | Votes: 0

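I have pyPdf working fine on a single PDF file, but I can't get it to work on a list of files, or in a for loop over multiple PDFs, without it failing because a string is not callable. Any ideas for a workaround?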

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content

#print getPDFContent(r"Z:\GIS\MasterPermits\12300983.pdf").encode("ascii", "ignore")


#find pdfs            
for root, dirs, files in os.walk(folder1):
    for file in files:
      if file.endswith(('.pdf')):
          d=os.path.join(root, file)
          print getPDFContent(d).encode("ascii", "ignore")

Traceback (most recent call last):
  File "C:\Documents and Settings\dknight\Desktop\readpdf.py", line 50, in <module>
    print getPDFContent(d).encode("ascii", "ignore")
  File "C:\Documents and Settings\dknight\Desktop\readpdf.py", line 32, in getPDFContent
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
TypeError: 'str' object is not callable

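I tried using a list and got exactly the same error. I didn't think this was a big deal, but it is turning into one. I know I was able to work around a similar problem in arcpy, but this is nothing like that.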

python

pypdf

pdftotext

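1 Answer

Stack Overflow user

Answered 2013-07-24 03:15:10

Try not to use the names of built-in types as variable names. In Python 2, file is a built-in, and the loop "for file in files" rebinds that name to a string at module level, so the later call file(path, "rb") inside getPDFContent tries to call a string.

Don't do this: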

for file in files:

Do this instead:

 for myfile in files:
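
The renamed loop can be sketched as follows. This is only a minimal sketch, assuming Python 3, where open replaces the Python 2 file built-in. The names find_pdfs, read_all and the extract parameter are hypothetical and not part of pyPdf; pass in whatever extraction function you already have, such as the question's getPDFContent.

```python
import os


def find_pdfs(folder):
    """Yield the full path of every file ending in .pdf under folder."""
    for root, dirs, filenames in os.walk(folder):
        # The loop variable is `filename`, not `file`, so no built-in
        # name is rebound to a string.
        for filename in filenames:
            if filename.endswith(".pdf"):
                yield os.path.join(root, filename)


def read_all(folder, extract):
    """Call extract(path) on each PDF found and return {path: result}.

    `extract` is a hypothetical stand-in for your own function that
    takes a path and returns the extracted text.
    """
    return {path: extract(path) for path in find_pdfs(folder)}
```

Keeping the directory walk separate from the extraction also means the walk can be checked on its own, for example with read_all(folder, os.path.getsize), before any PDF library is involved.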

Votes: 2

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/17818892
