How do I edit a PDF file and replace its data?

Stack Overflow user

Asked 2015-02-23 17:30:46

Answers: 1 | Views: 5.3K | Followers: 0 | Votes: 1

I am trying to rotate the pages in a PDF file and then replace the old pages with the rotated ones in that same PDF file.

I wrote the following code:

#!/usr/bin/python

import os
from pyPdf import PdfFileReader, PdfFileWriter

my_path = "/home/USER/Desktop/files/"

input_file_name = os.path.join(my_path, "myfile.pdf")
input_file = PdfFileReader(file(input_file_name, "rb"))
input_file.decrypt("MyPassword")
output_PDF = PdfFileWriter()

for num_page in range(0, input_file.getNumPages()):
    page = input_file.getPage(num_page)
    page.rotateClockwise(270)
    output_PDF.addPage(page)

#Trying to replace old data with new data in the original file, not
#create a new file and add the new data!
output_file_name = os.path.join(my_path, "myfile.pdf")
output_file = file(output_file_name, "wb")
output_PDF.write(output_file)
output_file.close()

The code above gives me an error! I even tried using:

input_file = PdfFileReader(file(input_file_name, "r+b"))

but that did not work either...

Changing the line:

output_file_name = os.path.join(my_path, "myfile.pdf")

to:

output_file_name = os.path.join(my_path, "myfile2.pdf")

fixes everything, but that is not what I want...

Any help?

The error:

Traceback (most recent call last):
  File "12-5.py", line 22, in <module>
    output_PDF.write(output_file)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 264, in write
    self._sweepIndirectReferences(externMap, self._root)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line ..., in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line ..., in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 345, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 649, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 564, in readFromStream
    raise utils.PdfReadError, "Unable to find 'endstream' marker after stream."
pyPdf.utils.PdfReadError: Unable to find 'endstream' marker after stream.

Tags: python, pdf, edit, pypdf

1 Answer

Stack Overflow user

Accepted answer

Answered 2015-02-23 18:26:16

The issue, I suspect, is that PyPDF is still reading from the file while that same file is being written to.
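To make that suspicion concrete, here is a minimal standard-library sketch. It assumes (consistent with the traceback, which fails inside `getObject` during `write`) that pyPdf reads page objects lazily from the still-open input handle. Opening the same path with "wb" truncates the file at once, so a reader that has not yet pulled in its data finds nothing left. The helper names and the fake file contents are hypothetical and only illustrate the file-handle behavior, not pyPdf itself.

```python
import os
import tempfile


def read_after_reopen_for_write(path):
    """Open `path` for reading, reopen it with "wb", and return what the reader still sees."""
    with open(path, "rb") as reader:
        # Opening with "wb" truncates the file to zero bytes immediately,
        # before any new data has been written back.
        with open(path, "wb"):
            pass
        # The reader had not consumed anything yet, so it reads the truncated file.
        return reader.read()


def demo():
    # Hypothetical stand-in for a PDF: any bytes will do for this illustration.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"%PDF-1.4 pretend page data endstream")
        return read_after_reopen_for_write(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

A lazy reader in this position would plausibly fail to find markers such as `endstream`, which matches the error reported in the question.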

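The correct fix, as you have already noticed, is to write to a separate file and then replace the original file with the new one. Something like this: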

output_file_name = os.path.join(my_path, "myfile-temporary.pdf")
output_file = file(output_file_name, "wb")
output_PDF.write(output_file)
output_file.close()
os.rename(output_file_name, input_file_name)

I have written some code that simplifies this: .py#L14

from unstdlib.standard.contextlib_ import open_atomic

with open_atomic(input_file_name, "wb") as output_file:
    output_PDF.write(output_file)

This will automatically create a temporary file, write to it, and then replace the original file.
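For readers who would rather not add a dependency, the write-to-a-temporary-file-then-replace pattern can be sketched with the standard library alone. This is not the `open_atomic` implementation; `write_atomically` is a hypothetical helper name, and the sketch assumes Python 3.3+ for `os.replace` (on Python 2, as used in the question, `os.rename` plays that role on POSIX systems).

```python
import os
import tempfile


def write_atomically(path, write_func):
    """Write through a temporary file in the same directory, then swap it into place.

    `write_func` is any callable that accepts a binary file object,
    for example the bound method `output_PDF.write` from the question.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Creating the temporary file next to the target keeps both on the same
    # filesystem, which the final rename step relies on.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            write_func(tmp_file)
        # Only after the write has fully succeeded is the original replaced.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, discard the partial temporary file and keep the original.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With the question's variables this would be called as `write_atomically(input_file_name, output_PDF.write)`. Because the original file is untouched until the new one is complete, the reader can keep pulling data from it during the write. Note that `tempfile.mkstemp` creates the file with restrictive permissions, so the replaced file may not keep the original's mode.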

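EDIT: I initially misread the question. Below is my incorrect answer, which may still be helpful to others.

Your code is fine, and should work without issue on "most" PDFs.

The issue you are seeing is an incompatibility between PyPDF and the particular PDF you are trying to use. This could be a bug in PyPDF, or the PDF may not be entirely valid.

There are two things you can try:

1. See whether PyPDF2 can read the file. Install PyPDF2 with pip install PyPDF2, replace import pyPdf … with import PyPDF2 …, then rerun your script.
2. Re-encode your PDF with another program and see whether that helps. For example, something like convert bad.pdf bad.ps; convert bad.ps maybe-good.pdf might fix some problems.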

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: https://stackoverflow.com/questions/28679706
