I have a Python script that uses PyPDF2 to reverse the page order of a PDF.
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
rpage = []
name = input("What's the file called?")
filename = name.split('.', 1)
input1 = PdfFileReader(open(name,'rb'), strict = False)
pages = list(range(1,input1.getNumPages() + 1))
for i in range(0, (input1.getNumPages())):
    rpage.append(pages[input1.getNumPages() - i -1])
for i in rpage:
    output.addPage(input1.getPage(i-1))
outputpath = filename[0] + '-reversed.pdf'
outputStream = open(outputpath, "wb")
output.write(outputStream)
It runs as expected until it tries to write to the output stream, at which point it fails with the following error:
PdfReadWarning: Invalid stream (index 59) within object 108 0: Stream has ended unexpectedly [pdf.py:1573]
Traceback (most recent call last):
File "D:\Documents\Google Drive\Programming\Python\PDF Scripts\reverse pdf.py", line 22, in <module>
output.write(outputStream)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 482, in write
self._sweepIndirectReferences(externalReferenceMap, self._root)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, data[i])
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
newobj = data.pdf.getObject(data)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
retval = readObject(self.stream, self)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 611, in readFromStream
data["__streamdata__"] = stream.read(length)
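TypeError: integer argument expected, got 'NullObject'
The code does create a PDF file, but it is 0 KB in size and therefore unreadable. I also tested a sample script for merging three PDFs (found here), and it produces another empty file along with the following error: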
PdfReadWarning: Invalid stream (index 59) within object 108 0: Stream has ended unexpectedly [pdf.py:1573]
Traceback (most recent call last):
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1567, in _getObjectFromStream
obj = readObject(streamData, self)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 98, in readObject
return NumberObject.readFromStream(stream)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 269, in readFromStream
num = utils.readUntilRegex(stream, NumberObject.NumberPattern)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\utils.py", line 134, in readUntilRegex
raise PdfStreamError("Stream has ended unexpectedly")
PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Documents\Google Drive\Programming\Python\PDF Scripts\untitled1.py", line 27, in <module>
merger.write(output)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\merger.py", line 230, in write
self.output.write(fileobj)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 482, in write
self._sweepIndirectReferences(externalReferenceMap, self._root)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, data[i])
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
self._sweepIndirectReferences(externMap, realdata)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
value = self._sweepIndirectReferences(externMap, value)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences
newobj = data.pdf.getObject(data)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject
retval = readObject(self.stream, self)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 66, in readObject
return DictionaryObject.readFromStream(stream, pdf)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 609, in readFromStream
length = pdf.getObject(length)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1593, in getObject
retval = self._getObjectFromStream(indirectReference)
File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1576, in _getObjectFromStream
raise utils.PdfReadError("Can't read object stream: %s"%e)
PyPDF2.utils.PdfReadError: Can't read object stream: Stream has ended unexpectedly
The previous error is also raised when I use this script to split the PDF into its individual pages:
from PyPDF2 import PdfFileWriter, PdfFileReader
infile = PdfFileReader(open('test.pdf', 'rb'))
for i in range(infile.getNumPages()):
    p = infile.getPage(i)
    outfile = PdfFileWriter()
    outfile.addPage(p)
    with open('page-%02d.pdf' % i, 'wb') as f:
        outfile.write(f)
The code above produces (n-1) readable PDFs, but the nth PDF is an empty file. Do you know how I can fix this?
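Posted on 2017-03-09 22:54:56
Your script iterates over the pages in several different places, and the purpose of each is not clear to me. I believe the way you count backwards is the source of your error.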
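The counting can be checked on its own, without PyPDF2 or a PDF file. The sketch below reproduces the index arithmetic from the question's two loops and compares it with a single backwards range; the function names `original_order` and `simplified_order` are hypothetical and exist only for this illustration.

```python
def original_order(num_pages):
    """Zero-based page indices as the question's two loops compute them."""
    pages = list(range(1, num_pages + 1))  # 1-based page numbers
    rpage = [pages[num_pages - i - 1] for i in range(num_pages)]
    return [p - 1 for p in rpage]  # getPage() takes a zero-based index


def simplified_order(num_pages):
    """Zero-based page indices from a single backwards range."""
    return [i - 1 for i in range(num_pages, 0, -1)]


# Compare the two orderings for a few page counts.
for n in (0, 1, 5):
    print(n, original_order(n), simplified_order(n))
```

If the two orderings match for your page count, the counting itself is unlikely to be what raises the exception, and the stream that the traceback reports as unreadable in the input file is worth checking as well.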
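I adapted your script to Python 2.7 (since that is what I am running), then simplified it to walk backwards through your source file once while building the reversed file.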
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
# rpage = [] removed because it's not needed anymore
name = raw_input("What's the file called? ")  # Changed for the 2.7 environment
filename = name[:-4]  # Simplified; assumes a four-character extension such as ".pdf"
input1 = PdfFileReader(open(name, "rb"))  # "rb" is a mode for open(), not a PdfFileReader argument
# Simplified, because I couldn't figure out why it was complex.
for i in range(input1.getNumPages(), 0, -1):
    # getNumPages counts like a human and gives the total number of pages.
    # This counts backwards, so no need to count forward and use that to
    # reverse the numbers.
    output.addPage(input1.getPage(i - 1))
    # getPage counts like a computer and needs to finish with page 0.
outputpath = filename + '-reversed.pdf'
outputStream = open(outputpath, "wb")
output.write(outputStream)
outputStream.close()  # Closes the file and stream once you're done.
Posted on 2017-03-31 05:21:34
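If all you want is to reverse the pages for printing, and you do not care about preserving internal links and annotations, then pdfrw may be a better fit for this task than PyPDF2: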
from pdfrw import PdfWriter, PdfReader
iname = input("What's the file called? ")
oname = iname.rsplit('.', 1)[0] + '-reversed.pdf'
output = PdfWriter()
output.addpages(reversed(PdfReader(iname).pages))
output.write(oname)
Disclaimer: I am the primary author of pdfrw.
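The same steps can be wrapped in a reusable function. This is a minimal sketch, assuming pdfrw is installed; `reversed_name` and `reverse_pdf` are hypothetical helper names, and `os.path.splitext` is swapped in for `rsplit` so that a dot in a directory name is not mistaken for the extension separator.

```python
import os


def reversed_name(path):
    """Build the output name, e.g. 'report.pdf' -> 'report-reversed.pdf'."""
    stem, _ext = os.path.splitext(path)
    return stem + "-reversed.pdf"


def reverse_pdf(path):
    """Write a copy of `path` with its pages in reverse order."""
    # Imported lazily so the naming helper works without pdfrw installed.
    from pdfrw import PdfReader, PdfWriter

    out_path = reversed_name(path)
    writer = PdfWriter()
    writer.addpages(reversed(PdfReader(path).pages))
    writer.write(out_path)
    return out_path
```

Calling `reverse_pdf("test.pdf")` would then write `test-reversed.pdf` next to the input file.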
Posted on 2018-07-19 22:44:40
I suggest you use PyPDF2's 'merge' function rather than 'addPage'.
The following snippet shows how to append and merge files/pages:
from PyPDF2 import PdfFileMerger
merger = PdfFileMerger()
input1 = open("file1.pdf", "rb")
input2 = open("file2.pdf", "rb")
# add the first 3 pages of first file to output
merger.append(fileobj = input1, pages = (0,3))
# insert the first page of second file into the output beginning after the second page
merger.merge(position = 2, fileobj = input2, pages = (0,1))
# Write to an output PDF document
output = open("document-output.pdf", "wb")
merger.write(output)
Remove the 'pages' argument from the 'append' and 'merge' calls to merge whole files rather than specific pages.
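To see which pages end up where, the page selection and insertion can be modelled with plain lists. This is only an illustration of the `(start, stop)` tuple and the `position` argument as described in the comments above, not PyPDF2 itself; the functions `select`, `append` and `merge` and the page labels are hypothetical stand-ins.

```python
def select(source, pages=None):
    """Pick pages with a (start, stop[, step]) tuple, or all pages if None."""
    if pages is None:
        return list(source)
    return [source[i] for i in range(*pages)]


def append(output, source, pages=None):
    """Add the selected pages to the end of the output."""
    output.extend(select(source, pages))


def merge(output, position, source, pages=None):
    """Insert the selected pages so the first one lands at index `position`."""
    output[position:position] = select(source, pages)


file1 = ["A1", "A2", "A3", "A4"]  # stand-in for the pages of file1.pdf
file2 = ["B1", "B2"]              # stand-in for the pages of file2.pdf

output = []
append(output, file1, pages=(0, 3))    # first three pages of file1
merge(output, 2, file2, pages=(0, 1))  # first page of file2, after page two
print(output)  # ['A1', 'A2', 'B1', 'A3']
```

Leaving out `pages` in this model copies every page of the source, which mirrors the note above about merging whole files.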
https://stackoverflow.com/questions/42570432