Q: Keep only the last page of each section of a PDF?

Ask Ubuntu user

Asked 2019-05-03 12:04:44

Answers: 1, Views: 171, Followers: 0, Votes: 4

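I have many PDF files in which each section repeats the same slide, or slight variations of it (that is, every section contains several nearly identical copies of one slide). I want to shrink the PDFs by stripping the redundant copies and leaving only one page per section.

Here is an example of such a PDF. Basically, I want to automate this job.

Is there a tool, such as pdftk, pdfcrop, or Ghostscript, that I can use to keep only the last page of each section in a PDF? A command-line tool would be best!

Edit: I uploaded my own example. Here is an image showing the problem. Notice that there are 3 pages whose "label" is set to 2. We have 3 pages with page index 2 and 3 pages with page index 3. I want to keep the last page with page index 2 and the last page with page index 3. I want to do this for all of the PDF "sections", which is what Acrobat calls them!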
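
To make the desired result concrete, here is a minimal sketch in plain Python. It assumes the page labels have already been read into a list with one entry per page; the function name `last_page_of_each_run` and the sample list are hypothetical and not part of any PDF tool.

```python
from itertools import groupby

def last_page_of_each_run(labels):
    """Return the 0-based index of the last page in each run of equal labels."""
    keep = []
    position = 0
    for _, run in groupby(labels):
        position += len(list(run))   # advance past this run of identical labels
        keep.append(position - 1)    # the last page of the run is the one to keep
    return keep

# Three pages labelled "2" followed by three pages labelled "3", as in the example.
labels = ["2", "2", "2", "3", "3", "3"]
print(last_page_of_each_run(labels))
```

For the six pages described above, this selects the third and sixth pages (0-based indices 2 and 5).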

command-line

pdf

1 Answer

Ask Ubuntu user

Accepted answer

Posted 2019-05-03 14:02:35

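I solved the problem myself by writing Python code to handle it. Retrieving PageLabels yields the labels themselves, which may or may not be numeric, together with the page index at which each label starts. I extract the start index of each label and assume that a section (or label) ends exactly one page before the next label/section starts.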

#!/usr/bin/python

# Uses the legacy PyPDF2 API (PdfFileReader/PdfFileWriter), which was removed in PyPDF2 3.0.
from PyPDF2 import PdfFileWriter, PdfFileReader

with open("in.pdf", "rb") as in_f:
    input1 = PdfFileReader(in_f)
    output = PdfFileWriter()

    numPages = input1.getNumPages()

    # /Nums is a flat array [startIndex0, label0, startIndex1, label1, ...];
    # the start indices sit at the even positions.
    nums = input1.trailer["/Root"]["/PageLabels"]["/Nums"]
    startIndices = [int(n) for n in nums[0::2]]

    # Assume end of preceding label = start of next label - 1.
    # The first start index is skipped because no section precedes it.
    pageIndices = [start - 1 for start in startIndices[1:]]

    # There may be extra pages right after the start of the last label - add them.
    pageIndices.extend(range(startIndices[-1], numPages))

    for v in pageIndices:
        page = input1.getPage(v)
        output.addPage(page)

    with open("out.pdf", "wb") as out_f:
        output.write(out_f)
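
The index arithmetic can be checked without a PDF at hand. The sketch below is a variation, not the script above: it takes the `/Nums` array as a plain Python list and keeps exactly one page per section, including the last section, whereas the script above keeps every page from the start of the last label onward. The function name `last_pages` and the sample values are assumptions made for illustration.

```python
def last_pages(nums, num_pages):
    """Given a flat /Nums-style list [start0, label0, start1, label1, ...],
    return the 0-based index of the last page of every section."""
    starts = [int(n) for n in nums[0::2]]
    # Each section ends one page before the next one starts...
    ends = [start - 1 for start in starts[1:]]
    # ...and the final section ends on the last page of the document.
    if starts and num_pages > 0:
        ends.append(num_pages - 1)
    return ends

# Hypothetical /Nums: sections start at pages 0, 3 and 6 of a 9-page file.
nums = [0, {"/S": "/D"}, 3, {"/S": "/D"}, 6, {"/S": "/D"}]
print(last_pages(nums, 9))
```

The returned indices could then be passed to `getPage` and `addPage` in the same way as in the script.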

Votes: 2

Original page content provided by Ask Ubuntu. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://askubuntu.com/questions/1140224
