Q: Ghostscript: extract every nth page from a PDF

Unix & Linux user

Asked 2019-03-20 19:20:15

Answers: 2 | Views: 1.9K | Followers: 0 | Votes: 2

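I don't know whether this can be done; I'm only just learning Ghostscript. Say I have several PDF files, each about 500 pages long. Can I set up Ghostscript to extract every 100 pages from each document and save each group as a separate PDF file?

So I have FileA.pdf, 500 pages long, and I want FileA_0001.pdf FileA_0002.pdf FileA_0003.pdf FileA_0004.pdf FileA_0005.pdf

I have managed to write a script that splits each file and merges the pages back together according to my interval, but I'm having trouble renaming the output files correctly. After the first file has been split and merged, the results are named FileA_0001.pdf FileA_0002.pdf FileA_0003.pdf FileA_0004.pdf FileA_0005.pdf

The problem is that once the script starts on FileB, the names continue as FileB_0006.pdf FileB_0007.pdf instead of starting again at 0001. I have tried a few different approaches, but each one failed. Any suggestions? Can anyone help?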

for file in /mnt/bridge/pdfsplit/staging/*.[pP][dD][fF]
do
    echo $file
    #Splits All the Files
    gs -q -dNOPAUSE -sDEVICE=pdfwrite -o tmp_%04d.pdf $file

    #Removes Last File in List; Ghostscript creates a blank file every time it splits
    find /mnt/bridge/pdfsplit/ -name "tmp*"  | tail -1 | while read filename ; do rm $filename; done

    pageCount=$(find . -name "tmp*" | wc -l)
    documents=$(((pageCount / 998) + (pageCount % 998 > 0)))
    pages=$(((pageCount/documents) + (pageCount % documents > 0 )))

    for ((i=1; i<$pageCount; i++)); do
        list=$(ls -1 tmp* 2>/dev/null | head -$pages)
        count=$(ls -1 tmp* 2>/dev/null| wc -l)
        gs -q -dNOPAUSE -sDEVICE=pdfwrite -o $(basename $file .pdf )_Part_$(printf %04d $i).pdf -dBATCH $list
        rm -f $list
        if [[ $count -eq 0 ]]; then
            echo "Done"
            break
        fi
    done

    #Removes Last File in List; Ghostscript is creating a blank file
    mv *.pdf /mnt/bridge/pdfsplit/splitFiles/
    find /mnt/bridge/pdfsplit/splitFiles/ -name "*.pdf"  | tail -1 | while read filename ; do rm $filename; done

done

ghostscript

2 Answers

Unix & Linux user

Accepted answer

Posted 2019-04-05 20:43:14

Does this help?

#!/bin/bash

function getChunk {
    #extract a page range: $1=first page, $2=last page, $3=input file, $4=chunk number
    #arguments are quoted so that file names containing spaces survive
    gs -q -dNOPAUSE -sDEVICE=pdfwrite -sPageList="$1-$2" -o "${3%.*}_$(printf %04d "$4").pdf" "$3"
}

for file in *.pdf; do

    #Use gs to get the page count
    pgs=$(gs -q -dNODISPLAY -c "($file) (r) file runpdfbegin pdfpagecount = quit")

    #specify the number of pages in each chunk as step
    step=10

    #calculate the number of whole chunks
    chunks=$(( pgs / step))

    #reset all counters between pdfs
    f=0    #first page to extract in chunk
    l=0    #last page to extract in chunk
    i=0    #chunk counter

    #Extract the whole chunks
    for ((i=0; i<$chunks; i+=1)); do

        #calculate the first and last pages
        f=$((i*step+1))
        l=$((f+step-1))
        getChunk "$f" "$l" "$file" "$i"
    done

    #Pick up any part chunk at the end of the file
    f=$((l+1))
    if [ "$f" -le "$pgs" ]; then
        getChunk "$f" "$pgs" "$file" "$i"
    fi
done

I'll leave you to tidy up the naming.
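One possible way to tidy the naming, as a rough sketch rather than a tested drop-in: compute the page ranges and output names in a small helper whose part counter is a local variable, so it restarts at 0001 for every input file. The names `chunk_plan` and `split_pdf` are hypothetical and introduced only for illustration; the gs options are the same ones used in the script above, and it is assumed that the page-count command works on the installed Ghostscript version.

```bash
#!/bin/bash
# chunk_plan TOTAL STEP BASE  (hypothetical helper, not part of Ghostscript)
# Prints one "first last output-name" line per chunk of STEP pages.
chunk_plan() {
    local total=$1 step=$2 base=$3
    local first last part=1               # part is local, so it restarts at 1 on every call
    for ((first = 1; first <= total; first += step)); do
        last=$((first + step - 1))
        ((last > total)) && last=$total   # clamp the final, shorter chunk
        printf '%d %d %s_%04d.pdf\n' "$first" "$last" "$base" "$part"
        part=$((part + 1))
    done
}

# split_pdf FILE STEP  (hypothetical wrapper around the gs calls from the script above)
split_pdf() {
    local file=$1 step=$2 pgs first last out
    # Page count, using the same gs invocation as the script above
    pgs=$(gs -q -dNODISPLAY -c "($file) (r) file runpdfbegin pdfpagecount = quit")
    while read -r first last out; do
        gs -q -dNOPAUSE -sDEVICE=pdfwrite -sPageList="$first-$last" -o "$out" "$file" </dev/null
    done < <(chunk_plan "$pgs" "$step" "${file%.*}")
}
```

Calling `split_pdf FileA.pdf 100` and then `split_pdf FileB.pdf 100` would number each file's parts from 0001 independently, because no counter is shared between the two calls.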

Votes: 1

Unix & Linux user

Posted 2019-04-05 20:54:07

Without gs

Create FileA_0001.pdf, FileA_0002.pdf, ..., FileA_0100.pdf (one page per file):

for i in $(seq 1 100); do pdftocairo -pdf -f $i -l $i FileA.pdf $(printf 'FileA_%04d.pdf' $i); done

Create FileA_1.pdf, FileA_2.pdf, ..., FileA_100.pdf (one page per file):

pdfseparate -l 100 FileA.pdf FileA_%d.pdf

My favourite is pdftocairo. In my experience it is faster and more reliable than gs. Give it a try.
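The two commands above write one page per output file. If the goal is 100-page chunks, as in the question, the same `-f` and `-l` options can cover a range instead of a single page. The following is a minimal sketch under a few assumptions: `split_with_poppler` is a hypothetical helper name, and `pdfinfo` (shipped with the same Poppler utilities) is assumed to be installed and to report the page count on a line starting with `Pages:`.

```bash
#!/bin/bash
# split_with_poppler FILE STEP  (hypothetical helper name)
# Writes FILE in chunks of STEP pages to <name>_0001.pdf, <name>_0002.pdf, ...
split_with_poppler() {
    local file=$1 step=$2
    local base=${file%.*} pages first last part=1
    # pdfinfo prints a line such as "Pages:          500"
    pages=$(pdfinfo "$file" | awk '/^Pages:/ {print $2}')
    for ((first = 1; first <= pages; first += step)); do
        last=$((first + step - 1))
        ((last > pages)) && last=$pages   # final chunk may be shorter
        pdftocairo -pdf -f "$first" -l "$last" "$file" "$(printf '%s_%04d.pdf' "$base" "$part")"
        part=$((part + 1))
    done
}
```

For example, `for f in *.pdf; do split_with_poppler "$f" 100; done` would process every PDF in the current directory, with the part number starting again at 0001 for each input file.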

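Votes: 0

The original content of this page is provided by Unix & Linux. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: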

https://unix.stackexchange.com/questions/507521
