Q: Search for strings in a PDF and get their position on the page

Stack Overflow user

Asked on 2018-06-12 00:17:56

Answers: 1, Views: 1.1K, Followers: 0, Votes: 0

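I want to add named destinations (nameddests) to an existing PDF at positions specified by some string (e.g. put a nameddest at the first occurrence of the string "Chapter 1"). Then I want to be able to jump to those named destinations using JS events.

What I have achieved so far using PHP and FPDF/FPDI: I can load an existing PDF with FPDI and add named destinations at arbitrary positions using a slightly modified version of [1]. I can then embed the PDF in an iframe and navigate to the named destinations using, for example, JS buttons.

So far, however, I have to work out the positions of the named destinations by hand. How can I search for strings in a PDF and get the page number and position of each search result, so that I can add a named destination there?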

php

pdf

fpdf

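1 Answer

Stack Overflow user

Answered on 2018-06-13 21:51:29

It is not possible to analyze the content of a PDF document with FPDI.

We (Setasign, the authors of FPDI and PDF_NamedDestinations) have a product (not free) that lets you handle this task: the SetaPDF-Extractor component.

A simple POC for your project could look like this: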

<?php
// load and register the autoload function
require_once('library/SetaPDF/Autoload.php');

$writer = new SetaPDF_Core_Writer_Http('result.pdf', true);
$document = SetaPDF_Core_Document::loadByFilename('file/with/chapters.pdf', $writer);

$extractor = new SetaPDF_Extractor($document);

// define the word strategy
$strategy = new SetaPDF_Extractor_Strategy_Word();
$extractor->setStrategy($strategy);

// get the pages helper
$pages = $document->getCatalog()->getPages();

// get access to the named destination tree
$names = $document
    ->getCatalog()
    ->getNames()
    ->getTree(SetaPDF_Core_Document_Catalog_Names::DESTS, true);

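// holds a pending "Chapter" word; initialized here so the null check
// inside the loop never reads an undefined variable
$chapter = null;

for ($pageNo = 1; $pageNo <= $pages->count(); $pageNo++) {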
    /**
     * @var SetaPDF_Extractor_Result_Word[] $words
     */
    $words = $extractor->getResultByPageNumber($pageNo);

    // iterate over all found words and search for "Chapter" followed by a numeric string...
    foreach ($words as $word) {
        $string = $word->getString();
        if ($string === 'Chapter') {
            $chapter = $word;
            continue;
        }

        if (null === $chapter) {
            continue;
        }

        // is the next word a numeric string
        if (is_numeric($word->getString())) {
            // get the coordinates of the word
            $bounds = $word->getBounds()[0];
            // create a destination
            $destination = SetaPDF_Core_Document_Destination::createByPageNo(
                $document,
                $pageNo,
                SetaPDF_Core_Document_Destination::FIT_MODE_FIT_BH,
                $bounds->getUl()->getY()
            );

            // create a name (shall be unique), e.g. "chapter1"
            $name = strtolower($chapter->getString() . $word->getString());
            try {
                // add the named destination to the name tree
                $names->add($name, $destination->getPdfValue());
            } catch (SetaPDF_Core_DataStructure_Tree_KeyAlreadyExistsException $e) {
                // handle this exception
            }
        }

        $chapter = null;
    }
}

// save and finish the resulting document
$document->save()->finish();
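
The matching logic in the loop above ("Chapter" followed by a numeric word) can be tried on its own, without the library. The following is only a minimal sketch under assumptions: the function name `findChapterNames` and the flat array of strings are hypothetical stand-ins for the extractor's word results, and it keeps the first occurrence of each name, which is one possible way to handle the "key already exists" case.

```php
<?php
/**
 * Find "Chapter <number>" pairs in a flat list of words and build one
 * lower-case destination name per pair (the first occurrence wins).
 *
 * @param string[] $words words in reading order (stand-in for extractor results)
 * @return array<string,int> destination name => index of the numeric word
 */
function findChapterNames(array $words): array
{
    $names = [];
    $chapter = null; // the pending "Chapter" word, if any

    foreach ($words as $index => $word) {
        if ($word === 'Chapter') {
            $chapter = $word;
            continue;
        }

        if ($chapter !== null && is_numeric($word)) {
            $name = strtolower($chapter . $word);
            // keep only the first occurrence of a name
            if (!isset($names[$name])) {
                $names[$name] = $index;
            }
        }

        // any word other than "Chapter" clears the pending state
        $chapter = null;
    }

    return $names;
}

$words = ['Chapter', '1', 'Intro', 'see', 'Chapter', '1', 'Chapter', '10', 'Chapter', 'two'];
print_r(findChapterNames($words));
```

With this input the result contains `chapter1` (index 1) and `chapter10` (index 7). The repeated "Chapter 1" is skipped and "Chapter two" is ignored because "two" is not numeric.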

You can then access the named destinations via the URL like this (the viewer application or browser plugin needs to support this):

http://www.example.com/script.php#chapter1
http://www.example.com/script.php#chapter2
http://www.example.com/script.php#chapter10
...
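
If the links are generated on the server, a small helper can assemble them. This is a sketch only: `namedDestinationUrl` and its parameters are hypothetical, and the `#nameddest=` form is the explicit variant described in Adobe's PDF open parameters; whether either form works still depends on the viewer.

```php
<?php
/**
 * Build a link that opens a PDF at a named destination.
 * $baseUrl and $name are illustrative inputs, not part of any library.
 */
function namedDestinationUrl(string $baseUrl, string $name, bool $explicit = false): string
{
    // "#name" is the short form shown above;
    // "#nameddest=name" is the explicit open-parameter form
    $fragment = $explicit
        ? 'nameddest=' . rawurlencode($name)
        : rawurlencode($name);

    return $baseUrl . '#' . $fragment;
}

echo namedDestinationUrl('http://www.example.com/script.php', 'chapter1'), PHP_EOL;
echo namedDestinationUrl('http://www.example.com/script.php', 'chapter1', true), PHP_EOL;
```

The resulting string could be assigned to the `src` of the iframe that embeds the PDF, for example from a JS button handler.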

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/50802080
