
Q: How to read conditional text from a PDF?

Stack Overflow user

Asked on 2019-08-25 14:34:41

1 answer, 115 views, 0 followers, score 0

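I want to read a specific section of a PDF. How can I do that? For example, take the PDF at the URL in the code below: suppose I only want the data from the first part.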

    URL url = new URL("https://www.uscis.gov/sites/default/files/files/form/i-129.pdf");

    // try-with-resources closes the stream and the document when done
    try (InputStream is = new BufferedInputStream(url.openStream());
         PDDocument document = PDDocument.load(is)) {
        String pdfContent = new PDFTextStripper().getText(document);
        System.out.println(pdfContent);
    }

Tags: text, pdfbox, java, selenium, pdf

1 Answer

Stack Overflow user

Answered on 2019-08-25 16:50:05

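In your particular case, you can set the stripper's start page and end page so that you don't extract the whole document every time, and then use some simple string manipulation to get the part you need.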

Below is a complete, more general working example based on your code.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.BufferedInputStream;
import java.io.InputStream;
import java.net.URL;

public class App {
    public static void main(String... args) throws Exception {
        String path = "..."; // replace with whatever path you need
        String startDelimiter = "..."; // replace with wherever the start is
        String endDelimiter = "...";
        URL url = new URL(path);
        // try-with-resources closes both the stream and the document
        try (InputStream is = new BufferedInputStream(url.openStream());
             PDDocument document = PDDocument.load(is)) {
            PDFTextStripper stripper = new PDFTextStripper();
            // set these if you know roughly where the text is in the PDF, to avoid stripping the whole thing
            stripper.setStartPage(1);
            stripper.setEndPage(3);
            // get the content of pages 1 to 3
            String content = stripper.getText(document);
            int start = content.indexOf(startDelimiter);
            // look for the end delimiter only after the start delimiter
            int end = start < 0 ? -1 : content.indexOf(endDelimiter, start + startDelimiter.length());
            if (start < 0 || end < 0) {
                System.out.println("Delimiters not found in the selected pages");
                return;
            }
            String searchedContent = content.substring(start, end);
            System.out.println(searchedContent);
        }
    }
}
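The string-manipulation step can also be factored into a small reusable helper. The sketch below is an assumption, not part of PDFBox: the class `TextSections` and its method `between` are hypothetical names, and the input is whatever `PDFTextStripper.getText(document)` returned. Unlike passing two bare `indexOf` results to `substring`, it reports a missing delimiter as an empty `Optional` and only searches for the end delimiter after the start delimiter.

```java
import java.util.Optional;

// Hypothetical helper (not a PDFBox API): slices text that was already
// extracted with PDFTextStripper.getText(document).
final class TextSections {
    private TextSections() {}

    /**
     * Returns the text from startDelimiter (inclusive) up to, but not including,
     * the first endDelimiter that follows it; empty if either delimiter is missing.
     */
    static Optional<String> between(String content, String startDelimiter, String endDelimiter) {
        int start = content.indexOf(startDelimiter);
        if (start < 0) {
            return Optional.empty();
        }
        // Search for the end marker only after the start marker, so an earlier
        // occurrence of endDelimiter cannot produce an invalid range.
        int end = content.indexOf(endDelimiter, start + startDelimiter.length());
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(content.substring(start, end));
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the stripper's output.
        String content = "Cover\nPart 1. First heading\nsome text\nPart 2. Second heading\n";
        System.out.println(between(content, "Part 1.", "Part 2.").orElse("<not found>"));
    }
}
```

Returning `Optional` lets the caller decide what "not found" means (widen the page range, log, or fail) instead of getting a `StringIndexOutOfBoundsException` from `substring(-1, ...)`.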

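If, on the other hand, you don't know where in the document to look, a little more work lets you search the document to find the start and end pages (or whatever else you need). See this similar question.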
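One way to do that search is sketched below, under stated assumptions: the class `PageLocator` and its method `locate` are hypothetical, and the PDFBox calls are kept out of the class so the search logic stands alone. The caller supplies the page count and a function returning the text of a single 1-based page. The scan stops as soon as the end delimiter is found after the start delimiter, so later pages are never extracted.

```java
import java.util.Optional;
import java.util.function.IntFunction;

// Hypothetical helper (not a PDFBox API): finds the page range that contains
// a section, given a way to get the text of one page at a time.
final class PageLocator {
    private PageLocator() {}

    /** Inclusive, 1-based page range, matching setStartPage/setEndPage. */
    static final class PageRange {
        final int startPage;
        final int endPage;

        PageRange(int startPage, int endPage) {
            this.startPage = startPage;
            this.endPage = endPage;
        }
    }

    /**
     * Scans pages 1..pageCount in order and returns the first page containing
     * startDelimiter together with the first page where endDelimiter follows it.
     */
    static Optional<PageRange> locate(int pageCount, IntFunction<String> pageText,
                                      String startDelimiter, String endDelimiter) {
        int startPage = -1;
        for (int page = 1; page <= pageCount; page++) {
            String text = pageText.apply(page);
            int from = 0;
            if (startPage < 0) {
                int idx = text.indexOf(startDelimiter);
                if (idx < 0) {
                    continue; // start of the section not reached yet
                }
                startPage = page;
                // on the start page, the end delimiter must come after the start delimiter
                from = idx + startDelimiter.length();
            }
            if (text.indexOf(endDelimiter, from) >= 0) {
                return Optional.of(new PageRange(startPage, page));
            }
        }
        return Optional.empty(); // start or end delimiter not found
    }
}
```

With PDFBox 2.x, `pageCount` could be `document.getNumberOfPages()` and `pageText` a lambda that calls `stripper.setStartPage(page)`, `stripper.setEndPage(page)` and `stripper.getText(document)`. Because `getText` throws `IOException`, the lambda would have to wrap it, for example in an `UncheckedIOException`. The resulting range can then be passed to `setStartPage`/`setEndPage` before extracting the section. A delimiter that is split across a page break is not detected by this per-page search.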

Score: 0

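The original content of this page was provided by Stack Overflow; translation support was provided by Tencent Cloud Xiaowei's IT-domain engine.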

Original link:

https://stackoverflow.com/questions/57643774
