Converting PDF to Excel in Java

Stack Overflow user
Asked 2016-10-26 13:50:44
1 answer, 3.5K views, 0 followers, score 1

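I am extracting data from a PDF into Excel. The PDF also contains tables. I use iText to convert the PDF to text and then Apache POI to convert the text to Excel, but I cannot retrieve the data in a form that I can store in a database. I also tried PDFBox and Aspose and got the same result. If anyone knows how to do this, please help me solve it.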

Here is my code:

// Convert the PDF to text using iText

            PdfReader reader = new PdfReader(
                    "C:\\Users\\mohmeds\\Desktop\\BOI_SCFS banking.pdf_page_1.pdf");
            PdfReaderContentParser parser = new PdfReaderContentParser(
                    reader);
            // PrintWriter out = new PrintWriter(new FileOutputStream(txt));
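            StringBuilder text = new StringBuilder();
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                TextExtractionStrategy strategy = parser.processContent(i,
                        new SimpleTextExtractionStrategy());
                // Append every page; assigning the result directly would
                // keep only the text of the last page.
                text.append(strategy.getResultantText()).append('\n');
            }
            String line = text.toString();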
            reader.close();

            // Convert the text to Excel using Apache POI

            org.apache.poi.ss.usermodel.Workbook wb = new HSSFWorkbook();
            CreationHelper helper = wb.getCreationHelper();
            Sheet sheet = wb.createSheet("new sheet");
            System.out.println("link------->" + line);
            List<String> lines = IOUtils.readLines(new StringReader(line));

            for (int i = 0; i < lines.size(); i++) {
                String[] str = lines.get(i).split(",");
                // createRow takes an int; a (short) cast goes negative past row 32767
                Row row = sheet.createRow(i);
                for (int j = 0; j < str.length; j++) {
                    row.createCell(j).setCellValue(
                            helper.createRichTextString(str[j]));

                }
            }

            FileOutputStream fileOut = new FileOutputStream(
                    "C:\\Users\\mohmeds\\Desktop\\someName1.xls");
            wb.write(fileOut);
            fileOut.close();

1 Answer

Stack Overflow user

Answered 2019-03-29 15:09:12

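Your question is a bit vague, but if you want to store the data from the PDF in a database, you probably want to extract it as CSV rather than Excel. The code here also removes the intermediate step of converting the PDF to text and then the text to Excel. When specifying the format, choose 'csv':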

package com.pdftables.examples;

import java.io.File;
import java.util.Arrays;
import java.util.List;

import org.apache.commons.io.FileUtils;
import org.apache.http.HttpEntity;
import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.mime.MultipartEntityBuilder;
import org.apache.http.entity.mime.content.FileBody;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ConvertToFile {
    private static List<String> formats = Arrays.asList(new String[] { "csv", "xml", "xlsx-single", "xlsx-multiple" });

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.out.println("Command line: <API_KEY> <FORMAT> <PDF filename>");
            System.exit(1);
        }

        final String apiKey = args[0];
        final String format = args[1].toLowerCase();
        final String pdfFilename = args[2];

        if (!formats.contains(format)) {
            System.out.println("Invalid output format: \"" + format + "\"");
            System.exit(1);
        }

        // Avoid cookie warning with default cookie configuration
        RequestConfig globalConfig = RequestConfig.custom().setCookieSpec(CookieSpecs.STANDARD).build();

        File inputFile = new File(pdfFilename);

        if (!inputFile.canRead()) {
            System.out.println("Can't read input PDF file: \"" + pdfFilename + "\"");
            System.exit(1);
        }

        try (CloseableHttpClient httpclient = HttpClients.custom().setDefaultRequestConfig(globalConfig).build()) {
            HttpPost httppost = new HttpPost("https://pdftables.com/api?format=" + format + "&key=" + apiKey);
            FileBody fileBody = new FileBody(inputFile);

            HttpEntity requestBody = MultipartEntityBuilder.create().addPart("f", fileBody).build();
            httppost.setEntity(requestBody);

            System.out.println("Sending request");

            try (CloseableHttpResponse response = httpclient.execute(httppost)) {
                if (response.getStatusLine().getStatusCode() != 200) {
                    System.out.println(response.getStatusLine());
                    System.exit(1);
                }
                HttpEntity resEntity = response.getEntity();
                if (resEntity != null) {
                    final String outputFilename = getOutputFilename(pdfFilename, format.replaceFirst("-.*$", ""));
                    System.out.println("Writing output to " + outputFilename);

                    final File outputFile = new File(outputFilename);
                    FileUtils.copyToFile(resEntity.getContent(), outputFile);
                } else {
                    System.out.println("Error: file missing from response");
                    System.exit(1);
                }
            }
        }
    }

    private static String getOutputFilename(String pdfFilename, String suffix) {
        if (pdfFilename.length() >= 5 && pdfFilename.toLowerCase().endsWith(".pdf")) {
            return pdfFilename.substring(0, pdfFilename.length() - 4) + "." + suffix;
        } else {
            return pdfFilename + "." + suffix;
        }
    }
}

https://github.com/pdftables/java-pdftables-api/blob/master/pdftables.java
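
Once the CSV file has been written, its rows still have to be split into fields before they can be stored in a database. The following is a minimal sketch of that step, not part of the PDFTables example: the class name `CsvRows` and the sample data are hypothetical, and it assumes the CSV uses commas as separators and double quotes around fields that contain commas, quotes, or line breaks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the PDFTables example.
class CsvRows {

    // Splits CSV text into rows of fields. Handles double-quoted fields,
    // doubled quotes ("") inside them, and commas or line breaks within quotes.
    static List<String[]> parse(String csv) {
        List<String[]> rows = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < csv.length() && csv.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                fields.add(field.toString());
                field.setLength(0);
                rows.add(fields.toArray(new String[0]));
                fields.clear();
            } else if (c != '\r') {        // drop carriage returns outside quotes
                field.append(c);
            }
        }
        // Flush the last row when the text does not end with a newline.
        if (field.length() > 0 || !fields.isEmpty()) {
            fields.add(field.toString());
            rows.add(fields.toArray(new String[0]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the converted file's contents.
        String sample = "Date,Description,Amount\n"
                + "01-04-2016,\"NEFT, inward\",\"1,250.00\"\n";
        for (String[] row : parse(sample)) {
            System.out.println(row.length + " fields: " + String.join(" | ", row));
        }
    }
}
```

Each `String[]` can then be bound to a JDBC `PreparedStatement` with `setString` and queued with `addBatch`. Note that splitting on "," alone, as in the question's code, would break values such as "1,250.00" into two cells.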

Score: 1
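The original content of this page is provided by Stack Overflow; translation support was provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link: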

https://stackoverflow.com/questions/40254643
