Q: Extracting Cyrillic characters with iTextSharp

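Stack Overflow user

Asked 2022-06-02 21:20:55

Answers: 1 · Views: 111 · Followers: 0 · Votes: 0

In my project I need to read PDF documents. The PDF contains Ukrainian and Russian characters. PdfReader reads all the other characters in the PDF, but the Cyrillic characters are missing from the output. I tried changing the encoding, but it did not help. What can I do about these characters?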

    public static string GetText(string filePath)
    {
        ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.LocationTextExtractionStrategy();
        StringBuilder text = new StringBuilder();
        if (File.Exists(filePath))
        {
            PdfReader pdfReader = new PdfReader(filePath);
            for (int i = 1; i < pdfReader.NumberOfPages; i++)
            {
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                string thePage = PdfTextExtractor.GetTextFromPage(pdfReader, i, strategy);
                text.Append(System.Environment.NewLine);
                thePage = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(thePage)));
                text.Append(thePage);
            }
            pdfReader.Close();
        }
        return text.ToString();
    }

pdf-reader

pdf-extraction

pdf

itext

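1 Answer

Stack Overflow user

Accepted answer

Answered 2022-06-06 22:10:20

iTextSharp is an obsolete product that is no longer supported, and it may have problems with text extraction. Below is a simple example of how text extraction works in iText 7 (the code is written in Java, but the C# version is essentially the same).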

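    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfPage;
    import com.itextpdf.kernel.pdf.PdfReader;
    import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
    import com.itextpdf.kernel.pdf.canvas.parser.listener.ITextExtractionStrategy;
    import com.itextpdf.kernel.pdf.canvas.parser.listener.SimpleTextExtractionStrategy;

    String filePath = "test.pdf";
    StringBuilder text = new StringBuilder();
    PdfReader pdfReader = new PdfReader(filePath);
    PdfDocument pdfDocument = new PdfDocument(pdfReader);
    // Pages are 1-indexed, so the loop runs up to and including the last page.
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
        PdfPage page = pdfDocument.getPage(i);
        // Use a fresh strategy per page so text does not accumulate across pages.
        ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
        String thePage = PdfTextExtractor.getTextFromPage(page, strategy);
        text.append(thePage);
    }
    // Closing the document also closes the underlying reader.
    pdfDocument.close();
    System.out.print(text);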

This code is roughly the same as the code in your example, yet it does extract the text.
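Apart from the library version, the re-encoding line in the question (`Encoding.Default.GetBytes(thePage)` followed by a conversion to UTF-8) is a plausible cause of the missing characters, because the extracted string is already Unicode and pushing it through a legacy code page can only lose information. The sketch below reproduces that round trip in plain Java, under the assumption that the system default code page is a non-Cyrillic one; ISO-8859-1 stands in for it here, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Mimics the question's re-encoding step:
    // string -> bytes in a legacy charset -> UTF-8 bytes -> string.
    static String roundTrip(String text, Charset legacy) {
        // Characters the legacy charset cannot represent are replaced with '?'.
        byte[] legacyBytes = text.getBytes(legacy);
        // Converting the already damaged bytes to UTF-8 cannot restore them.
        byte[] utf8Bytes = new String(legacyBytes, legacy).getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A six-letter Ukrainian word written with Unicode escapes, followed by ASCII text.
        String extracted = "\u041f\u0440\u0438\u0432\u0456\u0442, PDF";

        // A non-Cyrillic legacy charset loses every Cyrillic letter.
        System.out.println(roundTrip(extracted, StandardCharsets.ISO_8859_1));

        // Skipping the legacy step (or using UTF-8 throughout) keeps the text intact.
        System.out.println(roundTrip(extracted, StandardCharsets.UTF_8).equals(extracted));
    }
}
```

If this is what happens on the asker's machine, dropping the re-encoding line and appending `thePage` directly would be the first thing to try. Note also that the question's loop uses `i < pdfReader.NumberOfPages`, which skips the last page because pages are numbered from 1; the `<=` comparison in the answer covers every page.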

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/72482480
