
<h1><a href="http://code.google.com/p/tesseract-ocr/" rel="noreferrer">Tesseract OCR</a> <a href="http://apt.ubuntu.com/p/tesseract-ocr" rel="noreferrer"><img src="https://hostmar.co/software-large" alt="Install Tesseract OCR" /></a></h1>
Although the original engine was developed at HP back in the late 1980s, it has proven to be one of the best Optical Character Recognition programs I've used. The engine has recently undergone many updates, and it has become one of the most comprehensive OCR tools on the market. It outscores most other OCR tools (with text-match rates above 90%) and can easily convert documents set in standard typefaces to text.
The following is an example:
<pre><code>tesseract ScannedDocument.png out
</code></pre>
This produces a file called <code>out.txt</code> containing the recognized text.
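Since the question asks for a scanned PDF in and a searchable PDF out, here is a rough sketch of how that could be done with Tesseract. It assumes poppler-utils (for `pdftoppm` and `pdfunite`) and a newer Tesseract (3.03 or later, whose `pdf` output config writes the page image with an invisible text layer) are installed. The function names `ocr_pdf` and `ocr_output_name` are hypothetical helpers, not part of Tesseract.

```shell
# Hypothetical helpers (not part of Tesseract): scanned PDF -> searchable PDF.
# Assumes poppler-utils (pdftoppm, pdfunite) and tesseract >= 3.03 are on PATH.
# Source this file, then call: ocr_pdf INPUT.pdf [LANG]

# Derive the output name from the input: book.pdf -> book-ocr.pdf
ocr_output_name() {
  printf '%s-ocr.pdf\n' "${1%.pdf}"
}

# The body is a subshell ( ... ) so `set -e` and the EXIT trap stay local.
ocr_pdf() (
  set -e
  input=$1
  lang=${2:-eng}   # Tesseract language code; eng is the default here
  output=$(ocr_output_name "$input")
  workdir=$(mktemp -d)
  trap 'rm -rf "$workdir"' EXIT

  # 1. Render every page to a 300 dpi PNG: page-1.png, page-2.png, ...
  #    (pdftoppm zero-pads the numbers, so the globs below sort in page order).
  pdftoppm -r 300 -png "$input" "$workdir/page"

  # 2. OCR each page. The "pdf" config makes Tesseract write page-N.pdf:
  #    the page image with an invisible, selectable text layer on top.
  for page in "$workdir"/page-*.png; do
    tesseract "$page" "${page%.png}" -l "$lang" pdf
  done

  # 3. Join the per-page PDFs into one searchable document.
  pdfunite "$workdir"/page-*.pdf "$output"
  echo "$output"
)
```

For example, `ocr_pdf scanned-book.pdf` should leave `scanned-book-ocr.pdf` next to the input; a second argument such as `deu` selects another installed language pack. 300 dpi is a common starting resolution for OCR, but it is only a guess at a sensible default here.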


Another project that should be able to do this is gscan2pdf.

<pre><code>sudo apt-get install gscan2pdf
</code></pre>

This project can also use Tesseract, as well as other open source OCR tools.


I don't know of any OCR software for Ubuntu with these features, but there is one for Windows that has what you need: ABBYY FineReader (<a href="http://www.abbyy.com/" rel="nofollow">this is the page</a>). It is not free, though.


A free solution exists in the repositories: CuneiForm (with YAGF as a GNOME front end for it).


It seems like the <a href="http://sites.google.com/site/decapodproject/" rel="nofollow">Decapod project</a> does or will export to PDF, so Tesseract must somehow export the information needed to know where each piece of text was found.
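Tesseract can in fact export that layout information: its `hocr` output config writes an HTML file in which each recognized word carries a `bbox x0 y0 x1 y1` pixel bounding box, the kind of data a PDF exporter needs to place a text layer. The sketch below pulls those boxes out, assuming Tesseract's usual `ocrx_word` markup; `hocr_word_boxes` is a hypothetical helper, not a Tesseract command.

```shell
# Hypothetical helper: list the word bounding boxes found in an hOCR file.
# Create the file first with Tesseract, e.g.:
#   tesseract ScannedDocument.png out hocr   # writes out.hocr (older versions: out.html)

# Print one "x0 y0 x1 y1" line (pixel coordinates) per recognized word.
hocr_word_boxes() {
  grep -oE "class=['\"]ocrx_word['\"][^>]*title=['\"]bbox [0-9]+ [0-9]+ [0-9]+ [0-9]+" "$1" |
    grep -oE '[0-9]+ [0-9]+ [0-9]+ [0-9]+$'
}
```

Running `hocr_word_boxes out.hocr` should print one line per word; line-level (`ocr_line`) boxes are skipped because only `ocrx_word` spans are matched.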


Adobe Acrobat (not Reader; not a free application) can OCR a scanned PDF document and add an invisible text layer on top of the image, so that the text can be selected and copied. Unfortunately I don't have it handy to check exactly where that feature is located in Acrobat's UI, but I have used it successfully a couple of times for the same purpose you mentioned.

And yes, this is Windows software, not Linux software, but <a href="http://appdb.winehq.org/appview.php?appId=847" rel="nofollow">according to the Wine HQ application database, it works under Wine</a>.


The best OCR software is usually embedded in printers/scanners/copiers. The Canon IRC 3880 in my office can output great OCR'd PDFs more easily and faster than any desktop program I know. Put the book on the tray (unbound), select your email address, and press the green button.

Most of the OCR'd PDFs you can find on the net come from similar machines. The problem is that the price is too high for home use (around 12000 euros IRC).


My favorite free, online OCR software is offered by Ricoh Innovations. This is a beta program, but I find it works quite well. Check it out at: <a href="http://beta.rii.ricoh.com/betalabs/content/document-conversion" rel="nofollow">http://beta.rii.ricoh.com/betalabs/content/document-conversion</a>


<h1>OCRFeeder</h1>

It is a GUI application.

<img src="https://i.stack.imgur.com/lF8vQ.png" alt="OCRFeeder screenshot">

It uses tesseract-ocr or ocrad as its OCR engine.

It can be installed with the <a href="https://apps.ubuntu.com/cat/applications/ocrfeeder/" rel="nofollow noreferrer">Software Center</a> or with:

<pre><code>sudo apt-get install ocrfeeder
</code></pre>


FineReader also has an online version, which claims to accept PDFs as input: <a href="http://finereader.abbyyonline.com/en/Help/Faq/" rel="nofollow">http://finereader.abbyyonline.com/en/Help/Faq/</a>

I have seen some ebooks/papers that were apparently scanned from their paper versions, yet, amazingly, their text can be copied out. I suppose the scanned versions must have been processed by some Optical Character Recognition software.

So I would like to know: what Optical Character Recognition software do you recommend, especially programs that either run on Ubuntu or are free? If Windows programs are far superior, please let me know as well.

I am particularly interested in OCR tools that can accept a scanned PDF file as input and produce another PDF file that looks the same as the input but with copyable text.

Thanks and regards!

Please limit each answer to one program.

Optical Character Recognition software recommendations?

Ask Ubuntu user

Tesseract OCR

Ask Ubuntu用户

Ask Ubuntu用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问光学字符识别软件推荐？EN

回答 10

Ask Ubuntu用户

Tesseract OCR

Ask Ubuntu用户

Ask Ubuntu用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问光学字符识别软件推荐？
EN