Posted on 2016-05-25 02:22:42
@priya: I tried this module, and it extracts text from PDFs well.
use strict;
use warnings;
use PDF::OCR::Thorough;
my $filename = "pdf.pdf";
my $pdf = PDF::OCR::Thorough->new($filename);
my $text = $pdf->get_text();
print $text;

Posted on 2016-04-29 22:16:35
Use CAM::PDF. It has methods that can help you extract images or other elements:
$doc->getProperty($pagenum, $propertyname)
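
As a rough illustration, the sketch below walks every page and lists the resources CAM::PDF reports for it, calling getProperty() on each name. The file name pdf.pdf is a placeholder, and reading the node's type field assumes the usual CAM::PDF node layout (a hash with type and value keys); treat it as a starting point rather than a complete image extractor.

```perl
use strict;
use warnings;
use CAM::PDF;

# Placeholder path; replace with your own PDF file.
my $filename = 'pdf.pdf';

# new() returns undef on failure and sets $CAM::PDF::errstr.
my $doc = CAM::PDF->new($filename)
    or die "Cannot open $filename: $CAM::PDF::errstr\n";

for my $pagenum (1 .. $doc->numPages()) {
    # Names of the resources (images, fonts, etc.) used by this page.
    for my $name ($doc->getPropertyNames($pagenum)) {
        my $node = $doc->getProperty($pagenum, $name);

        # Assumption: nodes are hashes carrying a 'type' key.
        my $type = (defined $node && defined $node->{type})
            ? $node->{type}
            : 'unknown';
        print "page $pagenum: $name ($type)\n";
    }
}
```

The output only shows which named resources each page has; turning an image resource into a file would need further steps that this sketch does not cover.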
Each PDF page contains a list of resources that it uses (images, fonts, etc). getPropertyNames() returns an array of the names of those resources. getProperty() returns a node representing a named property (most likely a reference node).

https://stackoverflow.com/questions/36891223