I'm having trouble extracting JavaScript with Web::Scraper. Here is my test script:
#!/usr/bin/perl
use Modern::Perl;
use Web::Scraper;
use Data::Dumper;
my $contents = do { local $/; <DATA> };
my $scraper = scraper { process "//script", "scripts[]" => 'TEXT'; };
my $res = $scraper->scrape($contents);
say Dumper $res;
exit;
__DATA__
<html><head><title>hello</title></head>
<body>
<script type="text/javascript">
var dummy = {}
</script>
</body>
</html>

My output is:
$VAR1 = {
'scripts' => [
''
]
};

It looks to me like the script tag is found, but the content between the tags is not saved.
Posted on 2013-05-23 09:25:02
After digging into XPath a bit, I found a solution.
Change the scraper line from:
my $scraper = scraper { process "//script", "scripts[]" => 'TEXT'; };
to:
my $scraper = scraper { process "//script" => 'scripts[]' =>
scraper { process '//text()', 'script'=>'TEXT'} };

This outputs the JavaScript code:
$VAR1 = {
'scripts' => [
{
'script' => '
var dummy = {}
'
}
]
};

I'm not convinced the process line is elegant, but it works.
Posted on 2013-05-25 06:43:09
Try 'RAW' instead of 'TEXT', like this:
#!/usr/bin/perl --
use strict;
use warnings;
use Web::Scraper;
use Data::Dump;
my $contents = q{
<html><head><title>hello</title></head>
<body>
<script type="text/javascript">
var dummy = {}
</script>
</body>
</html>};
#~ my $scraper = scraper { process "//script", "scripts[]" => 'TEXT'; };
my $scraper = scraper { process "//script", "scripts[]" => 'RAW'; };
my $res = $scraper->scrape($contents);
dd $res;
__END__
{ scripts => ["\n var dummy = {}"] }

https://stackoverflow.com/questions/16683106