The input file contains runs of more than one newline, and empty tags, as shown below:
<html>
<body>
<title>XXX</title>
<p>text...</p>
<collaboration seq="">
<ce:text></ce:text>
</collaboration>
...
<p>text</p>
<collaboration seq="">
<ce:text>AAA</ce:text>
</collaboration>
<p>text</p>
</body>
</html>
The output file should contain only single newlines, and the empty tags must be removed:
<html>
<body>
<title>XXX</title>
<p>text...</p>
...
<p>text</p>
<p>text</p>
<collaboration seq="">
<ce:text>AAA</ce:text>
</collaboration>
</body>
</html>
The code I tried:
print "Enter the file name without extension: ";
chomp($filename=<STDIN>);
open(RED,"$filename.txt") || die "Could not open TXT file";
open(WRIT,">$filename.html");
while(<RED>)
{
#process in file
s/<collaboration seq="">\n<ce:text><\/ce:text>\n<\/collaboration>//g;
s/\n\n//g;
print WRIT $_;
}
close(RED);
close(WRIT);
The code above does not remove any of the things it should. How can I fix this?
Posted on 2014-12-17 21:36:19
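First, you should actually slurp the file. Your while loop reads one line at a time, so a pattern that spans several lines can never match. So, assuming you use zigdon's method: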
my $file;
{
    print "Enter the file name without extension: ";
    my $filename = <STDIN>;
    chomp($filename);
    # the question reads its input from "$filename.txt"
    open F, "<", "$filename.txt" or die "Can't read $filename.txt: $!";
    local $/; # enable slurp mode, locally.
    $file = <F>;
    close F;
}
Now $file contains the contents of the file, so you can work with it.
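To see why slurping matters here, the following is a minimal sketch that runs the same multi-line pattern both line by line and against slurped content. The sample input is held in a string and read through an in-memory filehandle so nothing on disk is needed; the variable names ($input, $empty, $line_hits, $cleaned) are illustrative and not part of the original answer. It assumes Perl 5.10 or later for \R.

```perl
use strict;
use warnings;

# Sample input held in a string so the sketch needs no file on disk.
my $input = join "\n",
    '<p>text...</p>',
    '<collaboration seq="">',
    '<ce:text></ce:text>',
    '</collaboration>',
    '',
    '',
    '<p>text</p>',
    '';

# Multi-line pattern for an empty <collaboration> block.
my $empty = qr{<collaboration seq="">\R<ce:text></ce:text>\R</collaboration>};

# Line by line: each $_ holds a single line, so the pattern never matches.
open my $by_line, '<', \$input or die "Can't open in-memory file: $!";
my $line_hits = 0;
while (<$by_line>) {
    $line_hits++ if /$empty/;
}
close $by_line;

# Slurp mode: undefine $/ locally so one read returns the whole content.
open my $whole, '<', \$input or die "Can't open in-memory file: $!";
my $content = do { local $/; <$whole> };
close $whole;

(my $cleaned = $content) =~ s/$empty//g;   # drop the empty blocks
$cleaned =~ s/\R{2,}/\n/g;                 # collapse runs of newlines

print "line-by-line matches: $line_hits\n";
print $cleaned;
```

Run as is, this reports zero line-by-line matches, while the slurped copy comes out as just the two <p> lines with no blank lines between them.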
# process the file contents
$file =~ s/<collaboration seq="">\R<ce:text><\/ce:text>\R<\/collaboration>//g;
$file =~ s/\R{2,}/\n/g; # I'm guessing this is probably what you intended
# WRIT must already be open for writing, as in the question's code
print WRIT $file;
Posted on 2014-12-20 12:19:16
You can do this with XML::Simple:
use XML::Simple;
# use XML::Simple to process the XML
my $xs = XML::Simple->new(
# remove extra whitespace
NormaliseSpace => 2,
# keep root element
KeepRoot => 1,
# force elements to arrays
ForceArray => 1,
# ignore empty elements
SuppressEmpty => 1
);
# read in the XML ($xml can be a file name or a string of XML)
my $ref = $xs->XMLin($xml);
# print out the XML minus the empty tags
print $xs->XMLout($ref);
https://stackoverflow.com/questions/27521313