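I have a database with many fields containing comma-separated values. I need to split these fields in Perl, which is simple enough. The catch is that some values are followed by a nested CSV list in parentheses, and I don't want that part split.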
Example:
recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education
Splitting on ", " gives me:
recycling
environmental science
interdisciplinary (e.g.
consumerism
waste management
chemistry
toxicology
government policy
and ethics)
consumer education
What I want is:
recycling
environmental science
interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics)
consumer education
Can a Perl regex help here?
I tried adapting a regex I found in a similar SO post, but it returns nothing:
#!/usr/bin/perl
use strict;
use warnings;
my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
my @parts = $s =~ m{\A(\w+) ([0-9]) (\([^\(]+\)) (\w+) ([0-9]) ([0-9]{2})};
use Data::Dumper;
print Dumper \@parts;
Posted 2012-02-25 02:12:24
Try this:
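my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
# Split on ", " unless a ")" appears before the next "(",
# i.e. only when the comma is not inside a parenthesized group (one level deep).
my @parts = split /(?![^(]+\)), /, $s;
Posted 2012-02-25 02:57:23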
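The solution you chose is better, but, contrary to what some people claim, Perl regular expressions do have a recursion feature that can match nested parentheses. The following code works fine: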
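use strict;
use warnings;
my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
my @parts;
push @parts, $1 while $s =~ /
  (?!\z)                      # no empty match at the very end of the string
  ((?:                        # group 1: one complete field, made of
    [^(),]+ |                 #   text without parentheses or commas, or
    ( \(                      #   group 2: a parenthesized group whose body is
      (?: [^()]+ | (?2) )*    #   non-paren text or, recursively, another group 2
    \) )
  )*)
  (?: ,\s* | $)               # then a comma separator or the end of the string
/xg;
print "$_\n" for @parts;
This works even when the parentheses are nested more deeply. No, it isn't pretty, but it does work!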
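To illustrate the point about deeper nesting, here is a small self-contained sketch that runs both the accepted answer's lookahead split and a copy of the recursive pattern on a made-up string with two levels of parentheses. The input string is hypothetical, and the (?!\z) guard is an addition made here so the /g loop does not yield an empty final field. The (?2) recursion needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical input: the outer parenthesized group contains a nested group.
my $s = 'a, b (c, d (e, f), g), h';

# Accepted answer: split on ", " unless a ")" appears before the next "(".
my @by_lookahead = split /(?![^(]+\)), /, $s;

# Recursive pattern from the answer above, with a (?!\z) guard added
# so that the loop stops instead of matching an empty field at the end.
my @by_recursion;
push @by_recursion, $1 while $s =~ /
    (?!\z)
    ((?: [^(),]+ | ( \( (?: [^()]+ | (?2) )* \) ) )*)
    (?: ,\s* | \z )
/xg;

print "lookahead: $_\n" for @by_lookahead;
print "recursion: $_\n" for @by_recursion;
```

On this input the lookahead split cuts the outer group into "b (c" and "d (e, f), g)", because the comma after "c" is followed by "(" before any ")". The recursive pattern keeps "b (c, d (e, f), g)" together as one field.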
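Posted 2012-02-25 02:47:55
Who says you have to do it in one step? You can slice the values off one at a time in a loop. Based on your example, you could use something like this: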
use strict;
use warnings;
my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
my @parts;
while (1) {
    # First try: a plain element (word characters and spaces), optionally followed by ", rest".
    my ($elem, $rest) = $s =~ m/^((?:\w|\s)+)(?:,\s*(.*))?$/;
    if (not defined $elem) {
        # Second try: an element ending in a parenthesized list, optionally followed by ", rest".
        ($elem, $rest) = $s =~ m/^((?:\w|\s)+\s*\([^\)]+\))(?:,\s*(.*))?$/;
    }
    last if not defined $elem;    # neither pattern matches: stop rather than push undef
    push @parts, $elem;
    $s = $rest;
    last if not defined $s or $s eq '';
}
use Data::Dumper;
print Dumper \@parts;
https://stackoverflow.com/questions/9435564