I have data like this:
some info
some info
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cy
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization
[Typedef]
id: regulates
name: regulates
xref: RO:0002211
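transitive_over: part_of ! part_of
Note that the file ends with trailing whitespace.
What I want to do is parse each block that starts with [Term] and extract the id, name, and namespace. In the end, I want a hash of arrays like this:
$VAR = {
    'GO:0000001' => ["mitochondrion inheritance", "biological_process"],
    'GO:0000002' => ["mitochondrial genome maintenance", "biological_process"],
};
How can I do this in Perl?
I'm stuck with this code: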
#!/usr/bin/perl
use Data::Dumper;
my %bighash;
while(<DATA>) {
    chomp;
    my $line = $_;
    my $term = "";
    my $id = "";
    my $name ="";
    my $namespace ="";
    if ($line =~ /^\[Term/) {
        $term = $line;
    }
    elsif ($line =~ /^id: (.*)/) {
        $id = $1;
    }
    elsif ($line =~ /^name: (.*)/) {
        $name = $1;
    }
    elsif ($line =~ /^namespace: (.*)/) {
        $namespace = $1;
    }
    elsif ($line =~ /$/) {
        $bighash{$id}{$name} = $namespace;
    }
}
print Dumper \%bighash;
__DATA__
some info
some info
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cy
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution
[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization
[Typedef]
id: regulates
name: regulates
xref: RO:0002211
transitive_over: part_of ! part_of
Test it here: https://eval.in/80497
Posted on 2013-12-18 04:09:39
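If you set Perl's input record separator to '' (local $/ = '';), you read the data in paragraph mode, i.e. in chunks separated by blank lines. You can then use regexes to capture the parts you need from each chunk. For example: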
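use strict;
use warnings;
use Data::Dumper;
local $/ = '';    # paragraph mode: each read returns one blank-line-separated chunk
my %hash;
while (<DATA>) {
    next unless /^\[Term\]/;    # skip chunks that are not [Term] blocks
    # Each capture takes the rest of the first matching line in the chunk
    my ($id) = /id:\s+(.+)/;
    my ($name) = /name:\s+(.+)/;
    my ($namespace) = /namespace:\s+(.+)/;
    push @{ $hash{$id} }, ( $name, $namespace );
}
print Dumper \%hash;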
__DATA__
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cy
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Typedef]
id: regulates
name: regulates
xref: RO:0002211
transitive_over: part_of ! part_of

Output:
$VAR1 = {
          'GO:0000001' => [
                            'mitochondrion inheritance',
                            'biological_process'
                          ],
          'GO:0000002' => [
                            'mitochondrial genome maintenance',
                            'biological_process'
                          ]
        };
Hope this helps!
Posted on 2013-12-18 04:02:34
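Here is a neat trick that may help. Perl has a $/ variable that defines the "input record separator": when you read a record with <DATA>, Perl keeps reading until it encounters whatever $/ is set to, then returns everything read up to that point.
Normally $/ is set to a newline, so <DATA> returns one line of the file at a time. If you set it to the empty string "", however, each read returns everything up to the next blank line or run of blank lines.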
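$/ = "";
while (<DATA>) {
    chomp;    # remove the trailing newlines
    # $_ now contains a whole blank-line-separated chunk
    if (/^\[Term\]/) {
        ...
        # parse the [Term] chunk here
        ...
    }
}
Inside the loop, you can parse the chunk by splitting it into lines, then splitting each line on the ": " separator to get a key and a value. At that point you can put the chunk's keys and values into whatever structure you like.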
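The split-based parsing described above could look roughly like the following. This is only a sketch under some assumptions: the sub name parse_terms is hypothetical, the sample input lives in a string (read through an in-memory filehandle) so the example runs on its own, and for keys that repeat within a block, such as is_a, it simply keeps the first value.

```perl
use strict;
use warnings;

# Hypothetical helper: read [Term] blocks from an open filehandle and
# return a hash ref that maps each id to [name, namespace].
sub parse_terms {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode; the old value is restored on return
    my %terms;
    while ( my $chunk = <$fh> ) {
        chomp $chunk;    # in paragraph mode this strips all trailing newlines
        my ( $header, @lines ) = split /\n/, $chunk;
        next unless defined $header && $header =~ /^\[Term\]/;

        my %field;
        for my $line (@lines) {
            # A limit of 2 keeps values that themselves contain ": " intact
            my ( $key, $value ) = split /:\s+/, $line, 2;
            next unless defined $value;    # not a "key: value" line
            # Assumption: keep only the first value of a repeated key
            $field{$key} = $value unless exists $field{$key};
        }
        next unless defined $field{id};
        $terms{ $field{id} } = [ @field{qw(name namespace)} ];
    }
    return \%terms;
}

# Sample input held in a string, with blank lines between blocks
my $sample = <<'END';
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
is_a: GO:0048308 ! organelle inheritance

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process

[Typedef]
id: regulates
name: regulates
END

open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $terms = parse_terms($fh);
close $fh;

for my $id ( sort keys %$terms ) {
    print "$id => ", join( ', ', @{ $terms->{$id} } ), "\n";
}
```

Because every key of a block ends up in %field, picking up further fields such as def or synonym later only means changing the hash slice on the last line of the loop. Using local inside the sub keeps the paragraph-mode setting from leaking into other reads in the program.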
https://stackoverflow.com/questions/20649054