Q: Getting elements from chunked data with Perl

Stack Overflow user

Asked 2013-12-18 03:15:27

2 answers, 96 views, 0 followers, score 2

I have data like this:

some info
some info

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cy
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Typedef]
id: regulates
name: regulates
xref: RO:0002211
transitive_over: part_of ! part_of

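Note that the end of the file contains whitespace.

What I want to do is parse every block that starts with [Term] and get its id, name, and namespace. At the end of the day, there should be a hash of arrays like this: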

$VAR = { 'GO:0000001' => ["mitochondrion inheritance","biological_process"],
         'GO:0000002' => ["mitochondrial genome maintenance","biological_process"] };

How can I do this in Perl?

I'm stuck with this code:

#!/usr/bin/perl
use Data::Dumper;
my %bighash;
while(<DATA>) {
  chomp;
  my $line = $_;

  my $term = "";
  my $id = "";
  my $name ="";
  my $namespace ="";
  if ($line =~ /^\[Term/) { 
   $term = $line;
  }
  elsif ($line =~ /^id: (.*)/) {
   $id = $1;
  }
  elsif ($line =~ /^name: (.*)/) {
   $name = $1;
  }
  elsif ($line =~ /^namespace: (.*)/) {
   $namespace = $1;
  }
  elsif ($line =~ /$/) {
     $bighash{$id}{$name} = $namespace;
  }

}

print Dumper \%bighash;



__DATA__
some info
some info

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cy
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Typedef]
id: regulates
name: regulates
xref: RO:0002211
transitive_over: part_of ! part_of

Test it here: https://eval.in/80497

linux

perl

unix

2 Answers

Stack Overflow user

Accepted answer

Posted 2013-12-18 04:09:39

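If you set Perl's input record separator to '' (local $/ = '';), you read the data in paragraph mode, i.e., in chunks separated by blank lines. You can then use regexes to capture the parts you need from each chunk. For example: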

use strict;
use warnings;
use Data::Dumper;

local $/ = '';
my %hash;

while (<DATA>) {
    next unless /^\[Term\]/;

    my ($id)        = /id:\s+(.+)/;
    my ($name)      = /name:\s+(.+)/;
    my ($namespace) = /namespace:\s+(.+)/;

    push @{ $hash{$id} }, ( $name, $namespace );
}

print Dumper \%hash;

__DATA__
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cy
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Typedef]
id: regulates
name: regulates
xref: RO:0002211
transitive_over: part_of ! part_of

Output:

$VAR1 = {
          'GO:0000001' => [
                            'mitochondrion inheritance',
                            'biological_process'
                          ],
          'GO:0000002' => [
                            'mitochondrial genome maintenance',
                            'biological_process'
                          ]
        };

Hope this helps!
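If the parsing has to live inside a larger script, one possible variation is to wrap it in a sub so that the paragraph-mode setting of $/ stays local to that sub. The sketch below is only an illustration under a few assumptions: the name read_terms is hypothetical, the regexes are anchored with /m so that they match only at the start of a line, and blocks without an id are silently skipped. It also assigns the array directly instead of using push, which assumes each id occurs only once.

```perl
use strict;
use warnings;

# Hypothetical helper: returns { id => [ name, namespace ] } for each [Term] block.
sub read_terms {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode, restored when this sub returns
    my %hash;
    while (my $block = <$fh>) {
        next unless $block =~ /^\[Term\]/;

        # /m lets ^ match at the start of every line inside the block
        my ($id)        = $block =~ /^id:\s+(.+)/m;
        my ($name)      = $block =~ /^name:\s+(.+)/m;
        my ($namespace) = $block =~ /^namespace:\s+(.+)/m;

        next unless defined $id;    # assumption: skip blocks that have no id
        $hash{$id} = [ $name, $namespace ];
    }
    return \%hash;
}

# Small demonstration on an in-memory filehandle
my $sample = <<'END';
[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process

[Typedef]
id: regulates
name: regulates
END

open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $terms = read_terms($fh);
print "$_ => @{ $terms->{$_} }\n" for sort keys %$terms;
```

The same sub should work with a filehandle opened on a real file, since it only relies on reading from the handle it is given.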

Score 5

Stack Overflow user

Posted 2013-12-18 04:02:34

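Here is a neat trick that may help. Perl has a $/ variable that defines the "input record separator": when you read an input record with <DATA>, it keeps reading until it encounters whatever $/ is set to, then returns all of that data.

Normally, $/ is set to a newline, so <DATA> returns one line from the file at a time. But if you set it to the empty string "", each read returns all the data up to the next blank line or sequence of blank lines.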

$/ = "";
while (<DATA>) {
    chomp;        # remove the trailing newlines
    # $_ now contains a whole blank-line-separated chunk
    if (/^\[Term\]/) {
        ...
        # parse the [Term] chunk here
        ...
    }
}

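Inside the loop, you can parse the chunk by splitting it into lines, then splitting each line on the ":" separator to get a key and a value. Because values such as GO:0000001 contain a colon themselves, limit the split to two fields. At that point, you can put the chunk's keys and values into whatever structure you like.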
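The split-based approach described above could be sketched roughly as follows. This is not the answerer's code, only one way to fill in the elided part: parse_chunk and parse_terms are hypothetical names, every key is stored as an array reference because keys such as is_a can repeat, and chunks without an id are assumed to be safe to ignore.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one chunk into { key => [ values... ] }.
# Every key maps to an array reference because keys such as is_a can repeat.
sub parse_chunk {
    my ($chunk) = @_;
    my %fields;
    for my $line (split /\n/, $chunk) {
        # Limit of 2: only the first colon separates key from value,
        # so an id like GO:0000001 stays in one piece.
        my ($key, $value) = split /:\s*/, $line, 2;
        next unless defined $value;    # e.g. the [Term] header has no colon
        push @{ $fields{$key} }, $value;
    }
    return \%fields;
}

# Hypothetical driver: read chunks in paragraph mode and keep only [Term] ones.
sub parse_terms {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode, restored when this sub returns
    my %terms;
    while (my $chunk = <$fh>) {
        next unless $chunk =~ /^\[Term\]/;
        my $fields = parse_chunk($chunk);
        next unless $fields->{id};     # assumption: ignore chunks without an id
        $terms{ $fields->{id}[0] } = [ $fields->{name}[0], $fields->{namespace}[0] ];
    }
    return \%terms;
}

# Small demonstration on an in-memory filehandle
my $sample = <<'END';
some info

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
is_a: GO:0007005 ! mitochondrion organization

[Typedef]
id: regulates
name: regulates
END

open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $terms = parse_terms($fh);
print "$_ => [@{ $terms->{$_} }]\n" for sort keys %$terms;
```

Keeping the full key/value hash per chunk means other fields (def, synonym, is_a) remain available if they are needed later; only the final assignment narrows the result down to name and namespace.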

Score 3

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/20649054
