I have a file with many lines, each of which follows a repeating pattern, and I need a proper data structure to parse it into. For example:
cluster1:gene1(genome1) gene2(genome2) gene3(genome3)
cluster2:gene4(genome4) gene5(genome5)
The names are arbitrary and can be anything.
I thought of using a hash of hashes:
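%hoh = (
    "cluster1" => {
        "gene1" => "genome1",
        "gene2" => "genome2",
        "gene3" => "genome3",
    },
    "cluster2" => {
        "gene4" => "genome4",
        "gene5" => "genome5",
    },
);
I have two questions. First: how do I look for the repeating pattern in each line?
Second: how do I build a hash of hashes?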
Edit: posting my attempt at Zaid's request:
#!/usr/bin/perl -w
use strict; use warnings;
my %HoH;
while(<DATA>){
my $line=$_;
chomp($line);
my ( $cluster, $genes ) = split (/:/,$line);
$HoH{ $cluster } = { split/[( )]+/ , $genes };
}
foreach $cluster (keys %HoH){
print "$cluster: ";
foreach $genes (keys %{$HoH{$cluster}}){
print "$genes = $HoH{$cluster}{$genes} ";
}
print "\n";
}
__DATA__
cluster1:gene1(genome1) gene2(genome2) gene3(genome3)
cluster2:gene4(genome4) gene5(genome5)

Posted on 2012-08-29 17:23:26
Explanation to follow once the OP posts their attempt:
my %HoH;
while (<>) {
chomp;
my ( $cluster, $genes ) = split /:/;
$HoH{ $cluster } = { split /[( )]+/, $genes };
}

Posted on 2012-08-29 17:18:41
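Assuming the pattern always follows AAA:BBB(CCC) DDD(EEE) FFF(GGG)..., you can use the following algorithm:
1. Read each line.
2. Split on : and take the first part as your key.
3. Split the rest on whitespace.
4. For each piece, extract the gene and the genome, for example with the regex ([^(]+)\(([^)]+)\), and build a hash.
5. $hoh{key from step 2} = hash from step 4
Untested, but something like this (the hash-reference part is a little shaky, but you get the idea):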
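while (<>) {
    my ($key, $rest) = split ':';          # cluster name, then the gene list
    my @genes = split ' ', $rest;          # one "gene(genome)" item per element
    my %h;
    foreach my $gene (@genes) {
        my ($k, $v) = split /[\(\)]/, $gene;   # gene name, genome name
        $h{$k} = $v;
    }
    $hoh{$key} = \%h;                      # store a reference to the inner hash
}
There may well be a more elegant, Perl-ish way to do this, though :)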
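
The regex from step 4 is not actually used in the loop above, so here is a rough sketch of how it could be applied with a global match. The helper name parse_line is made up for illustration, and I have added \s to the first character class (an assumption on my part) so that the space between items does not end up in the gene name. It assumes every line contains a colon.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): parse one line into
# ( cluster name, reference to a hash of gene => genome ).
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ( $key, $rest ) = split /:/, $line, 2;
    my %genes;
    # Each pass of the /g match picks up the next gene(genome) item.
    while ( $rest =~ /([^(\s]+)\(([^)]+)\)/g ) {
        $genes{$1} = $2;
    }
    return ( $key, \%genes );
}

my %hoh;
for my $line (
    "cluster1:gene1(genome1) gene2(genome2) gene3(genome3)\n",
    "cluster2:gene4(genome4) gene5(genome5)\n",
  )
{
    my ( $key, $genes ) = parse_line($line);
    $hoh{$key} = $genes;
}

print join( ',', sort keys %{ $hoh{cluster1} } ), "\n";
print $hoh{cluster2}{gene5}, "\n";
```

Compared with splitting on the brackets, matching makes it explicit what a valid item looks like, so anything that is not of the form name(name) is simply skipped.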
Posted on 2012-08-30 13:57:08
#!/usr/bin/perl -w
use strict; use warnings;
my %HoH;
while(<DATA>){
my $line=$_;
chomp($line);
my ( $cluster, $genes ) = split (/:/,$line);
$HoH{ $cluster } = { split/[( )]+/ , $genes };
}
foreach my $cluster (keys %HoH){
print "$cluster: ";
foreach my $genes (keys %{$HoH{$cluster}}){
print "$genes = $HoH{$cluster}{$genes} ";
}
print "\n";
}
__DATA__
cluster1:gene1(genome1) gene2(genome2) gene3(genome3)
cluster2:gene4(genome4) gene5(genome5)
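
In case it is not obvious why a split inside braces gives you the inner hash, here is a minimal sketch that does the same thing one step at a time. The sample line comes from the question. The intermediate @pairs array and the sorted report are my own additions, since Perl returns hash keys in no particular order.

```perl
use strict;
use warnings;

my $line = "cluster1:gene1(genome1) gene2(genome2) gene3(genome3)\n";
chomp $line;

# Step 1: separate the cluster name from the gene list.
my ( $cluster, $genes ) = split /:/, $line;

# Step 2: splitting on runs of '(', ')' and space yields the flat list
# gene1, genome1, gene2, genome2, ... (the trailing empty field is dropped).
my @pairs = split /[( )]+/, $genes;

# Step 3: an even-length flat list inside { } becomes an anonymous hash,
# with the elements read as key, value, key, value, ...
my %HoH = ( $cluster => { @pairs } );

# Direct lookup: outer key first, then inner key.
print "$HoH{cluster1}{gene2}\n";

# Sorting both levels gives the same report on every run.
my @report;
for my $c ( sort keys %HoH ) {
    my $inner = $HoH{$c};    # hash reference for this cluster
    my @parts;
    push @parts, "$_ = $inner->{$_}" for sort keys %$inner;
    push @report, "$c: @parts";
}
print "$_\n" for @report;
```

Note that this relies on every item having exactly one name and one bracketed value. An item without brackets would leave the list with an odd number of elements and shift every later key and value.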
https://stackoverflow.com/questions/12183124