文章/答案/技术大牛

发布

社区首页 >问答首页 >Perl分组:寻找更快的递归解决方案

问Perl分组:寻找更快的递归解决方案
EN

Stack Overflow用户

提问于 2015-10-07 01:16:52

回答 1查看 93关注 0票数 0

下面的Perl代码可以工作，但是即使有相当多的计算机资源，它也不能很好地伸缩。我希望有人能帮助我找到更有效的代码，例如用迭代代替递归，如果这是问题所在。

我的数据结构是这样的:我的%REV_ALIGN；

$REV_ALIGN{$dna}{$rna} = ()；

任何dna密钥都可以有多个RNA子密钥。相同的RNA子密钥可以与多个不同的dna密钥一起出现。其目的是基于共享的dna序列元素对rna (转录本)进行分组。例如，如果dnaA有RNA1、RNA8、RNA9和RNA4，而dnaB有RNA11、RNA4和RNA99，那么我们将所有这些转录本( RNA1、RNA9、RNA4、RNA11、RNA99 )分组在一起，并继续尝试通过选择其他dna来添加到组中。我对这个问题的递归解决方案有效，但当使用来自整个基因组的数据进行转录组比对时，规模不是很好。

所以我的问题是:对于这个问题，什么是更有效的解决方案？非常感谢

my @groups;

while ( my $x =()= keys %REV_ALIGN )
{
    my @DNA = keys %REV_ALIGN;
    my $dna = shift @DNA;
    # the corresponding list of rna
    my @RNA = keys %{$REV_ALIGN{$dna}};
    delete $REV_ALIGN{$dna};

    if ( $x == 1 )
    {
        push @groups, \@RNA;
        last;
    }

    my $ref = group_transcripts ( \@RNA, \%REV_ALIGN );
    push @groups, $ref;
}


sub group_transcripts
{
    my $tran_ref = shift;
    my $align_ref = shift;
    my @RNA_A = @$tran_ref;
    my %RNA;
    # create a null hash with seed list of transcripts
    @RNA{@RNA_A} = ();
    # get a list of all remaining dna sequences in the alignment
    my @DNA = keys %{$align_ref};
    my %count;
    # select a different list of transcripts
    for my $dna ( @DNA )
    {
        next unless exists $align_ref->{$dna};
        my @RNA_B = keys %{$align_ref->{$dna}};
        # check to see two list share and transcripts
        for my $element ( @RNA_A, @RNA_B )
        {
            $count{$element}++;
        }
        for my $rna_a ( keys %count )
        {
            # if they do, add any new transcripts to the current group
            if ( $count{$rna_a} == 2 )
            {
                for my $rna_b ( @RNA_B )
                {
                    push @RNA_A, $rna_b if $count{$rna_b} == 1;
                }
                delete $align_ref->{$dna};
                delete $count{$_} foreach keys %count;
                # recurse to try and continue adding to list
                @_ = ( \@RNA_A, $align_ref );
                goto &group_transcripts;
            }
        }
        delete $count{$_} foreach keys %count;
    }
   # if no more transcripts can be added, return a reference to the group
    return \@RNA_A;
}

grouping

perl

recursion

回答 1

Stack Overflow用户

发布于 2015-10-07 03:33:29

您有一个嵌套了四个深度的循环。我敢打赌，这就是你的代码伸缩性差的原因。

如果我正确地理解了您想要完成的任务，那么输入

my %REV_ALIGN = (
   "DNA1" => { map { $_ => undef } "RNA1", "RNA2" }, # \ Linked by RNA1     \
   "DNA2" => { map { $_ => undef } "RNA1", "RNA3" }, # /  \ Linked by RNA3   > Group
   "DNA3" => { map { $_ => undef } "RNA3", "RNA4" }, #    /                 /

   "DNA4" => { map { $_ => undef } "RNA5", "RNA6" }, # \ Linked by RNA5     \  Group
   "DNA5" => { map { $_ => undef } "RNA5", "RNA7" }, # /                    /

   "DNA6" => { map { $_ => undef } "RNA8" },         #                      >  Group
);

应该会导致

my @groups = (
   [
      dna => [ "DNA1", "DNA2", "DNA3" ],
      rna => [ "RNA1", "RNA2", "RNA3", "RNA4" ],
   ],
   [
      dna => [ "DNA4", "DNA5" ],
      rna => [ "RNA5", "RNA6", "RNA7" ],
   ],
   [
      dna => [ "DNA6" ],
      rna => [ "RNA8" ],
   ],
);

如果是这样，您可以使用以下命令：

use strict;
use warnings;

use Graph::Undirected qw( );

my %REV_ALIGN = (
   "DNA1" => { map { $_ => undef } "RNA1", "RNA2" },
   "DNA2" => { map { $_ => undef } "RNA1", "RNA3" },
   "DNA3" => { map { $_ => undef } "RNA3", "RNA4" },
   "DNA4" => { map { $_ => undef } "RNA5", "RNA6" },
   "DNA5" => { map { $_ => undef } "RNA5", "RNA7" },
   "DNA6" => { map { $_ => undef } "RNA8" },
);

my $g = Graph::Undirected->new();
for my $dna (keys(%REV_ALIGN)) {
   for my $rna (keys(%{ $REV_ALIGN{$dna} })) {
      $g->add_edge("dna:$dna", "rna:$rna");
   }
}

my @groups;
for my $raw_group ($g->connected_components()) {
   my %group = ( dna => [], rna => [] );
   for (@$raw_group) {
      my ($type, $val) = split(/:/, $_, 2);
      push @{ $group{$type} }, $val;
   }

   push @groups, \%group;
}

use Data::Dumper qw( Dumper );
print(Dumper(\@groups));

如果您只需要RNA，最后一节将简化为以下内容：

my @groups;
for my $raw_group ($g->connected_components()) {
   my @group;
   for (@$raw_group) {
      my ($type, $val) = split(/:/, $_, 2);
      push @group, $val if $type eq 'rna';
   }

   push @groups, \@group;
}

票数 1

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/32975805

复制

相似问题

问Perl分组:寻找更快的递归解决方案
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Perl分组:寻找更快的递归解决方案EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问Perl分组:寻找更快的递归解决方案
EN