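Given these inputs:
my $init_seq = "AAAAAAAAAA"; # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );
I want to generate $nof_tags (1000) tags of the same length as $init_seq, where each position is substituted with probability $sub_rate, yielding output such as: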
AAAAAAAAAA
AATAACAAAA
.....
AAGGAAAAGA # 1000th tag
Is there a concise way to do this in Perl?
I'm stuck with the following script as the core logic:
#!/usr/bin/perl
use strict;
use warnings;
my $init_seq = "AAAAAAAAAA"; # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );
# Note: as written, this prints one fully random sequence of length($init_seq);
# $sub_rate and $nof_tags are not used yet.
my $i = 0;
my $base;
while ($i < length($init_seq)) {
    my $roll = int(rand 4) + 1; # $roll is now an integer between 1 and 4
    if    ($roll == 1) { $base = 'A'; }
    elsif ($roll == 2) { $base = 'T'; }
    elsif ($roll == 3) { $base = 'C'; }
    elsif ($roll == 4) { $base = 'G'; }
    print $base;
}
continue {
    $i++;
}
print "\n";
Posted on 2009-03-02 10:21:58
As a small optimization, replace:
$roll = int(rand 4) + 1; # $roll is now an integer between 1 and 4
if    ($roll == 1) { $base = 'A'; }
elsif ($roll == 2) { $base = 'T'; }
elsif ($roll == 3) { $base = 'C'; }
elsif ($roll == 4) { $base = 'G'; }
with:
$base = $dna[int(rand 4)];
Posted on 2009-03-02 09:32:01
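Edit: assuming the substitution rate is between 0.001 and 1.000:
As with $roll, generate another (pseudo-)random integer in the range 1..1000. If it is less than or equal to (1000 * $sub_rate), perform a substitution; otherwise do nothing (i.e. output 'A').
Note that unless the characteristics of your random number generator are known, you may introduce subtle bias.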
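A minimal sketch of the 1..1000 roll described above, assuming $sub_rate has at most three decimal places so that 1000 * $sub_rate is a whole number. The answer does not say which base to put in when a substitution fires; this sketch assumes one of the three *other* bases, chosen uniformly, so every substitution event actually changes the tag. The helper name `substitute` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $init_seq = "AAAAAAAAAA";   # length 10 bp
my $sub_rate = 0.003;          # assumed to lie between 0.001 and 1.000
my $nof_tags = 1000;
my @dna      = qw( A C G T );

# Cutoff on the 1..1000 scale, rounded so that floating-point noise in
# 1000 * $sub_rate cannot shift it (0.003 gives a cutoff of 3).
my $cutoff = int(1000 * $sub_rate + 0.5);

# Hypothetical helper: return one of the three bases that differ from $base,
# chosen uniformly, so every substitution really changes the sequence.
sub substitute {
    my ($base) = @_;
    my @others = grep { $_ ne $base } @dna;
    return $others[rand @others];
}

my @tags;
for (1 .. $nof_tags) {
    my $tag = '';
    for my $base (split //, $init_seq) {
        my $roll = int(rand 1000) + 1;   # integer between 1 and 1000
        # Substitute when the roll is at or below the cutoff, i.e. with
        # probability $cutoff / 1000; otherwise keep the original base.
        $tag .= $roll <= $cutoff ? substitute($base) : $base;
    }
    push @tags, $tag;
}
print "$_\n" for @tags;
```

Rolling an integer and comparing it with a cutoff is equivalent to `rand() < $sub_rate` for rates with three decimal places; the integer form just makes the 1-in-1000 granularity explicit.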
Posted on 2009-03-02 20:14:57
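Not exactly what you asked for, but I suggest you take a look at BioPerl's Bio::SeqEvolution::DNAPoint module. It does not take a mutation rate as a parameter, though; instead, it asks for the lower bound of sequence identity with the original that you want.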
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqEvolution::Factory;
my $seq = Bio::Seq->new(-seq => 'AAAAAAAAAA', -alphabet => 'dna');
my $evolve = Bio::SeqEvolution::Factory->new(
    -rate     => 2,     # transition/transversion rate
    -seq      => $seq,
    -identity => 50,    # at least 50% identity with the original
);
my @mutated;
for (1 .. 1000) { push @mutated, $evolve->next_seq }
print $_->seq, "\n" for @mutated;   # one mutated sequence per line
All 1000 mutated sequences will be stored in the @mutated array, and their sequences can be accessed with the seq method.
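Since the module takes an identity bound rather than a per-position rate, here is a minimal pure-Perl sketch (not part of BioPerl) that uses $sub_rate directly: each position is independently redrawn from @dna with probability $sub_rate and kept otherwise. The helper name `mutate_tag` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $init_seq = "AAAAAAAAAA";   # length 10 bp
my $sub_rate = 0.003;          # per-position substitution probability
my $nof_tags = 1000;
my @dna      = qw( A C G T );

# Hypothetical helper: each position is independently redrawn from @dna
# with probability $rate and kept otherwise. rand @dna returns a number in
# [0, 4), which Perl truncates to an integer when used as an index.
sub mutate_tag {
    my ($seq, $rate) = @_;
    return join '', map { rand() < $rate ? $dna[rand @dna] : $_ } split //, $seq;
}

my @tags = map { mutate_tag($init_seq, $sub_rate) } 1 .. $nof_tags;
print "$_\n" for @tags;
```

Because the redraw is from all four bases, one draw in four puts back the original base, so the per-position change rate is 0.75 × $sub_rate; with these inputs about 2% of the tags should differ from $init_seq.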
https://stackoverflow.com/questions/601727