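Given two sequences, I need to compare them codon by codon (three letters at a time). If a change matches one of the changes in the list below, I need to report the changed codon and its position, and count how many times it occurs.
For example: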
sequence 1 - TTCAUUUCCCAU
sequence 2 - TTTAUAUCGCAC
The output I need is:
TTC->TTT considered/location-1/count-1
AUU->AUA considered/location-2/count-1
UCC->UCG considered/location-3/count-1
Note: CAU->CAC is not considered because it is not in the list below.
LIST (the direction of the change must also be taken into account):
first sequence->second sequence
TTC->TTT
CTG->UUA
AUU->AUA
GUG->GUA
UCC->UCG
CCC->CCG
ACC->ACG
GCC->GCG
UAC->UAU
UGA->UAG
CAC->CAU
CAG->CAA
AAC->AAU
AAG->AAA
GAC->GAU
GAG->GAA
UGC->UGU
CGG->CGU
AGC->AGU
AGG->CGU
AGA->CGU
UAA->UAG
GGC->GGU
The code I have written so far is:
print "Enter the sequence:";
$a = <>;
print "Enter the mutated sequence:";
$b = <>;
chomp($a);
chomp($b);
my @codon = split(/(\w{3})/, $a);
my @codon1 = split(/(\w{3})/, $b);
open(OUT, ">output.txt") or die;
$count = 0;
@new = ();
@new1 = ();
for ($i = 0; $i <= $#codon; $i++) {
    for ($j = 0; $j <= $#codon1; $j++) {
        if ($codon[$i] = {TTC}) || ($codon1[$j] = {TTT}) {
            $count++;
        }
    }
}
print OUT " @new";
close OUT;

Posted on 2011-02-03 15:38:42
#!/usr/bin/env perl
use strict;
my %seq_map = (
    "TTC"=>"TTT",
    "CTG"=>"UUA",
    "AUU"=>"AUA",
    "GUG"=>"GUA",
    "UCC"=>"UCG",
    "CCC"=>"CCG",
    "ACC"=>"ACG",
    "GCC"=>"GCG",
    "UAC"=>"UAU",
    "UGA"=>"UAG",
    "CAC"=>"CAU",
    "CAG"=>"CAA",
    "AAC"=>"AAU",
    "AAG"=>"AAA",
    "GAC"=>"GAU",
    "GAG"=>"GAA",
    "UGC"=>"UGU",
    "CGG"=>"CGU",
    "AGC"=>"AGU",
    "AGG"=>"CGU",
    "AGA"=>"CGU",
    "UAA"=>"UAG",
    "GGC"=>"GGU"
);
my %seq_count = ();
my $seq1 = "TTCAUUUCCCAU";
my $seq2 = "TTTAUAUCGCAC";
my $max = int(length($seq1) / 3);
for(my $i=0;$i<$max;$i++) {
    # take the codon at the same position in both sequences
    my $c1 = substr($seq1, $i*3, 3);
    my $c2 = substr($seq2, $i*3, 3);
    my $found = $seq_map{$c1};
    if ($found && ($found eq $c2)) {
        $seq_count{$c1} ||= 0;
        my $count = ++$seq_count{$c1};
        my $loc = $i+1;
        print "${c1}->${c2} considered / location ${loc} / count ${count}\n";
    }
}

Posted on 2011-02-03 14:58:20
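There are a number of ways to do this, as is typical in Perl.
If the files are not large, you could read one file line by line into an array (or, if it already has one entry per line, simply slurp the whole file into an array). Then use a while loop (and a file handle for the second file) to compare the codons position by position.
Since this is a bioinformatics problem, where files tend to be large, the smarter approach is to read line by line from each file handle and compare as you go.
For the 3-character split you are attempting, I would use a for loop that runs up to the length of the string being checked divided by 3, minus 1. Then build a regex that grabs the first three letters, then the next three, and so on...
Something like /^\w{$count}(\w{3})/, where $count is the number of letters to skip (a multiple of 3).
The while loop might look like this: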
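The three-letter split described above can be sketched in a couple of ways. This is only an illustration, and the helper names `codons` and `codon_at` are made up for the example. Note that `split` with a capturing pattern, as in the question's code, also returns the empty strings between the matches, so a global match or `substr` is usually easier to work with.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Split a sequence into consecutive 3-letter codons.
# A global match in list context returns every captured group;
# an incomplete trailing codon (1 or 2 letters) is dropped.
sub codons {
    my ($seq) = @_;
    my @found = $seq =~ /(\w{3})/g;
    return @found;
}

# Same idea by position, using substr instead of a regex.
# $index is the 0-based codon number (hypothetical helper).
sub codon_at {
    my ($seq, $index) = @_;
    return substr($seq, $index * 3, 3);
}

my @codons = codons("TTCAUUUCCCAU");
print join(" ", @codons), "\n";             # TTC AUU UCC CAU
print codon_at("TTCAUUUCCCAU", 2), "\n";    # UCC
```

With the codons of both sequences in arrays, comparing position `$i` of one array against position `$i` of the other replaces the nested loops from the question.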
#!/usr/bin/perl -w
use strict;
open FILE1, "file1.txt" or die "Cannot open file1.txt: $!\n";
open FILE2, "file2.txt" or die "Cannot open file2.txt: $!\n";
my $count = 0;
while (<FILE1>) {
    chomp(my $lineF1 = $_);
    chomp(my $lineF2 = <FILE2>);
    # some changes may need to be made to this if statement
    if ($lineF1 eq $lineF2) {
        # do something important here
        print "$lineF1\n";
    } else {
        print "Line $count mismatch\n";
    }
    $count++;
}
close(FILE1);
close(FILE2);

Posted on 2011-02-03 21:54:41
Can you assume that the codons in the two files are "aligned"? If so, the problem is simple. Load the list of valid transitions into a two-level hash:
# of course, you load this from a file...
$transitions{TTC}{TTT} = 1;
$transitions{CTG}{UUA} = 1;
...
Then read the two files line by line (or is each of them just a single string?):
# of course, I'm leaving out all the file manipulation...
chomp(my $line1 = <FILE1>);
chomp(my $line2 = <FILE2>);
my $maxlen1 = length($line1);
my $maxlen2 = length($line2);
my $i = 0;
while($i < $maxlen1 && $i < $maxlen2){
    # the third argument of substr is a length, not an end offset
    my $codon1 = substr($line1, $i, 3);
    if(exists($transitions{$codon1})){
        my $codon2 = substr($line2, $i, 3);
        if(exists($transitions{$codon1}{$codon2})){
            print "we have a match $codon1 -> $codon2 at index $i\n";
        }
    }
    $i += 3;
}
The nested exists() checks are there to avoid autovivification problems: testing the two-level key directly creates an empty $transitions{$codon1} entry whenever $codon1 is not in the list. If you don't want the nested if(), you can compute $codon1 and $codon2 up front and then check if(exists($transitions{$codon1}{$codon2})){}.
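The autovivification point can be seen in a small sketch. It assumes a `%transitions` hash shaped like the one above, and the helper name `is_listed` is invented for the illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %transitions;
$transitions{TTC}{TTT} = 1;

# Direct two-level test: the result is false, but Perl creates
# $transitions{CAU} as an empty hash on the way (autovivification).
my $direct = exists $transitions{CAU}{CAC};
print "after direct test: ",
    (exists $transitions{CAU} ? "CAU was created" : "CAU absent"), "\n";

# Undo the side effect before trying the guarded version.
delete $transitions{CAU};

# Guarded test: check the first level before touching the second,
# so a missing first-level key is never created.
sub is_listed {
    my ($map, $from, $to) = @_;
    return (exists $map->{$from} && exists $map->{$from}{$to}) ? 1 : 0;
}

print is_listed(\%transitions, "CAU", "CAC") ? "listed\n" : "not listed\n";
print exists $transitions{CAU} ? "CAU was created\n" : "CAU absent\n";
```

For a short fixed list the extra keys are mostly harmless, but they matter if the hash is later iterated or its keys are counted.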
https://stackoverflow.com/questions/4883062