Reverse complement of a FASTA file

Stack Overflow user
Asked 2016-05-12 21:48:16
3 answers, 1K views, 0 followers, score 2

I am trying to get the reverse complement of the RNA sequences in a multi-FASTA file.

Input:

>cel-mir-39 MI0010 C elegans miR-39
UAUACCGAGAGCCCAGCUGAUUUCGUCUUGGUAAUAAGCUCGUCAUUGAGAUUAUCACCGGGUGUAAAUCAGCUUGGCUCAAAAAAAA

>cel-let-7 MI0001 C elegans let-7
UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAACUAUGCAAUUUUCUACCUUACCGGAGGGGGGG

Expected output:

>cel-mir-39 MI0010 C elegans miR-39
UUUUUUUUGAGCCAAGCUGAUUUACACCCGGUGAUAAUCUCAAUGACGAGCUUAUUACCAAGACGAAAUCAGCUGGGCUCUCGGUAUA

>cel-let-7 MI0001 C elegans let-7
CCCCCCCUCCGGUAAGGUAGAAAAUUGCAUAGUUCACCGGUGGUAAUAUUCCAAACUAUACAACCUACUACCUCACCGGAUCCACAGUGUA

But what I get instead is:

UUUUUUUUGAGCCAAGCUGAUUUACACCCGGUGAUAAUCUCAAUGACGAGCUUAUUACCAAGACGAAAUCAGCUGGGCUCUCGGUAUA
93-Rim snucele G 0100IM 93-rim-leg 
CCCCCCCUCCGGUAAGGUAGAAAAUUGCAUAGUUCACCGGUGGUAAUAUUCCAAACUAUACAACCUACUACCUCACCGGAUCCACAGUGUA
7-tel snucele G 1000IM 7-tel-leg 

My code:

#!/usr/bin/perl
use strict;
use warnings;

print "type in the path of the file\n";
my $file_name = <>;
chomp($file_name); 

open (FASTA, $file_name) or die "error #!"; 

$/ = ">";
<FASTA>;    
while (my $entry = <FASTA>){
    $entry = reverse $entry;
    $entry =~ tr/ACGUacgu/UGCAugca/;
    print "$entry \n";
}

close(FASTA);

How can I reverse only the sequences and not the headers? Thanks.

3 Answers

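Stack Overflow user

Accepted answer

Posted 2016-05-12 21:54:52

Reading records delimited by > is a nice idea, since it gives you the whole block at once. Here, however, you want to process and merge the sequence lines but not the header, so the lines have to be told apart. Reading line by line is clearer.

Sequence lines are distinctive: all uppercase letters and nothing else. Blank lines separate the records to process. The only remaining possibility is a header. The sequence is assembled by joining the lines that match its pattern, and once we reach a blank line it is processed and printed.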

open (FASTA, $file_name) or die "error $!";

# sequence, built by joining lines =~ /^[A-Z]+$/
my $sequence = '';

while (my $entry = <FASTA>)
{
    if ($entry =~ m/^[A-Z]+$/) {
        # Assemble the sequence from separate lines
        chomp($entry);
        $sequence .= $entry;
    }
    elsif ($entry =~ m/^\s*$/) { 
        # process and print the sequence and blank line, reset for next
        $sequence = reverse $sequence;
        $sequence =~ tr/ACGUacgu/UGCAugca/;
        print "$sequence\n";
        print "\n";
        $sequence = '';
    }
    else { # header
        print $entry;
    }
}

# Print the last sequence if the file didn't end with blank line    
if (length $sequence) {
    $sequence = reverse $sequence;
    $sequence =~ tr/ACGUacgu/UGCAugca/;
    print "$sequence\n";
}

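^ and $ are anchors for the beginning and end of the string, so the regex that matches a sequence requires the whole line to consist strictly of uppercase letters. The other regex allows only optional whitespace, \s*, which identifies a blank line.

The sequence processing is copied from the question.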
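
The line classification and the sequence processing can also be tried in isolation. The following is a minimal sketch, not part of the answer's code: `revcomp` and `classify` are hypothetical helper names, and the three sample lines are taken from the question's input (the sequence is shortened to its first nine bases).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement an RNA string.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;          # scalar assignment, so the characters are reversed
    $rc =~ tr/ACGUacgu/UGCAugca/;   # complement each base
    return $rc;
}

# Hypothetical helper: classify a line with the same two regexes as the loop.
sub classify {
    my ($line) = @_;
    return 'sequence' if $line =~ m/^[A-Z]+$/;   # uppercase letters only
    return 'blank'    if $line =~ m/^\s*$/;      # nothing but optional whitespace
    return 'header';                             # everything else
}

for my $line (">cel-mir-39 MI0010 C elegans miR-39\n", "UAUACCGAG\n", "\n") {
    chomp(my $shown = $line);
    printf "%-8s [%s]\n", classify($line), $shown;
}

print revcomp("UAUACCGAG"), "\n";   # prints CUCGGUAUA
```

Because $ also matches just before a trailing newline, the sequence regex works on lines that have not been chomped yet, which is how the loop in the answer uses it.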

Score 2

Stack Overflow user

Posted 2016-05-12 22:11:17

A TXR solution:

@(bind compl @(hash-from-pairs (zip "ACGUacgu" "UGCAugca")))
@(repeat)
>@header
@  (collect)
@rna
@  (until)

@  (end)
@  (output)
>@header
@(mapcar compl (reverse (cat-str rna)))

@  (end)
@(end)

Run:

$ txr revcomp.txr data
>cel-mir-39 MI0010 C elegans miR-39
UUUUUUUUGAGCCAAGCUGAUUUACACCCGGUGAUAAUCUCAAUGACGAGCUUAUUACCAAGACGAAAUCAGCUGGGCUCUCGGUAUA

>cel-let-7 MI0001 C elegans let-7
CCCCCCCUCCGGUAAGGUAGAAAAUUGCAUAGUUCACCGGUGGUAAUAUUCCAAACUAUACAACCUACUACCUCACCGGAUCCACAGUGUA

This variant formats the output to 46 columns:

@(bind compl @(hash-from-pairs (zip "ACGUacgu" "UGCAugca")))
@(repeat)
>@header
@  (collect)
@rna
@  (until)

@  (end)
@  (output)
>@header
@  (repeat :vars ((crna (tuples 46 (mapcar compl (reverse (cat-str rna)))))))
@crna
@  (end)

@  (end)
@(end)

Run:

$ txr revcomp.txr data
>cel-mir-39 MI0010 C elegans miR-39
UUUUUUUUGAGCCAAGCUGAUUUACACCCGGUGAUAAUCUCAAUGA
CGAGCUUAUUACCAAGACGAAAUCAGCUGGGCUCUCGGUAUA

>cel-let-7 MI0001 C elegans let-7
CCCCCCCUCCGGUAAGGUAGAAAAUUGCAUAGUUCACCGGUGGUAA
UAUUCCAAACUAUACAACCUACUACCUCACCGGAUCCACAGUGUA
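
For readers following the Perl answers, the same fixed-width wrapping could be sketched in Perl roughly as follows. This is an illustration only: `wrap_seq` is a hypothetical helper name, and the width of 46 simply mirrors the TXR variant above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a sequence into chunks of at most $width characters.
sub wrap_seq {
    my ($seq, $width) = @_;
    # In list context, a /g match with one capture group returns every chunk;
    # the last chunk may be shorter than $width.
    return $seq =~ /(.{1,$width})/g;
}

# Reverse complement of cel-mir-39, taken from the output shown earlier
my $rc = "UUUUUUUUGAGCCAAGCUGAUUUACACCCGGUGAUAAUCUCAAUGACGAGCUUAUUACCAAGACGAAAUCAGCUGGGCUCUCGGUAUA";

print "$_\n" for wrap_seq($rc, 46);
```
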
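Score 0

Stack Overflow user

Posted 2016-05-13 03:56:59

Try the following.

First, I split each record on newlines, storing the header in $header and the remaining lines in @ar.

Then I join the array with newlines and store the result in $entry. A substitution then removes the \n, \r, > and other whitespace characters from the RNA sequence.

After that, as usual, the string is reversed and transliterated. Finally, the print statement produces the output.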

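open my $fh, "<", "filename.text" or die "error opening $!";

# Read one record at a time, using ">" as the record separator
$/ = ">";

# Discard the empty chunk before the first ">"
<$fh>;

while (<$fh>)
{
    # The first line is the header; the rest are sequence lines
    my ($header, @ar) = split("\n", $_);
    my $entry = join("\n", @ar);

    # Strip newlines, the trailing ">" separator, and any other whitespace
    $entry =~ s/\n|\r|>|\s//g;

    # Reverse, then complement
    $entry = reverse $entry;
    $entry =~ tr/ACGUacgu/UGCAugca/;

    print ">$header\n$entry\n\n";
}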
Score 0

Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/37197991
