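I'm parsing a large EMBL file (>1 GB) and converting it to a GFF file. Some of its entries don't match the conventional EMBL structure, which makes the BioPerl modules throw exceptions. The bad entries are only a small fraction of all the sequences, so I'd like the script to keep running and ignore the exceptions for now. But the Perl script always gets stopped by the exception.
I'm on Linux with Perl 5.8.8.
My Perl script: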
use strict;
use Bio::SeqIO;
use Bio::Tools::GFF;
use warnings;
use Try::Tiny;
open (E ,">","emblError.txt");
if (@ARGV != 1) { die "USAGE: embl2gff.pl <embl_file> > outputfile\n"; }
my $in = Bio::SeqIO->new(-file=>$ARGV[0],-format=>'EMBL');
eval {
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->top_SeqFeatures) {
            my $gffio = Bio::Tools::GFF->new(-gff_version => 3);
            print $feat->gff_string($gffio)."\n";
        }
    }
};
if ($@) {
    warn "Oh no! [$@]\n";
}
The errors I get:
Name "main::E" used only once: possible typo at embl2GFF3.pl line 7.
--------------------- WARNING ---------------------
MSG: exception while parsing location line [join(9174..9343,14214..14303)complement(9268..9363),complement(9140..9198),complement(8965..9034),complement(8751..8884),complement(8419..8535),complement(8232..8337),complement(7952..8149),complement(7256..7332),complement(7051..7175),complement(6769..6877),complement(6601..6659),complement(4690..6530))] in reading EMBL/GenBank/SwissProt, ignoring feature mRNA (seqid=XcouVSXmac70forkSpecies.Scaffold1050.final):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad operator 1: had multiple locations 2, should be SplitLocationI
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:210
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:204
STACK: Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/FTHelper.pm:133
STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/embl.pm:403
STACK: embl2GFF3.pl:14
-----------------------------------------------------------
---------------------------------------------------
--------------------- WARNING ---------------------
MSG: exception while parsing location line [join(14219..14303,14368..14513)complement(9140..9198),complement(8965..9034),complement(8751..8884),complement(8419..8535),complement(8232..8337),complement(7952..8149),complement(7256..7332),complement(7051..7175),complement(6769..6877),complement(6601..6659),complement(6461..6530))] in reading EMBL/GenBank/SwissProt, ignoring feature CDS (seqid=XcouVSXmac70forkSpecies.Scaffold1050.final):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad operator 1: had multiple locations 2, should be SplitLocationI
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:210
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:204
STACK: Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/FTHelper.pm:133
STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/embl.pm:403
STACK: embl2GFF3.pl:14
-----------------------------------------------------------
---------------------------------------------------
Oh no! [Can't call method "isa" on an undefined value at /usr/lib/perl5/site_perl/5.8.8/Bio/Seq.pm line 1142, <GEN0> line 538764.
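]
Note: I'm not posting the exception twice by mistake; that's really how it comes out. It seems only one of the exceptions is caught.
This is the chunk of the EMBL file that causes the problem. The mRNA entry triggers the first exception and the CDS entry triggers the second.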
FT   mRNA            join(9174..9343,14214..14303)
FT                   complement(9268..9363),complement(9140..9198),
FT                   complement(8965..9034),complement(8751..8884),
FT                   complement(8419..8535),complement(8232..8337),
FT                   complement(7952..8149),complement(7256..7332),
FT                   complement(7051..7175),complement(6769..6877),
FT                   complement(6601..6659),complement(4690..6530))
FT                   /gene="ENSXMAG00000014948"
FT                   /note="transcript_id=ENSXMAT00000015030"
FT   CDS             join(14219..14303,14368..14513)
FT                   complement(9140..9198),complement(8965..9034),
FT                   complement(8751..8884),complement(8419..8535),
FT                   complement(8232..8337),complement(7952..8149),
FT                   complement(7256..7332),complement(7051..7175),
FT                   complement(6769..6877),complement(6601..6659),
FT                   complement(6461..6530))
FT                   /gene="ENSXMAG00000014948"
FT                   /protein_id="ENSXMAP00000015010"
FT                   /note="transcript_id=ENSXMAT00000015030"
FT                   /db_xref="HGNC_transcript_name:ENO3-201"
Posted on 2013-04-04 03:42:35
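There are low-level Perl errors that eval can't catch. Also check for a $SIG{__DIE__} handler: a badly written die handler can kill the program on its own. For example, if the handler doesn't check $EXCEPTIONS_BEING_CAUGHT (the "use English" name for $^S), it might call exit from inside the die handler.
But just look at your output. If it prints this: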
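The following is a minimal sketch of a die handler that checks $^S before doing anything. The handler body and the @uncaught_log array are illustrative only. $^S is true when the die happens inside an eval, so the handler returns immediately and the eval still receives the exception. A handler that called exit without this check would end the program even though the code is wrapped in eval.

```perl
use strict;
use warnings;

# $^S ($EXCEPTIONS_BEING_CAUGHT under "use English") is true while the
# die happens inside an eval, so this handler stays out of the way there.
my @uncaught_log;    # illustrative: records only genuinely uncaught deaths
$SIG{__DIE__} = sub {
    return if $^S;                  # inside eval: let the eval handle it
    push @uncaught_log, $_[0];
    print STDERR "FATAL: $_[0]";
};

# This die is inside eval: the handler sees $^S true and does nothing,
# and the eval catches the exception as usual.
my $ok     = eval { die "bad location line\n"; 1 };
my $caught = $ok ? '' : $@;
print "caught: $caught";
```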
Oh no! [Can't call method "isa" on an undefined value at
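/usr/lib/perl5/site_perl/5.8.8/Bio/Seq.pm line 1142, <GEN0> line 538764. ]
then it isn't behaving the way you describe. Your eval is catching the error; otherwise you couldn't print it with "Oh no!" in front. The WARNING blocks before it look like BioPerl's own stack-trace dumps. They say "ignoring feature mRNA" and "ignoring feature CDS", so BioPerl is catching those location-parsing exceptions itself and carrying on.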
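The simulation below shows why you can see several WARNING blocks but only one "Oh no!". It is a self-contained sketch, not BioPerl code: parse_record and the record hashes are hypothetical. The inner layer catches and warns about its own per-feature errors. Only an unhandled error escapes to the single eval around the whole loop, and that ends the loop early.

```perl
use strict;
use warnings;

# Hypothetical parser: like the BioPerl log suggests, it catches its own
# per-feature errors, warns, and skips the feature. Only an error it
# does not handle escapes to the caller.
sub parse_record {
    my ($rec) = @_;
    for my $feat (@{ $rec->{features} }) {
        my $ok = eval { die "bad location\n" if $feat eq 'bad'; 1 };
        warn "WARNING: ignoring feature '$feat'\n" unless $ok;
    }
    die "cannot build record $rec->{id}\n" if $rec->{fatal};
    return $rec->{id};
}

# Made-up records: 1 has a bad feature, 2 hits an unhandled error.
my @records = (
    { id => 1, features => [ 'ok', 'bad' ] },
    { id => 2, features => [ 'ok' ], fatal => 1 },
    { id => 3, features => [ 'ok' ] },
);

my @parsed;
# One eval around the WHOLE loop, as in the question's script.
eval {
    for my $rec (@records) { push @parsed, parse_record($rec) }
    1;
} or print "Oh no! [$@]";
print "parsed: @parsed\n";    # record 3 is never reached
```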
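Finally, your program's state looks data-dependent: some bad values in the file can leave the program in a bad state. For whatever reason, it could not create a Bio::Seq object, and it passed the result (an undefined value) to some function that checks whether its argument isa something. The problem line in the input file looks like line 538,764. But I could be wrong.
Addendum, to address your question in the comments: if BioPerl is handling the errors it finds and you just want to iterate over a series of records, my suggestion is to put the eval inside the loop (the while or for loop). This is a fairly standard form in some multithreaded applications.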
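Perl produces the message "Can't call method "isa" on an undefined value" whenever a method is called on undef. Below is a small reproduction using a hypothetical My::Seq class. It is not BioPerl's code, and I can't say what Bio/Seq.pm line 1142 actually contains. It also shows the usual guard with Scalar::Util's blessed(), for code you control.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package My::Seq;               # hypothetical stand-in class, not Bio::Seq
sub new { my ($class) = @_; return bless {}, $class }

package main;

# Calling ->isa on undef dies with the same kind of message as in the question.
my $obj;                       # e.g. an object that failed to be built
my $ok  = eval { $obj->isa('My::Seq'); 1 };
my $err = $ok ? '' : $@;
print "error: $err";

# Guard: blessed() returns undef for anything that is not an object,
# so ->isa is only called on a real object.
sub is_seq {
    my ($x) = @_;
    return (blessed($x) && $x->isa('My::Seq')) ? 1 : 0;
}
print is_seq(undef), is_seq(My::Seq->new), "\n";   # prints "01"
```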
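while ( 1 ) {
    # $me is the object doing one unit of work; warn instead of say,
    # since say needs Perl 5.10+ and you are on 5.8.8
    eval { $me->spin(); 1; } or warn "WARNING: $@";
    # unless we are officially done, just get ready to
    # handle somebody causing an exception in our thread.
    last if $me->done;
}
Wherever possible, remember to put the eval at the point where you want processing to resume.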
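Here is a sketch of the same idea applied to the question's loop. To keep it runnable without BioPerl, $next_record is a hypothetical stand-in for $in->next_seq: it hands out records, dies on a bad one, and returns undef at the end. The sketch rests on two assumptions. First, a failing next_seq has already consumed the bad record, so the next call moves on. The stub behaves that way, but I have not verified this for BioPerl's EMBL parser. Second, $max_consecutive_errors is my own addition, meant to stop the loop from spinning forever if the parser gets stuck on one spot.

```perl
use strict;
use warnings;

# Hypothetical record source standing in for $in->next_seq:
# dies on a "BAD" record (after consuming it), returns undef at the end.
my @queue = ('rec1', 'BAD', 'rec3', 'BAD', 'rec5');
my $next_record = sub {
    return unless @queue;
    my $r = shift @queue;
    die "cannot build record\n" if $r eq 'BAD';
    return $r;
};

my (@done, @errors);
my $max_consecutive_errors = 100;   # assumption: guard against a stuck parser
my $consecutive = 0;

while (1) {
    # The eval sits INSIDE the loop, where processing should resume.
    # With BioPerl this would be: eval { $seq = $in->next_seq; 1 }
    my $rec;
    my $ok = eval { $rec = $next_record->(); 1 };
    if (!$ok) {
        my $err = $@ || "unknown error\n";
        push @errors, $err;
        warn "WARNING: skipping record: $err";
        last if ++$consecutive >= $max_consecutive_errors;
        next;
    }
    $consecutive = 0;
    last unless defined $rec;       # normal end of input
    push @done, $rec;               # real code would print GFF lines here
}
print "processed: @done; errors: ", scalar(@errors), "\n";
```

Usage note: the same loop also covers errors thrown while converting features, as long as that code runs inside the eval, because each failure only skips the current record.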
https://stackoverflow.com/questions/15796492