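Some time ago, I asked for help writing a Perl script that splits the values in a text file into parts. The script tells me when a positive value appears in certain lines of the text file and then, when another part of the text begins, tells me the number of positive values again. For example, this is my text file: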
;YP_003858584.1_BtCoVBM48_gp2 25 NKSP 0.1462 (9/9) ---
;YP_003858584.1_BtCoVBM48_gp2 66 NLTW 0.7837 (9/9) +++
;YP_003858584.1_BtCoVBM48_gp2 116 NTTQ 0.7013 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 126 NGTH 0.7112 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 163 NCTY 0.7620 (9/9) +++
;YP_003858584.1_BtCoVBM48_gp2 173 NIST 0.6556 (8/9) +
;YP_003858584.1_BtCoVBM48_gp2 231 NITY 0.7442 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 273 NGTI 0.7109 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 322 NITQ 0.6116 (8/9) +
;YP_003858584.1_BtCoVBM48_gp2 334 NITS 0.7296 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 361 NSSA 0.5388 (6/9) +
;YP_003858584.1_BtCoVBM48_gp2 462 NPSG 0.4656 (5/9) -
;YP_003858584.1_BtCoVBM48_gp2 541 NSTK 0.5883 (8/9) +
;YP_003858584.1_BtCoVBM48_gp2 590 NASS 0.5643 (6/9) +
;YP_003858584.1_BtCoVBM48_gp2 603 NCTD 0.7117 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 646 NSSY 0.5467 (4/9) +
;YP_003858584.1_BtCoVBM48_gp2 665 NVSS 0.7980 (9/9) +++
;YP_003858584.1_BtCoVBM48_gp2 695 NNTI 0.4537 (5/9) -
;YP_003858584.1_BtCoVBM48_gp2 703 NFSI 0.5613 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 787 NFSQ 0.6209 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 1060 NFTT 0.4540 (6/9) -
;YP_003858584.1_BtCoVBM48_gp2 1084 NGTH 0.5408 (6/9) +
;YP_003858584.1_BtCoVBM48_gp2 1120 NNTV 0.5803 (6/9) +
;YP_003858584.1_BtCoVBM48_gp2 1144 NHTS 0.3828 (8/9) -
;YP_003858584.1_BtCoVBM48_gp2 1149 NVSL 0.4879 (5/9) -
;YP_003858584.1_BtCoVBM48_gp2 1159 NASV 0.5021 (3/9) +
;YP_003858584.1_BtCoVBM48_gp2 1180 NESL 0.5770 (7/9) +
;ADK66841.1_NA 25 NKSP 0.1462 (9/9) ---
;ADK66841.1_NA 66 NLTW 0.7837 (9/9) +++
;ADK66841.1_NA 116 NTTQ 0.7013 (9/9) ++
;ADK66841.1_NA 126 NGTH 0.7112 (9/9) ++
;ADK66841.1_NA 163 NCTY 0.7620 (9/9) +++
;ADK66841.1_NA 173 NIST 0.6556 (8/9) +
;ADK66841.1_NA 231 NITY 0.7442 (9/9) ++
;ADK66841.1_NA 273 NGTI 0.7109 (9/9) ++
;ADK66841.1_NA 322 NITQ 0.6116 (8/9) +
;ADK66841.1_NA 334 NITS 0.7296 (9/9) ++
;ADK66841.1_NA 361 NSSA 0.5388 (6/9) +
;ADK66841.1_NA 462 NPSG 0.4656 (5/9) -
;ADK66841.1_NA 541 NSTK 0.5883 (8/9) +
;ADK66841.1_NA 590 NASS 0.5643 (6/9) +
;ADK66841.1_NA 603 NCTD 0.7117 (9/9) ++
;ADK66841.1_NA 646 NSSY 0.5467 (4/9) +
;ADK66841.1_NA 665 NVSS 0.7980 (9/9) +++
;ADK66841.1_NA 695 NNTI 0.4537 (5/9) -
;ADK66841.1_NA 703 NFSI 0.5613 (9/9) ++
;ADK66841.1_NA 787 NFSQ 0.6209 (9/9) ++
;ADK66841.1_NA 1060 NFTT 0.4540 (6/9) -
;ADK66841.1_NA 1084 NGTH 0.5408 (6/9) +
;ADK66841.1_NA 1120 NNTV 0.5803 (6/9) +
;ADK66841.1_NA 1144 NHTS 0.3828 (8/9) -
;ADK66841.1_NA 1149 NVSL 0.4879 (5/9) -
;ADK66841.1_NA 1159 NASV 0.5021 (3/9) +
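;ADK66841.1_NA 1180 NESL 0.5770 (7/9) +

In this file, only values >= 0.7 count as positive. The text file has two parts: one for YP_003858584.1_BtCoVBM48_gp2 and another for ADK66841.1_NA. If you count the positive values (>= 0.7) in each part, each part has 9 positive values. I have many files like this, containing hundreds of parts, so I asked for a Perl script to count these values. This is the script: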
use strict;
use warnings;
my $cnt = {};
while(my $line = <STDIN>) {
    if($. == 1) {
        next;
    }else {
        my @cols = split(m{\s+},$line);
        if(@cols == 6) {
            my $potential = $cols[3];
            my $id = $cols[0];
            $id =~ s{^\;}{};
            if(0.7 >= $potential) {
                $cnt->{$id}++;
            };
        };
    };
};
my @ids_found = sort { $a cmp $b } (keys %$cnt);
for my $id (@ids_found) {
    print "PART $id:\n";
    print "$cnt->{$id} (values 0.7 >=)\n";
};

However, I noticed an error in the output. Output:
$ cat Test00.txt | perl File_for_count_values.pl
PART ADK66841.1_NA:
18 (values 0.7 >=)
PART YP_003858584.1_BtCoVBM48_gp2:
18 (values 0.7 >=)

The output is not what I want: when counting, the script seems to add together the positive values of each part (9 + 9 = 18). The output should be:
$ cat Test00.txt | perl File_for_count_values.pl
PART ADK66841.1_NA:
9 (values 0.7 >=)
PART YP_003858584.1_BtCoVBM48_gp2:
9 (values 0.7 >=)

Any ideas about what must change in the script to achieve this?

Any comments are welcome.

Posted on 2021-04-26 18:26:45
Your code counts the values that are less than or equal to 0.7.
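To see where the 18 comes from, here is a small illustrative sketch (not part of the original answer) that applies both comparisons to the 27 potentials (column 4) of the YP_003858584.1_BtCoVBM48_gp2 part from the question. The condition 0.7 >= $p reads "0.7 is greater than or equal to $p", so it selects values at most 0.7.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Potentials of one part (column 4 of the YP_003858584.1_BtCoVBM48_gp2 lines
# in the question's sample file).
my @potentials = qw(
    0.1462 0.7837 0.7013 0.7112 0.7620 0.6556 0.7442 0.7109 0.6116
    0.7296 0.5388 0.4656 0.5883 0.5643 0.7117 0.5467 0.7980 0.4537
    0.5613 0.6209 0.4540 0.5408 0.5803 0.3828 0.4879 0.5021 0.5770
);

my $threshold = 0.7;

# Original condition: true when the potential is AT MOST 0.7.
# grep in scalar context returns the number of matching elements.
my $at_most = grep { $threshold >= $_ } @potentials;

# Intended condition: true when the potential is AT LEAST 0.7.
my $at_least = grep { $_ >= $threshold } @potentials;

printf "0.7 >= potential: %d of %d\n", $at_most,  scalar @potentials;
printf "potential >= 0.7: %d of %d\n", $at_least, scalar @potentials;
```

This prints 18 for the first count and 9 for the second: the 27 values split into 18 at or below the threshold and 9 at or above it (none equals 0.7 exactly). Since the counts are stored per id in a hash, nothing is being added across parts.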
If I change:

if(0.7 >= $potential) {

to:

if(0.7 <= $potential) {

then I get 9 for each part. Output:
PART ADK66841.1_NA:
9 (values 0.7 >=)
PART YP_003858584.1_BtCoVBM48_gp2:
9 (values 0.7 >=)

Posted on 2021-04-26 21:20:17
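Please review the following rewritten Perl script and see whether it is useful.

Note: the original code assumes a header line, based on the statement if($. == 1); see $. .

Several changes were made to improve the readability of the script: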
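- introduced the variable $threshold
- next unless $. > 1 skips the header/first line (unless the line counter is greater than one)
- split on [; ]+ to avoid a substitution on $id
- $id and $potential are populated from the @cols array in one statement
- field numbers adjusted, because the first field will be empty

Note: see $~, which defines the current format for write output; it is used to close the table.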
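This script uses a __DATA__ block containing the originally posted data to demonstrate the output.

Replacing while( <DATA> ) with while( <> ) lets the script accept input from STDIN or from a file name given as an argument to the script (run it as ./script.pl file.dat).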
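When input comes from many files via <>, some of them may have the leading ';' and some may not. One option, sketched here under stated assumptions and not taken from the answer, is to strip an optional ';' before splitting so the field indices are the same either way. parse_line and count_by_id are hypothetical helper names; the sketch assumes every data line has the six whitespace-separated columns shown in the question with a numeric fourth column, and it skips any other line (such as a header) instead of relying on the line counter $.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return (id, potential) for one data line, whether or
# not it starts with ';'. Returns an empty list for lines that do not have
# six columns with a numeric 4th column (e.g. a header line).
sub parse_line {
    my ($line) = @_;
    $line =~ s/^\s*;?//;          # drop optional leading whitespace and ';'
    my @cols = split ' ', $line;  # split on whitespace runs, ignoring leading/trailing
    return unless @cols == 6;     # id, position, sequence, potential, (n/9), +/- marks
    return unless $cols[3] =~ /^\d*\.?\d+$/;   # potential must look numeric
    return ($cols[0], $cols[3]);
}

# Hypothetical helper: count potentials >= $threshold per id.
sub count_by_id {
    my ($threshold, @lines) = @_;
    my %count;
    for my $line (@lines) {
        # List assignment in scalar context yields the number of values returned.
        my ($id, $potential) = parse_line($line) or next;
        $count{$id}++ if $potential >= $threshold;
    }
    return \%count;
}

# Demo with lines copied from the question, one without the leading ';'.
my $count = count_by_id(
    0.7,
    ";ADK66841.1_NA 66 NLTW 0.7837 (9/9) +++\n",
    "ADK66841.1_NA 116 NTTQ 0.7013 (9/9) ++\n",
    ";ADK66841.1_NA 173 NIST 0.6556 (8/9) +\n",
);
print "$_: $count->{$_}\n" for sort keys %$count;   # prints "ADK66841.1_NA: 2"
```

To read from STDIN or from files named on the command line, pass the lines returned by <> in list context, for example my $count = count_by_id(0.7, <>);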
#!/usr/bin/env perl
#
# vim: ai ts=4 sw=4
use strict;
use warnings;
my($id,$counter);
my $threshold = 0.7;
while( <DATA> ) {
    chomp;
    next unless $. > 1;                 # skip the header/first line
    my @cols = split("[; ]+", $_);      # leading ';' yields an empty first field
    next unless @cols == 7;
    my($id,$potential) = @cols[1,4];
    $counter->{$id}++ if $potential >= $threshold;
}
my @sorted_ids = sort { $a cmp $b } keys %$counter;
for $id (@sorted_ids) {
    write;                              # one table row per part (format STDOUT)
}
$~ = "STDOUT_BOTTOM";                   # switch to the format that closes the table
write;
exit 0;
format STDOUT_TOP =
Criteria: potential >= @#.##
$threshold
+-----------------------------+-------+
| Part | Count |
+-----------------------------+-------+
.
format STDOUT =
| @<<<<<<<<<<<<<<<<<<<<<<<<<< | @>>>> |
$id,$counter->{$id}
.
format STDOUT_BOTTOM =
+-----------------------------+-------+
.
__DATA__
;YP_003858584.1_BtCoVBM48_gp2 25 NKSP 0.1462 (9/9) ---
;YP_003858584.1_BtCoVBM48_gp2 66 NLTW 0.7837 (9/9) +++
;YP_003858584.1_BtCoVBM48_gp2 116 NTTQ 0.7013 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 126 NGTH 0.7112 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 163 NCTY 0.7620 (9/9) +++
;YP_003858584.1_BtCoVBM48_gp2 173 NIST 0.6556 (8/9) +
;YP_003858584.1_BtCoVBM48_gp2 231 NITY 0.7442 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 273 NGTI 0.7109 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 322 NITQ 0.6116 (8/9) +
;YP_003858584.1_BtCoVBM48_gp2 334 NITS 0.7296 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 361 NSSA 0.5388 (6/9) +
;YP_003858584.1_BtCoVBM48_gp2 462 NPSG 0.4656 (5/9) -
;YP_003858584.1_BtCoVBM48_gp2 541 NSTK 0.5883 (8/9) +
;YP_003858584.1_BtCoVBM48_gp2 590 NASS 0.5643 (6/9) +
;YP_003858584.1_BtCoVBM48_gp2 603 NCTD 0.7117 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 646 NSSY 0.5467 (4/9) +
;YP_003858584.1_BtCoVBM48_gp2 665 NVSS 0.7980 (9/9) +++
;YP_003858584.1_BtCoVBM48_gp2 695 NNTI 0.4537 (5/9) -
;YP_003858584.1_BtCoVBM48_gp2 703 NFSI 0.5613 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 787 NFSQ 0.6209 (9/9) ++
;YP_003858584.1_BtCoVBM48_gp2 1060 NFTT 0.4540 (6/9) -
;YP_003858584.1_BtCoVBM48_gp2 1084 NGTH 0.5408 (6/9) +
;YP_003858584.1_BtCoVBM48_gp2 1120 NNTV 0.5803 (6/9) +
;YP_003858584.1_BtCoVBM48_gp2 1144 NHTS 0.3828 (8/9) -
;YP_003858584.1_BtCoVBM48_gp2 1149 NVSL 0.4879 (5/9) -
;YP_003858584.1_BtCoVBM48_gp2 1159 NASV 0.5021 (3/9) +
;YP_003858584.1_BtCoVBM48_gp2 1180 NESL 0.5770 (7/9) +
;ADK66841.1_NA 25 NKSP 0.1462 (9/9) ---
;ADK66841.1_NA 66 NLTW 0.7837 (9/9) +++
;ADK66841.1_NA 116 NTTQ 0.7013 (9/9) ++
;ADK66841.1_NA 126 NGTH 0.7112 (9/9) ++
;ADK66841.1_NA 163 NCTY 0.7620 (9/9) +++
;ADK66841.1_NA 173 NIST 0.6556 (8/9) +
;ADK66841.1_NA 231 NITY 0.7442 (9/9) ++
;ADK66841.1_NA 273 NGTI 0.7109 (9/9) ++
;ADK66841.1_NA 322 NITQ 0.6116 (8/9) +
;ADK66841.1_NA 334 NITS 0.7296 (9/9) ++
;ADK66841.1_NA 361 NSSA 0.5388 (6/9) +
;ADK66841.1_NA 462 NPSG 0.4656 (5/9) -
;ADK66841.1_NA 541 NSTK 0.5883 (8/9) +
;ADK66841.1_NA 590 NASS 0.5643 (6/9) +
;ADK66841.1_NA 603 NCTD 0.7117 (9/9) ++
;ADK66841.1_NA 646 NSSY 0.5467 (4/9) +
;ADK66841.1_NA 665 NVSS 0.7980 (9/9) +++
;ADK66841.1_NA 695 NNTI 0.4537 (5/9) -
;ADK66841.1_NA 703 NFSI 0.5613 (9/9) ++
;ADK66841.1_NA 787 NFSQ 0.6209 (9/9) ++
;ADK66841.1_NA 1060 NFTT 0.4540 (6/9) -
;ADK66841.1_NA 1084 NGTH 0.5408 (6/9) +
;ADK66841.1_NA 1120 NNTV 0.5803 (6/9) +
;ADK66841.1_NA 1144 NHTS 0.3828 (8/9) -
;ADK66841.1_NA 1149 NVSL 0.4879 (5/9) -
;ADK66841.1_NA 1159 NASV 0.5021 (3/9) +
;ADK66841.1_NA 1180 NESL 0.5770 (7/9) +

Output:
Criteria: potential >= 0.70
+-----------------------------+-------+
| Part | Count |
+-----------------------------+-------+
| ADK66841.1_NA | 9 |
| YP_003858584.1_BtCoVBM48_gp | 9 |
+-----------------------------+-------+

Note:
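The file you pointed me to on GitHub does not include the leading ; in the data lines. Because of this, each line has one field fewer, and the script produced no results.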
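The field shift described in the note can be checked directly. This illustrative sketch (not part of the answer) splits the same line with and without the leading ';' using the answer's [; ]+ pattern. Perl's split keeps a leading empty field when the pattern matches at the start of the string, so the first call returns 7 fields and the second 6, and the id and potential move from indices 1 and 4 to indices 0 and 3.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The same data line, with and without the leading ';'.
my $with_semicolon    = ";ADK66841.1_NA 66 NLTW 0.7837 (9/9) +++";
my $without_semicolon = "ADK66841.1_NA 66 NLTW 0.7837 (9/9) +++";

# The answer's pattern: runs of ';' and/or spaces.
# A match at the very start of the string leaves an empty first field.
my @with    = split /[; ]+/, $with_semicolon;
my @without = split /[; ]+/, $without_semicolon;

printf "with ';':    %d fields, id=%s, potential=%s\n",
    scalar @with, @with[1, 4];
printf "without ';': %d fields, id=%s, potential=%s\n",
    scalar @without, @without[0, 3];
```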
Please make the following change to the Perl script. Change:

next unless @cols == 7;
my($id,$potential) = @cols[1,4];

to:

next unless @cols == 6;
my($id,$potential) = @cols[0,3];

https://stackoverflow.com/questions/67271710