我有多个.fasta文件(命名为barcode*_consensus.fasta),如下所示:
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT我想重复/重复每两行n次,在“总支持读”之后指定。例如,我想重复前两行12次,后两行重复6次,等等。
使用awk时,我确实选择了以'>‘开头的每一行,并选择了下一行:
awk '/>/{nr[NR]; nr[NR+1]} NR in nr' barcode01_consensus.fasta但我找不出如何用变量打印这n次。
任何帮助都是非常感谢的。
更新:所以我希望最后的文件看起来类似于:
|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000 |>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000 |>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
....x 12次.
发布于 2021-06-21 16:06:28
我怀疑有一个正确的生物信息学工具包,它可以更有力地实现您想要的功能,但是使用GNU awk,您可以匹配并捕获重复数,如
gawk '
/^>?[|]>/ {
if((getline seq) > 0) $0 = $0 ORS seq
}
match($0,/total_supporting_reads_([0-9]+)/,a) {
while(a[1]--) print
}
' file.fasta您也可以对非GNU进行同样的操作,但您需要使用RSTART和RMATCH值作为子字符串提取匹配的数字部分,而不是在数组中捕获它。
发布于 2021-06-21 14:32:38
我会使用空格或下划线作为字段分隔符。然后,计数是第8个字段:
awk -F'[ _]' '
$1 ~ /[>|]+consensus$/ {n = $8; print; next}
{while (--n >= 0) print}
' file输出
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT要打印每一对n行,只需做一些微小的更改:
awk -F'[ _]' '
$1 ~ /[>|]+consensus$/ {firstline = $0; n = $8; next}
{while (--n >= 0) print firstline ORS $0}
' file输出
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
>|>consensus_cl_id_1018_total_supporting_reads_12 LN:i:1369 RC:i:12 XC:f:1.000000
TCATTAACCACAAAGTGGTGAGCGTTCTCCCGAAGGTTAAACTACCCACTTCTTTTGCAGCCAACTCCCATGGTGTGACGGG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_107_total_supporting_reads_6 LN:i:1440 RC:i:6 XC:f:1.000000
GACTTCAGCCCAGTCATTAGTCCTACCATGGACCCCCATATTACTAGAGGAGCTTCCGATATTACTAACTCCCATGCCGTGACGGGCG
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAT
|>consensus_cl_id_116_total_supporting_reads_5 LN:i:1314 RC:i:558 XC:f:1.000000
AGAACGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGCTACCTTCGGGGGAGCGGCGGACGGGTTAGTAACGCGTGGGAATAThttps://askubuntu.com/questions/1347256
复制相似问题