
Moving newline characters 5 positions downstream in a text (FASTA) file

Stack Overflow user
Asked 2022-04-05 16:53:42
Answers: 4, Views: 65, Followers: 0, Votes: 1

I am trying to convert a text file like this one (FASTA format):

>seq1
AAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA
ATGATGATGGAATGAGGAT
TTAGGAGGGAGGAAAATTC
>seq2
CCCTCCGGGAAAAAAGAGG
TTGCAATGCGCGTATTTAT
TTTTTTTTTTTTTTTTTTT
AAAAAAAAAAAAAGGCTGT
AAAAAAAAAAAAAAAGGGG

The goal is to move each newline character 5 positions downstream, except on lines that start with >.

>seq1
AAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAATGATGATGGAATGA
GGATTTAGGAGGGAGGAAAATTC
>seq2
CCCTCCGGGAAAAAAGAGGTTGCA
ATGCGCGTATTTATTTTTTTTTTT
TTTTTTTTTAAAAAAAAAAAAAGG
CTGTAAAAAAAAAAAAAAAGGGG

I would like to use AWK, but I don't know how to proceed. I was thinking of something like:

awk '{for(i=1;i<=NR;i++){ if($1 ~ /^>/){¿?¿?¿?}}}'

Do you know how I could solve this?


4 Answers

Stack Overflow user

Accepted answer

Posted 2022-04-05 19:09:50

Assumptions:

  • all data lines are to be re-wrapped to a maximum of 24 characters

One awk idea:

awk -v width=24 '                               # pass width in as awk variable "width"
function print_sequence() {
    if (sequence)                               # if sequence is not blank
       while (sequence) {                       # while sequence is not blank
             print substr(sequence,1,width)     # print 1st 24 characters
             sequence=substr(sequence,width+1)  # remove 1st 24 characters
       }
}

/^>/ { print_sequence()                         # flush previous set of data to stdout
       print                                    # print current input line
       next                                     # process next input line
     }
     { sequence=sequence $1 }                   # append data to our "sequence" variable

END  { print_sequence() }                       # flush last set of data to stdout
' fasta.in > fasta.out

This produces:

$ cat fasta.out
>seq1
AAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAATGATGATGGAATGA
GGATTTAGGAGGGAGGAAAATTC
>seq2
CCCTCCGGGAAAAAAGAGGTTGCA
ATGCGCGTATTTATTTTTTTTTTT
TTTTTTTTTAAAAAAAAAAAAAGG
CTGTAAAAAAAAAAAAAAAGGGG
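
As a side note on the hard-coded 24: in the question's data every sequence line has 19 characters, and moving the newline 5 positions downstream gives 19 + 5 = 24. Below is a minimal sketch, assuming all full sequence lines share the length of the first one, that derives the width from the input instead. The function name `rewrap_shift` and the variable `extra` are made up for illustration and are not part of the answer above.

```shell
# Hypothetical helper: re-wrap FASTA sequence lines so that each output line
# is "extra" characters longer than the first sequence line of the input.
rewrap_shift() {
    awk -v extra="$1" '
    function flush_sequence() {
        while (length(sequence) > 0) {
            print substr(sequence, 1, width)        # emit the next chunk
            sequence = substr(sequence, width + 1)  # drop what was emitted
        }
    }
    /^>/   { flush_sequence(); print; next }        # header: flush, then echo it
    !width { width = length($0) + extra             # first sequence line sets the width
             if (width < 1) width = 1 }             # guard against a non-positive width
           { sequence = sequence $0 }               # accumulate sequence data
    END    { flush_sequence() }                     # flush the last record
    '
}

# Small demo: 4-character lines re-wrapped to 4 + 2 = 6 characters
printf '>s1\nAAAA\nCCCC\nGG\n>s2\nTTTT\nAC\n' | rewrap_shift 2
```

For the file in the question this would be called as `rewrap_shift 5 < fasta.in > fasta.out`.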
Votes: 1

Stack Overflow user

Posted 2022-04-05 19:04:32

I would do it the following way. Let file.txt content be

>seq1
AAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA
ATGATGATGGAATGAGGAT
TTAGGAGGGAGGAAAATTC
>seq2
CCCTCCGGGAAAAAAGAGG
TTGCAATGCGCGTATTTAT
TTTTTTTTTTTTTTTTTTT
AAAAAAAAAAAAAGGCTGT
AAAAAAAAAAAAAAAGGGG

then

awk 'BEGIN{width=24}/>/&&x{print x;x=""}/>/{print;next}{x = x $0}length(x)>=width{print substr(x,1,width);x=substr(x,width+1)}END{print x}' file.txt

gives the output

>seq1
AAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAATGATGATGGAATGA
GGATTTAGGAGGGAGGAAAATTC
>seq2
CCCTCCGGGAAAAAAGAGGTTGCA
ATGCGCGTATTTATTTTTTTTTTT
TTTTTTTTTAAAAAAAAAAAAAGG
CTGTAAAAAAAAAAAAAAAGGGG

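Explanation: I set width to 24, the desired number of characters per line. If a line containing > is found and x holds something, print x and reset x to the empty string. If a > line is encountered, print it and go to the next line. For every other line, append the current line's content to x; if the length of x is greater than or equal to width, print the first width characters of x and then remove them from x. After all lines have been processed, print x.

Disclaimer: this solution assumes that the ratio between the current width and the desired width is less than 0.5.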

(GNU Awk 5.0.1)
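
One thing worth noting about the one-liner: the `length(x)>=width` rule fires at most once per input line, so when an input line is longer than the target width a remainder longer than `width` can be left in `x`. Below is a minimal sketch, not the answerer's code, that replaces the single test with a loop and also skips printing an empty `x`. The function name `rewrap_any` is hypothetical, and the header test is anchored as `/^>/` here, which is an added assumption about the input.

```shell
# Hypothetical helper: re-wrap FASTA sequence lines to a fixed width,
# whatever the length of the input lines.
rewrap_any() {
    awk -v width="$1" '
    BEGIN { if (width < 1) width = 1 }              # guard against an endless loop
    /^>/  { if (x != "") print x                    # flush leftover of previous record
            x = ""
            print                                   # echo the header line
            next }
          { x = x $0                                # append current sequence line
            while (length(x) >= width) {            # emit every complete chunk
                print substr(x, 1, width)
                x = substr(x, width + 1)
            }
          }
    END   { if (x != "") print x }                  # final partial chunk, if any
    '
}

# Small demo: a 10-character line wrapped to width 4
printf '>s1\nAAAAAAAAAA\nCC\n' | rewrap_any 4
```

The demo prints `>s1`, `AAAA`, `AAAA` and `AACC`, each on its own line.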

Votes: 1

Stack Overflow user

Posted 2022-04-05 20:00:27

You could also try another approach, using awk's field and record separators:

awk -v width=24 '
  BEGIN {
    FS="\n"                            # Set the Field separator to newline
    RS=">"                             # Set the Record separator to ">"
    ORS=OFS=""                         # Set the Output Record and Field separator to an empty string
  }

  NR>1 {                               # Using ">" as a record separator the first record is empty, so skip
    header=$1                          # Using "\n" as the Field separator, $1 contains the header, save it in a variable
    $1=OFS                             # Assign an empty string to $1 so the record gets rebuilt and the body becomes $0
                                       # with all newlines removed, since OFS == ""
    gsub(".{" width "}", "&" FS)       # Append every "width" characters with a newline (FS)
    print RS header FS $0 FS           # Print a ">", the header, a newline, the body and a newline
  }
' fasta_in > fasta_out
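
Two caveats with the `gsub` step above: when a sequence's length is an exact multiple of `width`, a newline is appended after the last chunk and `print` adds another one, so a blank line appears; and the `{n}` interval expression is not supported by every awk implementation (some older mawk releases, for example). Below is a minimal sketch of the same record-splitting idea with a `substr` loop in place of `gsub`. The function name `rewrap_records` is hypothetical, and like the answer it assumes the input starts with a `>` header and that headers contain no further `>`.

```shell
# Hypothetical helper: split the input into one record per ">" and one
# field per line, then re-wrap the joined body of each record.
rewrap_records() {
    awk -v width="$1" '
    BEGIN { FS = "\n"; RS = ">"                     # one record per ">", one field per line
            if (width < 1) width = 1 }              # guard against an endless loop
    NR > 1 {                                        # record 1 is the empty text before the first ">"
        print ">" $1                                # first field is the header
        body = ""
        for (i = 2; i <= NF; i++) body = body $i    # join the remaining lines
        while (length(body) > 0) {                  # emit the body in width-sized chunks
            print substr(body, 1, width)
            body = substr(body, width + 1)
        }
    }
    '
}

# Small demo: an 8-character body is an exact multiple of width 4
printf '>s1\nAAAA\nCCCC\n>s2\nGGG\n' | rewrap_records 4
```

Since the default output separators are left untouched, each `print` ends its line on its own and no trailing blank line is produced.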
Votes: 1
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/71755614
