Q: Reordering or sorting line by line

Stack Overflow user

Asked 2020-11-25 03:23:02

Answers: 1 · Views: 62 · Followers: 0 · Votes: 1

I have a file of genomic coordinates structured as follows:

chromosome1|25000|35000_chromosome1|400|600
chromosome4|78000|80000_chromosome2|43000|45000

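I want to sort the two entries on each line. If both entries belong to the same chromosome (e.g. line 1), the entry with the lower genomic coordinate should come first. If they are on different chromosomes, the entry on the lower-numbered chromosome should come first. Expected output: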

chromosome1|400|600_chromosome1|25000|35000
chromosome2|43000|45000_chromosome4|78000|80000

I tried the following, but strangely it does not always work correctly!

cat file | awk 'BEGIN{OFS="\t"}{split($1,a,"_chr"); a[2]="chr" a[2]; str=$1; if(a[1]>a[2]) str=a[2]"_"a[1]; print str,$2}'

Can anyone help? Thanks in advance!

awk

bash

sorting

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-11-25 04:01:37

Would you please try the following:

awk 'BEGIN {FS = OFS = "_"}                # use "_" as a delimiter
{
    split($1, a, "\\|")                    # split left genomic coordinates with "|" and assign array "a"
    split($2, b, "\\|")                    # split right genomic coordinates with "|" and assign array "b"
    if (a[1] == b[1]) {                    # if they belong to the same chromosome
        if (a[2] < b[2]) print $1, $2      # then compare lower genomic coordinates
        else print $2, $1
    } else {                               # they belong to different chromosomes
        sub(/^[^0-9]+/, "", a[1])          # extract chromosome number and overwrite a[1]
        sub(/^[^0-9]+/, "", b[1])          # extract chromosome number and overwrite b[1]
        if (a[1]+0 < b[1]+0) print $1, $2  # then compare the numbers
        else print $2, $1
    }
}' file

Output for the given sample file:

chromosome1|400|600_chromosome1|25000|35000
chromosome2|43000|45000_chromosome4|78000|80000
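
A likely reason the attempt in the question misbehaves is that `a[1]>a[2]` compares two whole strings, so awk orders them character by character rather than by numeric value. The sketch below illustrates the difference. `cmp_string` and `cmp_numeric` are hypothetical helper names introduced here for illustration only; they are not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the original answer.

# Compare two values as plain strings, as the script in the question does.
# Appending "" forces awk to treat both operands as strings.
cmp_string() {
    awk -v a="$1" -v b="$2" 'BEGIN { print (((a "") > (b "")) ? "swap" : "keep") }'
}

# Compare two values as numbers; adding 0 forces a numeric comparison.
cmp_numeric() {
    awk -v a="$1" -v b="$2" 'BEGIN { print ((a + 0 > b + 0) ? "swap" : "keep") }'
}

# Same chromosome: as strings, "25000" sorts before "400" because '2' < '4'.
cmp_string 'chromosome1|25000|35000' 'chromosome1|400|600'   # prints "keep" (not what is wanted)
cmp_numeric 25000 400                                        # prints "swap"

# Different chromosomes: as strings, "chromosome10" sorts before "chromosome2".
cmp_string chromosome10 chromosome2                          # prints "keep" (not what is wanted)
cmp_numeric 10 2                                             # prints "swap"
```

This is why the answer splits each entry on "|" and compares the individual fields as numbers instead of comparing the full entries.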

Votes: 4

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/64998080
