Q: How to transpose unique entries in a column and print the values from another column

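Stack Overflow user

Asked 2022-11-09 04:09:17

Answers: 2, Views: 65, Followers: 0, Votes: 0

I have a two-column file, input.txt. I want to split the second column on ";", transpose the unique entries so that each one becomes a row, and then count and list the matching entries from column 1.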

This is my tab-delimited input.txt file:

Gene     Biological_Process
BALF2   metabolic process
CHD4    cell organization and biogenesis;metabolic process;regulation of biological process
TCOF1   cell organization and biogenesis;regulation of biological process;transport
TOP1    cell death;cell division;cell organization and biogenesis;metabolic process;regulation of biological process;response to stimulus
BcLF1   0
BALF5   metabolic process
MTA2    cell organization and biogenesis;metabolic process;regulation of biological process
MSH6    cell organization and biogenesis;metabolic process;regulation of biological process;response to stimulus

My expected output1:

Biological_Process  Gene
metabolic process   BALF2   CHD4    TOP1    BALF5   MTA2    MSH6
cell organization and biogenesis    CHD4    TCOF1   TOP1    MTA2    MSH6
regulation of biological process    CHD4    TCOF1   TOP1    MTA2    MSH6
transport   TCOF1
cell death  TOP1
cell division   TOP1
response to stimulus    TOP1    MSH6

python

awk

2 Answers

Stack Overflow user

Accepted answer

Posted 2022-11-11 21:06:17

$ cat script.awk 
#! /usr/bin/awk -f 

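BEGIN {
    FS = "[\t;]"   # the separator can be a regex: split on tab or semicolon
    OFS = "\t"
}

NR > 1 && /^[A-Z]/ {   # skip the header and blank lines
    for (i = NF; i > 1; i--)
        if ($i)        # skip empty fields and the "0" placeholder
            a[$i] = a[$i] OFS $1   # append this gene to the process's list
}

END {
    print "Biological_Process", "Gene(s)"
    for (x in a)       # iteration order is unspecified in awk
        print x a[x]   # a[x] already begins with OFS
}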

$ ./script.awk input.dat 
Biological_Process  Gene(s)
cell death  TOP1
regulation of biological process    CHD4    TCOF1   TOP1    MTA2    MSH6
transport   TCOF1
cell division   TOP1
metabolic process   BALF2   CHD4    TOP1    BALF5   MTA2    MSH6
response to stimulus    TOP1    MSH6
cell organization and biogenesis    CHD4    TCOF1   TOP1    MTA2    MSH6
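
The awk `for (x in a)` loop visits keys in an unspecified order, so the rows above do not appear in the order shown in the question. For comparison, here is a minimal Python sketch of the same grouping. The function name `group_genes` and the shortened sample rows are illustrative assumptions, not part of the original answer. It relies on Python dictionaries preserving insertion order (Python 3.7+), so processes come out in first-seen order.

```python
import re

def group_genes(lines):
    """Map each biological process to the genes annotated with it."""
    groups = {}  # insertion-ordered: processes appear in first-seen order
    for line in lines[1:]:  # skip the header line
        # Same idea as FS = "[\t;]" in the awk script: split on tab or semicolon.
        fields = re.split(r"[\t;]", line.rstrip("\n"))
        gene, processes = fields[0], fields[1:]
        for proc in processes:
            if proc and proc != "0":  # skip empty fields and the "0" placeholder
                groups.setdefault(proc, []).append(gene)
    return groups

# Shortened sample modeled on the question's input (assumed for illustration).
sample = [
    "Gene\tBiological_Process",
    "BALF2\tmetabolic process",
    "CHD4\tcell organization and biogenesis;metabolic process",
    "BcLF1\t0",
    "TOP1\tcell death;metabolic process",
]

for proc, genes in group_genes(sample).items():
    print(proc, *genes, sep="\t")
```

To run it on a real file, the lines could be read with `open("input.txt").readlines()` and passed to `group_genes`.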

Votes: 2

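Stack Overflow user

Posted 2022-11-09 05:29:06

You need to parse all the data first. For example, start with an empty dictionary, open your file, and iterate over each line (skipping the header if you do not want it). For every entry in the columns after the first, use string methods such as split and strip to turn that entry into a dictionary key, and store the value from column 0 under it (dict[process] = [gene, ...]). Then print or write out every pair from the dictionary's .items():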

input.txt

gene process
A cell org bio
B cell bio
C 0
D org

script.py

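#!/usr/bin/env python

def main():
    # Map each process to the list of genes annotated with it.
    pros = {}

    with open("input.txt", "r") as ifile:
        for line in ifile:
            cols = line.strip().split()
            if len(cols) >= 1:  # skip blank lines
                # Column 0 is the gene; every later column is a process.
                # The header row is handled the same way, so it becomes
                # the "process gene" line of the output.
                for pro in cols[1:]:
                    pros.setdefault(pro, []).append(cols[0])

    with open("output.txt", "w") as ofile:
        for key, val in pros.items():
            ofile.write(f"{key}\t" + "\t".join(val) + "\n")

if __name__ == "__main__":
    main()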

Run

$ chmod +x ./script.py
$ ./script.py
$ cat ./output.txt

output.txt

process gene
cell    A       B
org     A       D
bio     A       B
0       C
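
The toy example above splits on whitespace and keeps the "0" placeholder, while the question's file is tab-delimited, uses ";" inside the second column, and also asks for a count. A hedged sketch of how the same dictionary approach might be adapted follows. The function name `count_genes` and the in-memory demo data are assumptions made for illustration, and treating "0" as "no annotation" is inferred from the expected output in the question.

```python
import csv
import io

def count_genes(handle):
    """Return (process, gene_count, genes) tuples from a tab-delimited file."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    genes_by_process = {}
    for row in reader:
        if len(row) < 2:  # skip blank or malformed lines
            continue
        gene, annotation = row[0], row[1]
        for proc in annotation.split(";"):
            proc = proc.strip()
            if not proc or proc == "0":  # assumed: "0" means no annotation
                continue
            genes = genes_by_process.setdefault(proc, [])
            if gene not in genes:  # list each gene once per process
                genes.append(gene)
    return [(proc, len(genes), genes) for proc, genes in genes_by_process.items()]

# In-memory stand-in for input.txt (assumed sample rows).
demo = io.StringIO(
    "Gene\tBiological_Process\n"
    "TCOF1\tcell organization and biogenesis;transport\n"
    "BcLF1\t0\n"
    "MSH6\tcell organization and biogenesis;response to stimulus\n"
)

for proc, n, genes in count_genes(demo):
    print(proc, n, *genes, sep="\t")
```

With a real file, `count_genes(open("input.txt", newline=""))` would take the place of the `io.StringIO` demo.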

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/74369833
