
Add part of the values from file1 to a column in file 2 if there is a match

Unix & Linux user
Asked 2019-06-13 09:36:43
1 answer, 35 views, 0 followers, 0 votes

I have two CSV files:

File 1 (1 column):

Prokaryote,Caudovirales,Myoviridae
Prokaryote,Caudovirales,Podoviridae
Prokaryote,Caudovirales,Siphoviridae
Prokaryote,Ligamenvirales,Lipothrixviridae
Prokaryote,Ligamenvirales,Rudiviridae
Prokaryote,Unassigned,Ampullaviridae

and file 2 (2 columns):

NC_038375   Baculoviridae,Betabaculovirus,Trichoplusia_ni_granulovirus
NC_000867   Corticoviridae,Corticovirus,Pseudoalteromonas_virus_PM2
NC_000866   Myoviridae,Tequatrovirus,Escherichia_virus_T4
NC_000929   Myoviridae,Muvirus,Escherichia_virus_Mu
NC_004166   Siphoviridae,,Bacillus_phage_SPP1
NC_005859   Siphoviridae,Tequintavirus,Escherichia_virus_T5
NC_002166   Siphoviridae,Hendrixvirus,Escherichia_virus_HK022
NC_008720   Podoviridae,Enquatrovirus,Escherichia_virus_N4
NC_002371   Podoviridae,Lederbergvirus,Salmonella_virus_P22
NC_011048   Podoviridae,Salasvirus,Bacillus_virus_phi29
NNC_001929  Geminiviridae,Begomovirus,Abutilon_mosaic_virus
NC_002649   Podoviridae,Salasvirus,Bacillus_virus_GA1

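I want to add the first 2 names from file 1 at the start of each value in the second column of file 2 if there is a match on the third name in file 1. Expected output: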

NC_038375   Baculoviridae,Betabaculovirus,Trichoplusia_ni_granulovirus
NC_000867   Corticoviridae,Corticovirus,Pseudoalteromonas_virus_PM2
NC_000866   Prokaryote,Caudovirales,Myoviridae,Tequatrovirus,Escherichia_virus_T4
NC_000929   Prokaryote,Caudovirales,Myoviridae,Muvirus,Escherichia_virus_Mu
NC_004166   Prokaryote,Caudovirales,Siphoviridae,,Bacillus_phage_SPP1
NC_005859   Prokaryote,Caudovirales,Siphoviridae,Tequintavirus,Escherichia_virus_T5
NC_002166   Prokaryote,Caudovirales,Siphoviridae,Hendrixvirus,Escherichia_virus_HK022
NC_008720   Prokaryote,Caudovirales,Podoviridae,Enquatrovirus,Escherichia_virus_N4
NC_002371   Prokaryote,Caudovirales,Podoviridae,Lederbergvirus,Salmonella_virus_P22
NC_011048   Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_phi29
NNC_001929  Geminiviridae,Begomovirus,Abutilon_mosaic_virus
NC_002649   Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_GA1

Any help with this?


1 Answer

Unix & Linux user

Answered 2019-06-13 10:00:09

Assuming favourable answers to my question above, you can do this in awk:

parse.awk

FNR == NR {              # Only for the first file
  h[$3] = $1 "," $2      # Collect column one and two into 'h' hash
  next
}

{ split($2, a, ",") }    # Split the second column of the second file to array 'a'

a[1] in h {              # If the first element of the second column of the 
  $2 = h[a[1]] "," $2    # second file is in 'h' then prepend the value to $2
}

1                        # Print all lines

Run it like this:

awk -f parse.awk FS=',' file1 FS='\t' OFS='\t' file2
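
The `FS=','` and `FS='\t'` operands in that command are variable assignments, not file names: awk applies each one when it reaches it in the argument list, so file1 is split on commas and file2 on tabs. A minimal self-contained sketch for trying this out, assuming file 2 really is tab-separated (the temporary directory and the trimmed sample rows are only for illustration):

```shell
#!/bin/sh
# Work in a throwaway directory (hypothetical; any directory works).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# parse.awk from the answer, condensed.
cat > parse.awk <<'EOF'
FNR == NR { h[$3] = $1 "," $2; next }
{ split($2, a, ",") }
a[1] in h { $2 = h[a[1]] "," $2 }
1
EOF

# Two rows of file 1 (comma-separated).
printf '%s\n' \
  'Prokaryote,Caudovirales,Myoviridae' \
  'Prokaryote,Caudovirales,Podoviridae' > file1

# Two rows of file 2; the tab between the columns is an assumption.
printf '%s\t%s\n' \
  'NC_000867' 'Corticoviridae,Corticovirus,Pseudoalteromonas_virus_PM2' \
  'NC_000866' 'Myoviridae,Tequatrovirus,Escherichia_virus_T4' > file2

# FS=',' takes effect before file1 is read, FS/OFS='\t' before file2.
result=$(awk -f parse.awk FS=',' file1 FS='\t' OFS='\t' file2)
printf '%s\n' "$result"
```

Only the Myoviridae row should gain the `Prokaryote,Caudovirales,` prefix; the Corticoviridae row has no match in file1 and passes through unchanged.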

Output:

NC_038375   Baculoviridae,Betabaculovirus,Trichoplusia_ni_granulovirus
NC_000867   Corticoviridae,Corticovirus,Pseudoalteromonas_virus_PM2
NC_000866   Prokaryote,Caudovirales,Myoviridae,Tequatrovirus,Escherichia_virus_T4
NC_000929   Prokaryote,Caudovirales,Myoviridae,Muvirus,Escherichia_virus_Mu
NC_004166   Prokaryote,Caudovirales,Siphoviridae,,Bacillus_phage_SPP1
NC_005859   Prokaryote,Caudovirales,Siphoviridae,Tequintavirus,Escherichia_virus_T5
NC_002166   Prokaryote,Caudovirales,Siphoviridae,Hendrixvirus,Escherichia_virus_HK022
NC_008720   Prokaryote,Caudovirales,Podoviridae,Enquatrovirus,Escherichia_virus_N4
NC_002371   Prokaryote,Caudovirales,Podoviridae,Lederbergvirus,Salmonella_virus_P22
NC_011048   Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_phi29
NNC_001929  Geminiviridae,Begomovirus,Abutilon_mosaic_virus
NC_002649   Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_GA1
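
If switching `FS` between file operands feels fragile, one possible variation is to read both files with tab as the separator and split the file 1 lines by hand. This is a sketch rather than part of the original answer; it assumes file1 contains no tabs, so each of its lines arrives as a single field, and that file2 is tab-separated. The sample files are created inline so the snippet runs on its own.

```shell
#!/bin/sh
# Throwaway directory and sample rows, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'Prokaryote,Caudovirales,Siphoviridae' > file1
printf '%s\t%s\n' \
  'NC_004166' 'Siphoviridae,,Bacillus_phage_SPP1' \
  'NNC_001929' 'Geminiviridae,Begomovirus,Abutilon_mosaic_virus' > file2

variant=$(awk -F '\t' -v OFS='\t' '
  FNR == NR {                       # file1: map third name -> "first,second"
    split($0, p, ",")
    h[p[3]] = p[1] "," p[2]
    next
  }
  {
    split($2, a, ",")               # file2: first name of column 2 is the key
    if (a[1] in h) $2 = h[a[1]] "," $2
    print
  }
' file1 file2)
printf '%s\n' "$variant"
```

The trade-off is one extra `split` per file1 line in exchange for a command line with a single field separator.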
Votes: 1
The original page content is provided by Unix & Linux Stack Exchange.
Original link:

https://unix.stackexchange.com/questions/524626
