I have two CSV files:
File 1 (1 column):
Prokaryote,Caudovirales,Myoviridae
Prokaryote,Caudovirales,Podoviridae
Prokaryote,Caudovirales,Siphoviridae
Prokaryote,Ligamenvirales,Lipothrixviridae
Prokaryote,Ligamenvirales,Rudiviridae
Prokaryote,Unassigned,Ampullaviridae

File 2 (2 columns):
NC_038375 Baculoviridae,Betabaculovirus,Trichoplusia_ni_granulovirus
NC_000867 Corticoviridae,Corticovirus,Pseudoalteromonas_virus_PM2
NC_000866 Myoviridae,Tequatrovirus,Escherichia_virus_T4
NC_000929 Myoviridae,Muvirus,Escherichia_virus_Mu
NC_004166 Siphoviridae,,Bacillus_phage_SPP1
NC_005859 Siphoviridae,Tequintavirus,Escherichia_virus_T5
NC_002166 Siphoviridae,Hendrixvirus,Escherichia_virus_HK022
NC_008720 Podoviridae,Enquatrovirus,Escherichia_virus_N4
NC_002371 Podoviridae,Lederbergvirus,Salmonella_virus_P22
NC_011048 Podoviridae,Salasvirus,Bacillus_virus_phi29
NNC_001929 Geminiviridae,Begomovirus,Abutilon_mosaic_virus
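NC_002649 Podoviridae,Salasvirus,Bacillus_virus_GA1

If the first name in the second column of file 2 matches the third name in a line of file 1, I want to prepend the first two names from that line of file 1 to the value in file 2's second column. Expected output: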
NC_038375 Baculoviridae,Betabaculovirus,Trichoplusia_ni_granulovirus
NC_000867 Corticoviridae,Corticovirus,Pseudoalteromonas_virus_PM2
NC_000866 Prokaryote,Caudovirales,Myoviridae,Tequatrovirus,Escherichia_virus_T4
NC_000929 Prokaryote,Caudovirales,Myoviridae,Muvirus,Escherichia_virus_Mu
NC_004166 Prokaryote,Caudovirales,Siphoviridae,,Bacillus_phage_SPP1
NC_005859 Prokaryote,Caudovirales,Siphoviridae,Tequintavirus,Escherichia_virus_T5
NC_002166 Prokaryote,Caudovirales,Siphoviridae,Hendrixvirus,Escherichia_virus_HK022
NC_008720 Prokaryote,Caudovirales,Podoviridae,Enquatrovirus,Escherichia_virus_N4
NC_002371 Prokaryote,Caudovirales,Podoviridae,Lederbergvirus,Salmonella_virus_P22
NC_011048 Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_phi29
NNC_001929 Geminiviridae,Begomovirus,Abutilon_mosaic_virus
NC_002649 Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_GA1

Any help with this?

Posted on 2019-06-13 10:00:09

Assuming I have understood your question correctly, you can do it like this in awk:
parse.awk
FNR == NR {                  # Only true while reading the first file
    h[$3] = $1 "," $2        # Map column 3 to "column1,column2" in hash 'h'
    next
}
{ split($2, a, ",") }        # Split the second column of the second file into array 'a'
a[1] in h {                  # If the first element of the second column of the
    $2 = h[a[1]] "," $2      # second file is a key in 'h', prepend its value to $2
}
1                            # Print all lines

Run it like this:
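awk -f parse.awk FS=',' file1 FS='\t' OFS='\t' file2

The FS=',' assignment takes effect before file1 is read, and FS='\t' OFS='\t' take effect before file2 is read, so file 1 is split on commas and file 2 on tabs.

Output: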
NC_038375 Baculoviridae,Betabaculovirus,Trichoplusia_ni_granulovirus
NC_000867 Corticoviridae,Corticovirus,Pseudoalteromonas_virus_PM2
NC_000866 Prokaryote,Caudovirales,Myoviridae,Tequatrovirus,Escherichia_virus_T4
NC_000929 Prokaryote,Caudovirales,Myoviridae,Muvirus,Escherichia_virus_Mu
NC_004166 Prokaryote,Caudovirales,Siphoviridae,,Bacillus_phage_SPP1
NC_005859 Prokaryote,Caudovirales,Siphoviridae,Tequintavirus,Escherichia_virus_T5
NC_002166 Prokaryote,Caudovirales,Siphoviridae,Hendrixvirus,Escherichia_virus_HK022
NC_008720 Prokaryote,Caudovirales,Podoviridae,Enquatrovirus,Escherichia_virus_N4
NC_002371 Prokaryote,Caudovirales,Podoviridae,Lederbergvirus,Salmonella_virus_P22
NC_011048 Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_phi29
NNC_001929 Geminiviridae,Begomovirus,Abutilon_mosaic_virus
NC_002649 Prokaryote,Caudovirales,Podoviridae,Salasvirus,Bacillus_virus_GA1

https://unix.stackexchange.com/questions/524626
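For comparison, here is a minimal Python sketch of the same lookup-and-prepend join. It is a swapped-in alternative, not the answer's awk method, and it assumes (as the awk command does) that file 1 is comma-separated and file 2 is tab-separated with the comma-separated taxonomy in the second column. The function names `load_prefixes` and `prepend_lineage` are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List


def load_prefixes(lines: Iterable[str]) -> Dict[str, str]:
    """Build a hash like the awk 'h': third name -> "first,second" names (file 1)."""
    prefixes: Dict[str, str] = {}
    for row in csv.reader(lines):
        if len(row) >= 3:
            prefixes[row[2]] = f"{row[0]},{row[1]}"
    return prefixes


def prepend_lineage(lines: Iterable[str], prefixes: Dict[str, str]) -> List[str]:
    """Prepend the matching prefix to column 2 of each tab-separated line (file 2)."""
    out: List[str] = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2:
            # First comma-separated name of column 2 (the awk a[1]).
            first_name = fields[1].split(",", 1)[0]
            if first_name in prefixes:
                fields[1] = prefixes[first_name] + "," + fields[1]
        # Unmatched lines pass through unchanged, like the awk '1' pattern.
        out.append("\t".join(fields))
    return out


if __name__ == "__main__":
    # Small excerpts of the question's files; real use would open the files instead.
    file1 = "Prokaryote,Caudovirales,Myoviridae\nProkaryote,Caudovirales,Siphoviridae\n"
    file2 = (
        "NC_038375\tBaculoviridae,Betabaculovirus,Trichoplusia_ni_granulovirus\n"
        "NC_004166\tSiphoviridae,,Bacillus_phage_SPP1\n"
    )
    for result in prepend_lineage(io.StringIO(file2), load_prefixes(io.StringIO(file1))):
        print(result)
```

Lines whose first name has no match in file 1 (for example Baculoviridae) are printed unchanged, and an empty genus field (as in the Siphoviridae line) is preserved because only the first comma-separated name is used for the lookup.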