I have two large files (more than 1000 lines).
File-1
head File-1
1_10 PL14
1_13 GH13
13_12 GH20
13_137 GH10
13_35 GT19
14_128 GH36
14_131 GH42
14_65 GH109
15_28 GT30
15_30 GH13
16_3 CE1
File-2
head File-2
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name
1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054 450
1_10 0.030 0.016 2.114 0 0.001 0.000 0.072 2055
1_11 0.012 0.014 1.739 0 0 0 0.0237 171
1_12 0.082 0.071 3.763 0.021 0 0.014 0.102 357
1_13 0.035 0.01 3.836 0 0 0 0.103 234
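1_14 0.054 0.031 2.844 0.006 0.005 0.001 0.082 1125
I want to map File-1 onto File-2 (replacing each gene_id in File-2 with the matching name from File-1) without printing the last column of File-2. It would be even better if I could get the result as two outputs, Output-1 and Output-2.
Output-1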
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name
1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054 450
PL14 0.030 0.016 2.114 0 0.001 0.000 0.072 2055
1_11 0.012 0.014 1.739 0 0 0 0.0237 171
1_12 0.082 0.071 3.763 0.021 0 0.014 0.102 357
GH13 0.035 0.01 3.836 0 0 0 0.103 234
1_14 0.054 0.031 2.844 0.006 0.005 0.001 0.082 1125
Output-2 (unmapped lines not printed)
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name
PL14 0.030 0.016 2.114 0 0.001 0.000 0.072 2055
GH13 0.035 0.01 3.836 0 0 0 0.103 234
I tried:
awk '
NR==FNR {
a[$1]=$2
next
}
{
print (($1 in a)?a[$1]:$1, $2, $3, $4, $5,$6, $7, $8)
}' File-1 File-2 > Output
But the output only shows the contents of File-2.
Please correct my awk code, or suggest any other approach (sed, Perl).
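On the sample data shown above, the attempted program's lookup logic does replace 1_10 and 1_13, so if nothing is replaced on the real files, the IDs in the two files probably do not match byte-for-byte (a guess; the thread does not confirm the cause). The helper `count_matches` below is a hypothetical, minimal diagnostic sketch: it first recreates the question's sample data in a scratch directory so it runs standalone, then counts how many File-2 IDs are keys in File-1 and dumps the raw bytes of File-1's first line with `od -c`, which makes invisible characters visible.

```shell
#!/bin/sh
# Recreate the sample data from the question in a scratch directory so the
# check runs standalone; on the real data, just call count_matches.
cd "$(mktemp -d)" || exit 1

printf '%s\n' '1_10 PL14' '1_13 GH13' '13_12 GH20' '13_137 GH10' '13_35 GT19' \
  '14_128 GH36' '14_131 GH42' '14_65 GH109' '15_28 GT30' '15_30 GH13' '16_3 CE1' > File-1

printf '%s\n' \
  'gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name' \
  '1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054 450' \
  '1_10 0.030 0.016 2.114 0 0.001 0.000 0.072 2055' \
  '1_11 0.012 0.014 1.739 0 0 0 0.0237 171' \
  '1_12 0.082 0.071 3.763 0.021 0 0.014 0.102 357' \
  '1_13 0.035 0.01 3.836 0 0 0 0.103 234' \
  '1_14 0.054 0.031 2.844 0.006 0.005 0.001 0.082 1125' > File-2

# Hypothetical helper: count the data lines of the second file (header
# skipped) whose first field is a key (first field) in the first file.
count_matches() {
  awk 'NR == FNR { seen[$1]; next }
       FNR > 1   { total++; if ($1 in seen) hit++ }
       END       { printf "%d of %d File-2 IDs found in File-1\n", hit + 0, total + 0 }' "$1" "$2"
}

count_matches File-1 File-2

# Print the raw bytes of File-1's first line to spot hidden characters.
head -n 1 File-1 | od -c
```

On the sample this reports 2 of 6; a count of 0 on the real files would point at a key mismatch rather than at the awk logic.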
Answer (posted 2022-06-28 13:55:13):
awk '
NR==FNR{ # process File1
a[$1]=$2; # map File1 column 1 (old ID) to column 2 (new name)
next # next line
}
{ # process File2
NF-- # delete last column
}
FNR==1{ # first line from File2
print > "Output1"; # write header to Output1/2
print > "Output2";
next # next line
}
!($1 in a){ # not mapped
print > "Output1" # write unmapped line to Output1 only
}
($1 in a){ # mapped
$1=a[$1]; # modify $1 and write mapped to Output1/2
print > "Output2";
print > "Output1"
}' File1 File2
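The answer's program drops the last column with `NF--`. The gawk manual cautions that some awk versions do not rebuild `$0` when `NF` is decremented, so a more conservative variant can rebuild each line from fields 1 to NF-1 itself. This is a sketch, not the answerer's code: the names `out1`, `out2`, `map`, `mapped` and `line` are illustrative, and the block recreates a subset of the sample data so it runs standalone.

```shell
#!/bin/sh
# Sketch: same mapping as the answer, but the last column is dropped by
# rebuilding the line in a loop instead of NF--, and the output file names
# are passed in with -v.
cd "$(mktemp -d)" || exit 1

# Recreate part of the sample data (named File1/File2 as in the answer).
printf '%s\n' '1_10 PL14' '1_13 GH13' '16_3 CE1' > File1
printf '%s\n' \
  'gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name' \
  '1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054 450' \
  '1_10 0.030 0.016 2.114 0 0.001 0.000 0.072 2055' \
  '1_11 0.012 0.014 1.739 0 0 0 0.0237 171' \
  '1_12 0.082 0.071 3.763 0.021 0 0.014 0.102 357' \
  '1_13 0.035 0.01 3.836 0 0 0 0.103 234' \
  '1_14 0.054 0.031 2.844 0.006 0.005 0.001 0.082 1125' > File2

awk -v out1=Output1 -v out2=Output2 '
NR == FNR { map[$1] = $2; next }         # File1: old ID -> new name
{
    mapped = (FNR > 1 && ($1 in map))    # never remap the header line
    if (mapped) $1 = map[$1]
    line = $1
    for (i = 2; i < NF; i++)              # copy every field except the last
        line = line OFS $i
    print line > out1                     # Output1: header + every row
    if (FNR == 1 || mapped)
        print line > out2                 # Output2: header + mapped rows only
}' File1 File2
```

On the sample data this should produce the same Output1 and Output2 as the answer's program.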
$ head Output1 Output2
==> Output1 <==
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam
1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054
PL14 0.030 0.016 2.114 0 0.001 0.000 0.072
1_11 0.012 0.014 1.739 0 0 0 0.0237
1_12 0.082 0.071 3.763 0.021 0 0.014 0.102
GH13 0.035 0.01 3.836 0 0 0 0.103
1_14 0.054 0.031 2.844 0.006 0.005 0.001 0.082
==> Output2 <==
gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam
PL14 0.030 0.016 2.114 0 0.001 0.000 0.072
GH13 0.035 0.01 3.836 0 0 0 0.103
https://stackoverflow.com/questions/72783692
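Since the question also asks for non-awk approaches: for Output-2 alone, the standard `sort` and `join` utilities can do the lookup. A sketch under stated assumptions: fields are separated by single spaces, File2's first line is the header, and the file names match the answer. Unlike the awk answer, mapped rows come out sorted by ID rather than in File2's original order, and unmapped rows are dropped, so this does not produce Output1.

```shell
#!/bin/sh
# Sketch: build Output2 with sort + join instead of awk.
cd "$(mktemp -d)" || exit 1

# Recreate part of the sample data.
printf '%s\n' '1_10 PL14' '1_13 GH13' '16_3 CE1' > File1
printf '%s\n' \
  'gene_id HK.1.bam HK.2.bam HK.Hu.bam HKSW.bam UHK.1.bam UHK.2.bam UHK.Hu.1.bam gene_name' \
  '1_1 0.069 0.0169 2.826 0 0.004 0.019 0.054 450' \
  '1_10 0.030 0.016 2.114 0 0.001 0.000 0.072 2055' \
  '1_11 0.012 0.014 1.739 0 0 0 0.0237 171' \
  '1_12 0.082 0.071 3.763 0.021 0 0.014 0.102 357' \
  '1_13 0.035 0.01 3.836 0 0 0 0.103 234' \
  '1_14 0.054 0.031 2.844 0.006 0.005 0.001 0.082 1125' > File2

# join needs both inputs sorted on the key with the same collation.
LC_ALL=C sort -k1,1 File1 > File1.sorted
tail -n +2 File2 | LC_ALL=C sort -k1,1 > File2.sorted

{
  head -n 1 File2 | cut -d ' ' -f 1-8   # header without the last column
  # 1.2 = new name from File1; 2.2-2.8 = File2 columns except the last one
  LC_ALL=C join -o 1.2,2.2,2.3,2.4,2.5,2.6,2.7,2.8 File1.sorted File2.sorted
} > Output2
```

Setting `LC_ALL=C` for both `sort` and `join` keeps their collation identical, which `join` relies on to merge the two sorted inputs.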