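I need to capture the 9th column of every file in a directory and combine them into a matrix, so the resulting matrix file should contain the last column (column 9) of each file in that directory, one column per file. I'm trying the awk command below, but every column seems to end up with the same fixed number of rows, whereas each column should have a different length (the number of lines in its file).
How can I do this?
So I created four files, each with a different number of lines: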
test1
chr9 9335447 9336484 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56667 965 . 6.77363 99.72431 96.55273
chr9 25458602 25460996 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56963 965 . 8.69480 99.62462 96.50636
chr9 8951218 8952614 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56664 924 . 7.08373 95.87063 92.42281
chr9 25488217 25488493 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56969 924 . 7.26997 95.93935 92.40503
chr9 21767169 21767424 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56851 917 . 7.09383 95.08205 91.76054
chr9 25462036 25463421 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56964 913 . 8.10742 94.20728 91.34667
chr9 23376300 23376656 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56879 909 . 7.68991 95.23603 90.97657
chr9 6248051 6249845 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56643 902 . 7.23016 93.55791 90.20087
chr9 4361536 4366373 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56640 901 . 7.10611 93.39889 90.18221
chr10 82292632 82292885 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_7864 99 . 4.77657 11.76769 9.98589
test2
chr9 20992529 20993162 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_44714 99 . 1.81800 11.61118 9.99610
chr10 10150857 10152503 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_2529 99 . 5.72519 11.92312 9.99364
chr6 99944290 99968054 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_37835 99 . 5.14794 11.83886 9.99173
chr9 21676437 21677033 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_44723 99 . 2.08915 11.74377 9.98114
chr15 54971489 54971789 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_15937 99 . 4.81810 12.49836 9.97776
chr12 82402758 82403588 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_11103 99 . 5.50341 12.17826 9.97621
chr10 10027428 10028675 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_2523 99 . 5.27050 11.61293 9.97230
chr5 121116263 121117610 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_34950 99 . 5.77439 11.93674 9.96821
chr6 85524028 85524890 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_37188 99 . 5.96862 12.05430 9.96497
chr3 35946879 35947188 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_28382 99 . 6.24135 12.21292 9.96319
chr15 97759104 97761134 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_17206 99 . 5.14599 11.60535 9.95046
chr10 82999870 83001905 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_4560 99 . 5.13890 11.74480 9.94890
chr6 89132010 89133523 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_37413 99 . 5.07187 11.44838 9.94713
chr10 41146219 41147420 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_3185 99 . 5.69149 12.04643 9.94077
chr16 23430991 23431625 /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_18080 99 . 5.69959 11.88507 9.93962
test3
chr10 79678402 79678978 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_4211 99 . 5.12172 12.22297 9.99310
chr6 91782996 91785061 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35775 99 . 5.44415 11.81448 9.99213
chr6 87337478 87340150 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35453 99 . 5.63817 11.98290 9.99051
chr1 53794676 53795323 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_257 99 . 6.10605 12.20900 9.98874
chr11 5986806 5987478 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_5727 99 . 6.43022 12.47342 9.97663
chr6 121282251 121282898 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_36549 99 . 5.21404 12.05700 9.96515
chr10 75631023 75636021 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_4005 99 . 5.22504 11.71938 9.95342
chr18 66115872 66117662 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_21569 99 . 5.24544 11.71402 9.95194
chr19 44632939 44635029 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_23305 99 . 4.50809 11.79674 9.94865
chr4 14764961 14765707 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_29038 99 . 5.76862 11.99986 9.94749
chr5 141067881 141068891 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_33753 99 . 4.88940 11.44856 9.93749
chr10 70650648 70650887 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_3871 99 . 3.28463 11.70058 9.91189
chr6 85478303 85479428 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35363 99 . 5.67223 11.88624 9.90526
chr5 24227460 24228790 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_31250 99 . 5.34013 11.81155 9.90311
chr6 87217355 87217671 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35445 98 . 4.94135 11.84741 9.89441
chr19 56822146 56823187 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_23618 98 . 4.19924 11.60634 9.89441
chr5 34353383 34353813 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_31669 98 . 5.41412 11.69552 9.89124
chr8 128343980 128344400 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_42552 98 . 5.88042 12.31357 9.88999
chr7 120101715 120103357 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_39838 98 . 5.04756 11.80500 9.88873
chr13 32095516 32096121 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_11938 98 . 5.58792 11.86071 9.88852
test4
chr10 79678402 79678978 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_4211 99 . 5.12172 12.22297 9.99310
chr6 91782996 91785061 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35775 99 . 5.44415 11.81448 9.99213
chr6 87337478 87340150 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35453 99 . 5.63817 11.98290 9.99051
chr1 53794676 53795323 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_257 99 . 6.10605 12.20900 9.98874
chr11 5986806 5987478 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_5727 99 . 6.43022 12.47342 9.97663
The command I used:
awk -v OFS='\t' '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $9 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls test*) > test_out.txt
This gives me the following output, but it isn't what I want:
96.55273 9.99610 9.99310 9.99310
96.50636 9.99364 9.99213 9.99213
92.42281 9.99173 9.99051 9.99051
92.40503 9.98114 9.98874 9.98874
91.76054 9.97776 9.97663 9.97663
I expected the output to have a different number of rows in each column, according to the number of lines in each file.
Answered 2015-11-18 00:06:52
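The problem is that FNR in the END block is just the record number of the last record in the last file. That's why your output only has as many rows as your last file. You need to keep track of the maximum FNR across all files and use that as the loop bound in the END block.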
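A quick way to check this is a throwaway demo that prints FNR and NR from an END block after reading a 3-line file and a 2-line file. The file names f1 and f2 and the function name fnr_in_end_demo are hypothetical, and the sketch assumes a POSIX-like shell with mktemp available. FNR should report only the last file's count, while NR is the total across both files:

```shell
#!/bin/sh
# Hypothetical demo: print FNR and NR from an END block after reading
# two throwaway files of 3 and 2 lines.
fnr_in_end_demo() {
    dir=$(mktemp -d) || return 1
    printf 'a\nb\nc\n' > "$dir/f1"
    printf 'x\ny\n' > "$dir/f2"
    # In END, FNR still holds the record number of the last line read,
    # i.e. the length of the LAST file only; NR is the running total.
    awk 'END { print "FNR=" FNR, "NR=" NR }' "$dir/f1" "$dir/f2"
    rm -r "$dir"
}

fnr_in_end_demo   # prints: FNR=2 NR=5
```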
Something like this, which tracks the maximum FNR as it goes:
$ awk '{ if (max < FNR) max = FNR; a[FNR] = (FNR in a ? a[FNR] FS : "") $9 }
END { for(i=1;i<=max;i++) print a[i] }' test*
96.55273 9.99610 9.99310 9.99310
96.50636 9.99364 9.99213 9.99213
92.42281 9.99173 9.99051 9.99051
92.40503 9.98114 9.98874 9.98874
91.76054 9.97776 9.97663 9.97663
91.34667 9.97621 9.96515
90.97657 9.97230 9.95342
90.20087 9.96821 9.95194
90.18221 9.96497 9.94865
9.98589 9.96319 9.94749
9.95046 9.93749
9.94890 9.91189
9.94713 9.90526
9.94077 9.90311
9.93962 9.89441
9.89441
9.89124
9.88999
9.88873
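9.88852
With gawk you can make this (arguably) a little more elegant, because it has an ENDFILE block, which is called exactly once after each file has been processed: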
$ gawk '{ a[FNR] = (FNR in a ? a[FNR] FS : "") $9 }
ENDFILE { if (max < FNR) max = FNR }
END { for(i=1;i<=max;i++) print a[i] }' test*
96.55273 9.99610 9.99310 9.99310
96.50636 9.99364 9.99213 9.99213
92.42281 9.99173 9.99051 9.99051
92.40503 9.98114 9.98874 9.98874
91.76054 9.97776 9.97663 9.97663
91.34667 9.97621 9.96515
90.97657 9.97230 9.95342
90.20087 9.96821 9.95194
90.18221 9.96497 9.94865
9.98589 9.96319 9.94749
9.95046 9.93749
9.94890 9.91189
9.94713 9.90526
9.94077 9.90311
9.93962 9.89441
9.89441
9.89124
9.88999
9.88873
9.88852
https://stackoverflow.com/questions/33760677
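In both outputs above, once a shorter file runs out of lines, the values from the remaining files shift left into its column (from row 11 on, for example, the first value comes from test2). If each file must stay in its own column, one option, not part of the original answer, is to store field 9 per (row, file) pair and print an empty cell wherever a file has no such row. This is a minimal sketch: the function name col9_matrix is hypothetical, it assumes tab-separated output is acceptable, and it assumes no input file is empty (an empty file never reaches FNR == 1, so it would get no column).

```shell
#!/bin/sh
# Hypothetical helper: one output column per input file, filled with field 9.
# Rows where a file has already run out get an empty cell instead of letting
# the next file's value shift left. Output is tab-separated (an assumption).
# Assumes no input file is empty: an empty file never reaches FNR == 1,
# so it would not get a column of its own.
col9_matrix() {
    awk -v OFS='\t' '
        FNR == 1 { nf++ }                  # first line of a new file: next column
        {
            cell[FNR, nf] = $9             # remember field 9 at (row, column)
            if (FNR > max) max = FNR       # track the longest file
        }
        END {
            for (r = 1; r <= max; r++)
                for (c = 1; c <= nf; c++)
                    # a missing cell reads as the empty string
                    printf "%s%s", cell[r, c], (c < nf ? OFS : ORS)
        }' "$@"
}
```

For the files in the question it would be called as col9_matrix test* > test_out.txt; passing the glob directly also avoids parsing the output of ls.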