How can I modify this awk command when the files in a directory have different lengths?

Asked by a Stack Overflow user on 2015-11-17 23:26:13
Answers: 1, Views: 46, Votes: 1

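I need to take the 9th column of every file in a directory and build a matrix from them, so the resulting matrix file should contain the last column (column 9) of each file in that directory. I am trying the awk command below, but the number of rows seems to be fixed for every column, whereas each column should have a different number of rows.

How can I achieve this?

I created four files, each with a different number of lines: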

test1

chr9    9335447 9336484 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56667 965 .   6.77363 99.72431    96.55273
chr9    25458602    25460996    /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56963 965 .   8.69480 99.62462    96.50636
chr9    8951218 8952614 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56664 924 .   7.08373 95.87063    92.42281
chr9    25488217    25488493    /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56969 924 .   7.26997 95.93935    92.40503
chr9    21767169    21767424    /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56851 917 .   7.09383 95.08205    91.76054
chr9    25462036    25463421    /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56964 913 .   8.10742 94.20728    91.34667
chr9    23376300    23376656    /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56879 909 .   7.68991 95.23603    90.97657
chr9    6248051 6249845 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56643 902 .   7.23016 93.55791    90.20087
chr9    4361536 4366373 /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_56640 901 .   7.10611 93.39889    90.18221
chr10   82292632    82292885    /data/GT/polycomb_project/macs2.1_out/52_macs2.1_out_peak_7864  99  .   4.77657 11.76769    9.98589

test2

chr9    20992529    20993162    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_44714    99  .   1.81800 11.61118    9.99610
chr10   10150857    10152503    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_2529 99  .   5.72519 11.92312    9.99364
chr6    99944290    99968054    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_37835    99  .   5.14794 11.83886    9.99173
chr9    21676437    21677033    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_44723    99  .   2.08915 11.74377    9.98114
chr15   54971489    54971789    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_15937    99  .   4.81810 12.49836    9.97776
chr12   82402758    82403588    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_11103    99  .   5.50341 12.17826    9.97621
chr10   10027428    10028675    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_2523 99  .   5.27050 11.61293    9.97230
chr5    121116263   121117610   /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_34950    99  .   5.77439 11.93674    9.96821
chr6    85524028    85524890    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_37188    99  .   5.96862 12.05430    9.96497
chr3    35946879    35947188    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_28382    99  .   6.24135 12.21292    9.96319
chr15   97759104    97761134    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_17206    99  .   5.14599 11.60535    9.95046
chr10   82999870    83001905    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_4560 99  .   5.13890 11.74480    9.94890
chr6    89132010    89133523    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_37413    99  .   5.07187 11.44838    9.94713
chr10   41146219    41147420    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_3185 99  .   5.69149 12.04643    9.94077
chr16   23430991    23431625    /data/GT/polycomb_project/macs2.1_out/PT4_macs2.1_out_peak_18080    99  .   5.69959 11.88507    9.93962

test3

chr10   79678402    79678978    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_4211 99  .   5.12172 12.22297    9.99310
chr6    91782996    91785061    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35775    99  .   5.44415 11.81448    9.99213
chr6    87337478    87340150    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35453    99  .   5.63817 11.98290    9.99051
chr1    53794676    53795323    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_257  99  .   6.10605 12.20900    9.98874
chr11   5986806 5987478 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_5727 99  .   6.43022 12.47342    9.97663
chr6    121282251   121282898   /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_36549    99  .   5.21404 12.05700    9.96515
chr10   75631023    75636021    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_4005 99  .   5.22504 11.71938    9.95342
chr18   66115872    66117662    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_21569    99  .   5.24544 11.71402    9.95194
chr19   44632939    44635029    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_23305    99  .   4.50809 11.79674    9.94865
chr4    14764961    14765707    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_29038    99  .   5.76862 11.99986    9.94749
chr5    141067881   141068891   /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_33753    99  .   4.88940 11.44856    9.93749
chr10   70650648    70650887    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_3871 99  .   3.28463 11.70058    9.91189
chr6    85478303    85479428    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35363    99  .   5.67223 11.88624    9.90526
chr5    24227460    24228790    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_31250    99  .   5.34013 11.81155    9.90311
chr6    87217355    87217671    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35445    98  .   4.94135 11.84741    9.89441
chr19   56822146    56823187    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_23618    98  .   4.19924 11.60634    9.89441
chr5    34353383    34353813    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_31669    98  .   5.41412 11.69552    9.89124
chr8    128343980   128344400   /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_42552    98  .   5.88042 12.31357    9.88999
chr7    120101715   120103357   /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_39838    98  .   5.04756 11.80500    9.88873
chr13   32095516    32096121    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_11938    98  .   5.58792 11.86071    9.88852

test4

chr10   79678402    79678978    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_4211 99  .   5.12172 12.22297    9.99310
chr6    91782996    91785061    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35775    99  .   5.44415 11.81448    9.99213
chr6    87337478    87340150    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_35453    99  .   5.63817 11.98290    9.99051
chr1    53794676    53795323    /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_257  99  .   6.10605 12.20900    9.98874
chr11   5986806 5987478 /data/GT/polycomb_project/macs2.1_out/PT1_macs2.1_out_peak_5727 99  .   6.43022 12.47342    9.97663

The command I used:

awk -v OFS='\t' '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $9 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls test*) > test_out.txt

The command above gives me the following output, which is not what I want:

96.55273 9.99610 9.99310 9.99310
96.50636 9.99364 9.99213 9.99213
92.42281 9.99173 9.99051 9.99051
92.40503 9.98114 9.98874 9.98874
91.76054 9.97776 9.97663 9.97663

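I expect each column in the output to have a different number of rows, matching the number of lines in the corresponding file.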


1 Answer

Answered by a Stack Overflow user on 2015-11-18 00:06:52

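The problem is that FNR in the END block is just the record number of the last record in the last file. That is why your output has only as many rows as your last file. You need to keep track of the maximum FNR across all files and use that as the loop bound in the END block.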
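The behaviour described above can be checked with a minimal sketch. The two throwaway files (`long` and `short`) and the variable name `fnr_at_end` are made up for illustration and are not part of the question's data.

```shell
# Two hypothetical input files: the longer one is listed first.
tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/long"    # 3 records
printf '1\n'       > "$tmp/short"   # 1 record, processed last

# In END, FNR holds the record count of the last file only,
# while NR holds the total number of records read.
fnr_at_end=$(awk 'END { print FNR, NR }' "$tmp/long" "$tmp/short")
printf '%s\n' "$fnr_at_end"         # prints "1 4"

rm -rf "$tmp"
```

Because `short` is read last, a loop bounded by FNR in END would run only once here, even though `long` contributed three rows.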

The fix could look something like this:

$ awk '{ if (max < FNR) max = FNR; a[FNR] = (FNR in a ? a[FNR] FS : "") $9 } 
       END { for(i=1;i<=max;i++) print a[i] }' test*

96.55273 9.99610 9.99310 9.99310  
96.50636 9.99364 9.99213 9.99213  
92.42281 9.99173 9.99051 9.99051  
92.40503 9.98114 9.98874 9.98874  
91.76054 9.97776 9.97663 9.97663  
91.34667 9.97621 9.96515
90.97657 9.97230 9.95342
90.20087 9.96821 9.95194
90.18221 9.96497 9.94865
9.98589 9.96319 9.94749
9.95046 9.93749
9.94890 9.91189
9.94713 9.90526
9.94077 9.90311
9.93962 9.89441
9.89441
9.89124
9.88999
9.88873
9.88852
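Note that in the output above, once a shorter file runs out of lines, the values from later files shift left, so a given position in a row no longer identifies the file it came from. If fixed column positions matter, one possible variation (not part of the original answer) is to index each value by row and file number and pad missing cells. This is only a sketch: the sample files, the `NA` placeholder, and the `col` and `pad` variables are assumptions chosen for illustration, and it uses column 2 of small files rather than column 9 of the real data.

```shell
# Hypothetical sample files with different lengths (value in column 2).
tmp=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$tmp/f1"
printf 'd 4\n'           > "$tmp/f2"
printf 'e 5\nf 6\n'      > "$tmp/f3"

padded=$(awk -v OFS='\t' -v col=2 -v pad=NA '
    FNR == 1 { nf++ }                  # first record of a file: start a new output column
    {
        cell[FNR, nf] = $col           # store the value by (row, file index)
        if (FNR > max) max = FNR       # remember the longest file seen so far
    }
    END {
        for (i = 1; i <= max; i++) {
            line = ""
            for (j = 1; j <= nf; j++) {
                v = ((i, j) in cell) ? cell[i, j] : pad   # pad cells a shorter file lacks
                line = line (j > 1 ? OFS : "") v
            }
            print line
        }
    }' "$tmp/f1" "$tmp/f2" "$tmp/f3")
printf '%s\n' "$padded"

rm -rf "$tmp"
```

With these sample files the rows come out tab-separated as `1 4 5`, `2 NA 6` and `3 NA NA`. An empty input file never triggers `FNR == 1`, so in this sketch it would not get a column of its own.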

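With gawk you can make this (arguably) a bit more elegant, because it provides an ENDFILE block that runs once after each input file has been processed: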

$ gawk '{ a[FNR] = (FNR in a ? a[FNR] FS : "") $9 } 
        ENDFILE { if (max < FNR) max = FNR } 
        END { for(i=1;i<=max;i++) print a[i] }' test*

96.55273 9.99610 9.99310 9.99310  
96.50636 9.99364 9.99213 9.99213  
92.42281 9.99173 9.99051 9.99051  
92.40503 9.98114 9.98874 9.98874  
91.76054 9.97776 9.97663 9.97663  
91.34667 9.97621 9.96515
90.97657 9.97230 9.95342
90.20087 9.96821 9.95194
90.18221 9.96497 9.94865
9.98589 9.96319 9.94749
9.95046 9.93749
9.94890 9.91189
9.94713 9.90526
9.94077 9.90311
9.93962 9.89441
9.89441
9.89124
9.88999
9.88873
9.88852
Votes: 2
Original content provided by Stack Overflow.
Source: https://stackoverflow.com/questions/33760677
