My data looks like this:
file Gibbs kcal rel pop pop2
RR2.out -1752.142111 -1099486.696073 0.000000 -0.0000 1.0000
RR1.out -1752.141887 -1099486.555511 0.140562 -0.2374 0.7891
RR4.out -1752.140564 -1099485.725315 0.970758 -1.6398 0.1947
RR3.out -1752.140319 -1099485.571575 1.124498 -1.8995 0.1502
RR5.out -1752.138532 -1099484.450215 2.245858 -3.7937 0.0227
RR6.out -1752.138493 -1099484.425742 2.270331 -3.8351 0.0218
I want to find the sum of column 6, then divide each value in column 6 by that sum, and print the results in a new column named "weighted".
Using
echo "weighted" >> allRE7
awk 'NR==FNR{sum+= $6; next}{printf("%0.4f\n", $6/sum)}' input input >> out
paste input out >> final
gives me
file Gibbs kcal rel pop pop2 weighted
RR2.out -1752.142111 -1099486.696073 0.000000 -0.0000 1.0000 0.0000
RR1.out -1752.141887 -1099486.555511 0.140562 -0.2374 0.7891 0.4590
RR4.out -1752.140564 -1099485.725315 0.970758 -1.6398 0.1947 0.3622
RR3.out -1752.140319 -1099485.571575 1.124498 -1.8995 0.1502 0.0894
RR5.out -1752.138532 -1099484.450215 2.245858 -3.7937 0.0227 0.0689
RR6.out -1752.138493 -1099484.425742 2.270331 -3.8351 0.0218 0.0104
0.0100
I don't know where the 0.0100 comes from.
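Posted on 2019-07-29 21:42:35
The problem is that the awk code also prints a weighted result for the header line. The header's sixth field, pop2, is not a number, so awk treats it as 0 and prints 0.0000 for it; every real weight then lands one row too low, and the last one (0.0100) ends up on a line of its own. To eliminate this, replace:
awk 'NR==FNR{sum+= $6; next}{printf("%0.4f\n", $6/sum)}' input input >> out
with:
awk 'NR==FNR{sum+= $6; next} FNR>1{printf("%0.4f\n", $6/sum)}' input input >> out
The FNR>1 condition ensures that $6/sum is printed only for data lines.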
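The effect of the header line can be checked on a small example. The following is a minimal sketch, assuming a made-up three-line file (the rows A.out and B.out and the variable names are hypothetical, not taken from the question) in which only column 6 carries meaningful values:

```shell
# Hypothetical cut-down input: a header plus two data rows, six columns each.
demo_input=$(mktemp)
cat > "$demo_input" <<'EOF'
file Gibbs kcal rel pop pop2
A.out 0 0 0 0 3
B.out 0 0 0 0 1
EOF

# Original command: the header field "pop2" counts as 0, so a line is printed for it.
without_guard=$(awk 'NR==FNR{sum+=$6; next}{printf("%0.4f\n", $6/sum)}' "$demo_input" "$demo_input")

# Fixed command: FNR>1 skips the header on the second pass.
with_guard=$(awk 'NR==FNR{sum+=$6; next} FNR>1{printf("%0.4f\n", $6/sum)}' "$demo_input" "$demo_input")

printf 'without FNR>1:\n%s\n' "$without_guard"
printf 'with FNR>1:\n%s\n' "$with_guard"
rm -f "$demo_input"
```

Without the guard there are three output lines for two data rows, the first being the 0.0000 that belongs to the header; with the guard there is exactly one line per data row.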
Improvement
The echo and paste commands are unnecessary. Try:
$ awk 'NR==FNR{sum+= $6; next} FNR==1{print $0,"weighted"; next} {printf("%s %0.4f\n",$0,$6/sum)}' input input
file Gibbs kcal rel pop pop2 weighted
RR2.out -1752.142111 -1099486.696073 0.000000 -0.0000 1.0000 0.4590
RR1.out -1752.141887 -1099486.555511 0.140562 -0.2374 0.7891 0.3622
RR4.out -1752.140564 -1099485.725315 0.970758 -1.6398 0.1947 0.0894
RR3.out -1752.140319 -1099485.571575 1.124498 -1.8995 0.1502 0.0689
RR5.out -1752.138532 -1099484.450215 2.245858 -3.7937 0.0227 0.0104
RR6.out -1752.138493 -1099484.425742 2.270331 -3.8351 0.0218 0.0100
A variation on the above uses a ternary operator (hat tip: Ed Morton):
$ awk 'NR==FNR{sum+= $6; next} {print $0, (FNR>1 ? sprintf("%0.4f",$6/sum) : "weighted")}' input input
file Gibbs kcal rel pop pop2 weighted
RR2.out -1752.142111 -1099486.696073 0.000000 -0.0000 1.0000 0.4590
RR1.out -1752.141887 -1099486.555511 0.140562 -0.2374 0.7891 0.3622
RR4.out -1752.140564 -1099485.725315 0.970758 -1.6398 0.1947 0.0894
RR3.out -1752.140319 -1099485.571575 1.124498 -1.8995 0.1502 0.0689
RR5.out -1752.138532 -1099484.450215 2.245858 -3.7937 0.0227 0.0104
RR6.out -1752.138493 -1099484.425742 2.270331 -3.8351 0.0218 0.0100
Posted on 2019-07-29 21:49:20
You are computing a value for the header line as well.
To skip the header line, your awk script should be:
awk 'FNR==1{next}NR==FNR{sum+= $6; next}{printf("%0.4f\n", $6/sum)}' input input >> out
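paste input out >> final
A cleaner awk script, which also does the job of the paste command (note that it omits the header line from its output), is: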
awk 'FNR==1{next}NR==FNR{sum+= $6; next}{printf("%s %0.4f\n", $0, $6/sum)}' input input
https://stackoverflow.com/questions/57261625
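If reading the file twice is not an option (for example, when the data arrives on a pipe), the same result could be produced in a single pass by buffering the rows and printing them in an END block. This is an alternative technique, not one proposed in either answer. The sample file and the array names line and val are made up for illustration:

```shell
# Hypothetical cut-down input: a header plus two data rows, six columns each.
demo_input=$(mktemp)
cat > "$demo_input" <<'EOF'
file Gibbs kcal rel pop pop2
A.out 0 0 0 0 3
B.out 0 0 0 0 1
EOF

single_pass=$(awk '
    NR==1 { print $0, "weighted"; next }    # header: append the new column name
    { line[NR]=$0; val[NR]=$6; sum+=$6 }    # buffer each data row and accumulate the total
    END { for (i=2; i<=NR; i++) printf("%s %0.4f\n", line[i], val[i]/sum) }
' "$demo_input")

printf '%s\n' "$single_pass"
rm -f "$demo_input"
```

The trade-off is memory: every row is held until the end of input, so for very large files the two-pass form shown earlier is likely the better choice.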