Computing the mean and standard deviation of a column with awk
I have this file:
Took: 15.473214149475098 seconds
Took: 12.94953465461731 seconds
Took: 2.235722780227661 seconds
Took: 40.53083419799805 seconds
Took: 21.840606212615967 seconds
Took: 35.777870893478394 seconds
Took: 13.153780221939087 seconds
Took: 2.966165781021118 seconds
Took: 35.54965615272522 seconds
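I would like to compute the mean and standard deviation of the times directly in the terminal. Can awk help? I am not very familiar with it. I tried splitting the file to get only the column with the numeric values, like this:
cat | awk -F"Took:" {print $2}
but it just returned the entire content of the file.

Try the following to get the mean of the second column: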
awk '{sum+=$2;if($2){count++}} END{print sum/count}' Input_file
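One thing to watch in the one-liner above: `if($2)` is false when the value is 0, so zero entries are added to the sum but not counted. A minimal sketch of a variant that counts every non-empty second field is shown here. The three-row input is hypothetical sample data, not the file from the question, and the `if (count)` guard against empty input is an addition.

```shell
# Hypothetical sample input: three rows whose second field is 1, 0 and 2.
mean=$(printf 'Took: %s seconds\n' 1 0 2 |
  awk '$2 != "" { sum += $2; count++ }      # count the row even when the value is 0
       END { if (count) print sum / count } # skip the division on empty input
      ')
echo "$mean"
```

With this input the sum is 3 over 3 rows, so the script prints 1.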
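Edit:
There is an interesting section in the discussion of standard deviation. Of particular interest is the simple and numerically stable recurrence:
A_0 = 0, A_k = A_{k-1} + (x_k - A_{k-1}) / k
Q_0 = 0, Q_k = Q_{k-1} + (x_k - A_{k-1}) * (x_k - A_k)
where, at every step, A_k equals the running mean and Q_k is related to the population variance σ² through the relation Q_k = σ² * k.
With this theoretical background, we can write: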
$ awk 'BEGIN{a=0;q=0}{x=$2;b=a+(x-a)/NR;q+=(x-a)*(x-b);a=b}END{print a,sqrt(q/NR)}' file
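The one-liner above divides the accumulated q by NR, which gives the population standard deviation. If the sample standard deviation is wanted instead, the same q can be divided by NR - 1. The sketch below prints both, assuming a small hypothetical data set (2 4 4 4 5 5 7 9) chosen because its mean is 5 and its population standard deviation is 2.

```shell
# Hypothetical sample data: the values 2 4 4 4 5 5 7 9.
stats=$(printf 'Took: %s seconds\n' 2 4 4 4 5 5 7 9 |
  awk '{ x = $2
         b = a + (x - a) / NR        # updated running mean
         q += (x - a) * (x - b)      # running sum of squared deviations
         a = b }
       END { if (NR > 1)             # the sample form needs at least two values
               print a, sqrt(q / NR), sqrt(q / (NR - 1)) }')
echo "$stats"
```

The three fields printed are the mean, the population standard deviation, and the sample standard deviation; for this input they come out as 5 2 2.13809.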
Using a Perl one-liner:
> cat dada.txt
Took: 15.473214149475098 seconds
Took: 12.94953465461731 seconds
Took: 2.235722780227661 seconds
Took: 40.53083419799805 seconds
Took: 21.840606212615967 seconds
Took: 35.777870893478394 seconds
Took: 13.153780221939087 seconds
Took: 2.966165781021118 seconds
Took: 35.54965615272522 seconds
> perl -lane '$s+=$F[1];push(@a,$F[1]); END { $m=$s/@a; $sd+=($_-$m)**2 for(@a);$sd=sqrt($sd/@a); print "Mean:$m\nStandard Deviation:$sd"} ' dada.txt
Mean:20.0530427826775
Standard Deviation:13.4923983082523
>
Another quick approach:
$ awk '{s+=$2; ss+=$2^2} END{print m=s/NR, sqrt(ss/NR-m^2)}' file
20.053 13.4924
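A caveat with this sum-of-squares form: it subtracts two numbers that are nearly equal when the spread is small compared to the mean, so floating-point rounding can push the difference slightly below zero, and sqrt of a negative number yields NaN. The following is a hedged sketch that clamps the variance at zero before taking the root. The data set is hypothetical, and both the clamp and the empty-input check are additions, not part of the answer above.

```shell
# Hypothetical sample data: the values 2 4 4 4 5 5 7 9.
quick=$(printf 'Took: %s seconds\n' 2 4 4 4 5 5 7 9 |
  awk '{ s += $2; ss += $2 * $2 }
       END {
         if (NR == 0) exit          # no input, nothing to report
         m = s / NR                 # mean
         v = ss / NR - m * m        # population variance as E[x^2] - mean^2
         if (v < 0) v = 0           # clamp rounding noise before sqrt
         print m, sqrt(v)
       }')
echo "$quick"
```

For this input the sums are exact (s = 40, ss = 232), so the clamp is not triggered and the output is 5 2.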
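Comments:
What is the expected output? Please provide the algorithm for computing the values you want to output.
if($2): you don't want zeros to interfere with the result? ;D Try with: echo -e 1\\n0\\n2 | awk …
@RavinderSingh13 I think we should, that is how you produce reliable code.
@RavinderSingh13 The mean of 0, 1, 2 is 1, but your script gives 1.5 because it does not handle zeros, due to the if($2). Replace it with if($2!="") or something similar. It outputs one, so I give you one ;D
BEGIN{a=0;q=0} is not strictly necessary, because in Awk numeric variables are automatically initialized to 0, but I like to mimic the published algorithm as closely as possible. In other words, the one-liner awk '{x=$2;b=a+(x-a)/NR;q+=(x-a)*(x-b);a=b}END{print a,sqrt(q/NR)}' is equivalent to the one in the answer.
The last link is broken.
@karakfa I changed the link, thanks a lot for the hint.
print m=1,m*2: I had never seen that before, it is a nice trick, isn't it?
An assignment has a value, which is why you can also write a=b=1.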