Computing the mean and standard deviation of a column with awk


I have this file:

Took:  15.473214149475098  seconds
Took:  12.94953465461731  seconds
Took:  2.235722780227661  seconds
Took:  40.53083419799805  seconds
Took:  21.840606212615967  seconds
Took:  35.777870893478394  seconds
Took:  13.153780221939087  seconds
Took:  2.966165781021118  seconds
Took:  35.54965615272522  seconds
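I would like to calculate the mean and standard deviation of the times directly in the terminal. Can awk help? I am not very familiar with it. I tried splitting the file to get only the column with the numeric values via:
cat | awk -F"Took:" {print $2}
But it just returned the full content of the file.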
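For reference, a minimal sketch of pulling out only the numeric column. It assumes the layout shown above, where awk's default whitespace splitting makes the number field 2. The function name `second_field` is purely illustrative.

```shell
# Hypothetical helper: print the second whitespace-separated field of each line.
# The single quotes stop the shell from expanding $2 before awk sees it.
second_field() {
    awk '{ print $2 }'
}

printf 'Took:  15.473214149475098  seconds\nTook:  12.94953465461731  seconds\n' | second_field
# prints:
# 15.473214149475098
# 12.94953465461731
```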

Please try the following to get the mean of the second column:

awk '{sum+=$2;if($2){count++}} END{print sum/count}'  Input_file
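Note that `if($2)` is false when the value is 0, so such lines are added to the sum but not counted. A slightly more defensive variant, as a sketch only: it counts every line whose second field is non-empty and guards against empty input. The function name `mean_col2` is illustrative.

```shell
# Hypothetical helper: mean of the second whitespace-separated field read from stdin.
mean_col2() {
    awk '$2 != "" { sum += $2; count++ }       # count every non-empty field, zeros included
         END { if (count) print sum / count }' # print nothing rather than divide by zero
}

printf 'Took: 1 s\nTook: 0 s\nTook: 2 s\n' | mean_col2
# prints: 1
```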
Edit:

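The article on the standard deviation has an interesting section on how to compute it. Of particular interest is a recurrence that is both simple and numerically stable:

A_0 = 0, Q_0 = 0
A_k = A_{k-1} + (x_k - A_{k-1}) / k
Q_k = Q_{k-1} + (x_k - A_{k-1}) * (x_k - A_k)

where, at each step, A_k is the running mean of the first k values and Q_k is related to their population variance σ² by Q_k = σ² * k.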

With this theoretical background, we can write:

$ awk 'BEGIN{a=0;q=0}{x=$2;b=a+(x-a)/NR;q+=(x-a)*(x-b);a=b}END{print a,sqrt(q/NR)}' file
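To make the update step easier to follow, here is a sketch of the same recurrence spread over several lines and run on the nine sample lines from the question. The wrapper name `stats_col2` and the third output column (the sample standard deviation, dividing by n-1) are additions for illustration and are not part of the one-liner above.

```shell
# Hypothetical wrapper around the running-mean / running-Q update.
stats_col2() {
    awk '{
        x = $2
        b = a + (x - a) / NR        # A_k: running mean after this value
        q += (x - a) * (x - b)      # Q_k: running sum of squared deviations
        a = b
    }
    END {
        # Needs at least two values so that the n-1 divisor is non-zero.
        if (NR > 1)
            print a, sqrt(q / NR), sqrt(q / (NR - 1))   # mean, population sd, sample sd
    }'
}

stats_col2 <<'EOF'
Took:  15.473214149475098  seconds
Took:  12.94953465461731  seconds
Took:  2.235722780227661  seconds
Took:  40.53083419799805  seconds
Took:  21.840606212615967  seconds
Took:  35.777870893478394  seconds
Took:  13.153780221939087  seconds
Took:  2.966165781021118  seconds
Took:  35.54965615272522  seconds
EOF
```

The first two numbers printed are the mean and the population standard deviation; for this data they should agree with the Mean and Standard Deviation values printed by the Perl one-liner in this thread, up to awk's default six significant digits.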
Using a Perl one-liner:

> cat dada.txt 
Took:  15.473214149475098  seconds
Took:  12.94953465461731  seconds
Took:  2.235722780227661  seconds
Took:  40.53083419799805  seconds
Took:  21.840606212615967  seconds
Took:  35.777870893478394  seconds
Took:  13.153780221939087  seconds
Took:  2.966165781021118  seconds
Took:  35.54965615272522  seconds
> perl -lane '$s+=$F[1];push(@a,$F[1]); END { $m=$s/@a; $sd+=($_-$m)**2 for(@a);$sd=sqrt($sd/@a); print "Mean:$m\nStandard Deviation:$sd"} ' dada.txt
Mean:20.0530427826775
Standard Deviation:13.4923983082523
> 
Another quick way:

$ awk '{s+=$2; ss+=$2^2} END{print m=s/NR, sqrt(ss/NR-m^2)}' file

20.053 13.4924
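One caveat with the sum-of-squares shortcut: when the spread is tiny compared with the mean, rounding can make `ss/NR - m^2` come out slightly negative, and `sqrt` of a negative number is not a usable result. A hedged sketch that clamps the variance at zero and skips empty input; the function name `quick_stats_col2` is illustrative.

```shell
# Hypothetical helper: sum / sum-of-squares shortcut with two small guards.
quick_stats_col2() {
    awk '{ s += $2; ss += $2 * $2 }
         END {
             if (NR == 0) exit          # nothing to report on empty input
             m = s / NR
             v = ss / NR - m * m        # population variance as E[x^2] - mean^2
             if (v < 0) v = 0           # rounding can push this slightly below zero
             print m, sqrt(v)
         }'
}

printf 'a 2\na 4\na 4\na 4\na 5\na 5\na 7\na 9\n' | quick_stats_col2
# prints: 5 2
```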

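What is the expected output? ;D Provide the algorithm for computing the values you want to output.
if($2): you don't want zeros to interfere with the results? ;D Try with: echo -e 1\\n0\\n2 | awk …
@RavinderSingh13 I think we should; that is how you produce reliable code.
@RavinderSingh13 The mean of 0, 1, 2 is 1, but your script gives 1.5 because if($2) does not count the zero. Replace it with if($2!="") or something similar. It outputs one, so I give you one ;D
BEGIN{a=0;q=0} is not strictly necessary, because in Awk numeric variables are automatically initialized to 0, but I like to mimic published algorithms as closely as possible. In other words, the one-liner awk '{x=$2;b=a+(x-a)/NR;q+=(x-a)*(x-b);a=b}END{print a,sqrt(q/NR)}' is equivalent to the one in the answer.
The last link is broken.
@karakfa I changed the link, thanks a lot for the hint.
print m=1,m*2 - never seen that before, it's a nice trick, isn't it?
An assignment has a value, which is why you can also write a=b=1.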