Bash/Nawk whitespace problem

I have 100 data files, each with 1000 lines, and they all look like this:

0       0   0   0
1       0   1   0
2       0   1   -1
3       0   1   -2
4       1   1   -2
5       1   1   -3
6       1   0   -3
7       2   0   -3
8       2   0   -4
9       3   0   -4
10      4   0   -4
.
.
.
999     1   47  -21
1000    2   47  -21

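I wrote a script that is supposed to square each value in columns 2, 3 and 4, add the squares together, and take the square root of the sum.

It then squares that value and averages these numbers across the data files, so that it can output the average "calc" and the average "fluc" for each line.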

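The numbers mean the following: the first number is the step number, and the next three are the coordinates on the x, y and z axes. I am trying to find the distance of each step from the origin, which is calculated with the formula

r = sqrt(x^2 + y^2 + z^2)

Next I need the fluctuation of r, which is calculated as

f = r^4

or, equivalently,

f = (r^2)^2

These must be averaged over the 100 data files, which leads me to: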

r = r + sqrt(x^2 + y^2 + z^2)
avg = r/s

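The same is done for f. Here s is the number of data files read, which I compute with

sum=$(ls -l *.data | wc -l)

Finally, my last calculation is the deviation between the expected value and the average, computed from the final values as

stddev = sqrt(fluc - (r^2)^2)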
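As a quick sanity check of these formulas, here is a minimal sketch that evaluates r and f for a single sample row (`3 0 1 -2`). The helper name `row_stats` is made up for this example, and it uses plain `awk` rather than `nawk`, which is an assumption about what is installed.

```shell
# Hypothetical helper: print step, r and f for each input row.
row_stats() {
    awk '{
        r2 = $2*$2 + $3*$3 + $4*$4   # x^2 + y^2 + z^2
        r  = sqrt(r2)                # distance from the origin
        f  = r2 * r2                 # f = (r^2)^2 = r^4
        printf "%d %.4f %d\n", $1, r, f
    }'
}

# Step 3 at (x, y, z) = (0, 1, -2): r^2 = 5, so r = sqrt(5) and f = 25
echo "3 0 1 -2" | row_stats
```

For this row the output is `3 2.2361 25`.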

The script I created is:

#!/bin/bash

sum=$(ls -l *.data | wc -l)
paste -d"\t" *.data | nawk -v s="$sum" '{
    for(i=0;i<=s-1;i++)
    {
        t1 = 2+(i*4)
        t2 = 3+(i*4)
        t3 = 4+(i*4)
        temp = ($t1*$t1) + ($t2*$t2) + ($t3*$t3)
        calc = $calc + sqrt ($temp)
        fluc = $fluc + ($calc*$calc)
    }
    stddev = sqrt(($calc^2) - ($fluc))
    print $1" "calc/s" "fluc/s" "stddev
    temp=0
    calc=0
    stddev=0
}'

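I am not experienced enough with awk to pinpoint exactly where my mistake is. Could someone point me in the right direction or give me a better script?

The expected output is a file containing the following: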

0 0 0 0
1 (calc for all 1's) (fluc for all 1's) (stddev for all 1's)
2 (calc for all 2's) (fluc for all 2's) (stddev for all 2's)
.
.
.

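The script below should do what you want. The only thing that might not work yet is the choice of delimiter. Your original script appears to use tabs, while my solution assumes spaces. Changing that should not be a problem, though.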
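On the delimiter question: with the default field separator, awk splits on runs of spaces and tabs alike, so tab-separated files may not need any change at all. If a strict tab separator is wanted, `-F'\t'` sets it. The small check below illustrates the difference; the function names are made up for this example and plain `awk` is used as an assumption.

```shell
# Default FS: any run of spaces or tabs separates fields.
default_split() { printf '5\t1   1\t-3\n' | awk '{ print NF, $4 }'; }

# Strict tab separator: "1   1" stays a single field.
tab_split() { printf '5\t1   1\t-3\n' | awk -F'\t' '{ print NF, $3 }'; }

default_split   # 4 fields, the last one is -3
tab_split       # 3 fields, the last one is -3
```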

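The script simply pipes all files into nawk one after another, without counting the files first, since as far as I can see the count is not required. It uses arrays to store independent statistics for each step instead of trying to keep track of positions within the file. At the end it iterates over all step indices it has found and prints them. Because that iteration is not sorted, the output is piped into a Unix sort call to take care of the ordering.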

#!/bin/bash
# pipe the data of all files into the nawk processor
cat *.data | nawk '
BEGIN {
  FS=" "                         # set the delimiter for the columns
}
{
  step = $1                      # step is in column 1
  temp = $2*$2 + $3*$3 + $4*$4   # r^2 = x^2 + y^2 + z^2

  # use arrays indexed by step to store data
  calc[step] = calc[step] + sqrt(temp)   # running sum of r for this step
  fluc[step] = fluc[step] + temp*temp    # running sum of f = (r^2)^2 for this step
  count[step] = count[step] + 1   # count the number of samples seen for a step
}
END {
  # iterate over all existing steps (this is not sorted!)
  for (i in count) {
    r = calc[i]/count[i]         # average r
    f = fluc[i]/count[i]         # average f
    d = f - (r*r)*(r*r)          # fluc - (r^2)^2, as defined in the question
    if (d < 0) d = 0             # guard against rounding just below zero
    print i" "r" "f" "sqrt(d)
  }
}' | sort -n -k 1 # that is why we sort here: first column "-k 1" and numerically "-n"
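A note on the original script, as far as I can tell: in awk, `$` is the field-access operator, not a variable sigil as in bash. An expression such as `$calc` therefore means "the field whose number is stored in calc", which is probably not what was intended. The snippet below is only an illustration of that difference; the function name is made up and plain `awk` is used as an assumption.

```shell
# Illustration only: a variable versus the field it points at.
show_field_vs_var() {
    echo "10 20 30" | awk '{
        n = 2
        print n     # the variable itself: 2
        print $n    # field number n, i.e. the second field: 20
    }'
}

show_field_vs_var
```

This prints `2` and then `20`, which is why the script above refers to its variables without a leading `$`.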

Edit

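As @edmorton suggested, awk can load the files itself. The enhanced version below removes the call to cat and instead passes the file pattern as an argument to nawk. In addition, as @NictraSavios suggested, the new version introduces special handling for the output of the last step's statistics. Note that statistics are still collected for all steps. Suppressing that while the data is being read is a bit difficult, because at that point we do not yet know which step is the last one. It could be done with some extra effort, but you would probably lose a lot of robustness in the data handling, because right now the script makes no assumptions about: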

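the number of files provided
the order in which the files are processed
the number of steps in each file
the order of the steps within a file
the completeness of the steps as a range without "holes"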
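To illustrate the points above, here is a small sketch with two hypothetical files (`a.data` and `b.data`, invented for this example) that differ in length and list their steps in a different order. It only counts the samples per step, using plain `awk` (an assumption; `nawk` should behave the same way).

```shell
# Two made-up input files in a temporary directory.
demo_dir=$(mktemp -d)
printf '0 0 0 0\n1 1 0 0\n2 1 1 0\n' > "$demo_dir/a.data"
printf '1 0 0 1\n0 0 0 0\n' > "$demo_dir/b.data"   # shorter, steps out of order

# Count how many samples were seen for each step, whatever the file layout.
count_steps() {
    awk '{ count[$1]++ }
         END { for (i in count) print i, count[i] }' "$@" | sort -n -k 1
}

count_steps "$demo_dir"/*.data
```

Steps 0 and 1 are each seen twice and step 2 once, regardless of which file they came from or where they appeared in it.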

Enhanced script:

#!/bin/bash
nawk '
BEGIN {
  FS=" "   # set the delimiter for the columns (not really required for space which is the default)
  maxstep = -1
}
{
  step = $1                      # step is in column 1
  temp = $2*$2 + $3*$3 + $4*$4   # r^2 = x^2 + y^2 + z^2

  # remember maximum step for selected output (compare numerically)
  if (step + 0 > maxstep)
    maxstep = step + 0

  # use arrays indexed by step to store data
  calc[step] = calc[step] + sqrt(temp)   # running sum of r for this step
  fluc[step] = fluc[step] + temp*temp    # running sum of f = (r^2)^2 for this step
  count[step] = count[step] + 1   # count the number of samples seen for a step
}
END {
  # iterate over all existing steps (this is not sorted!)
  for (i in count) {
    r = calc[i]/count[i]         # average r
    f = fluc[i]/count[i]         # average f
    if (i + 0 == maxstep) {
      # handle the last step in a special way
      d = f - (r*r)*(r*r)        # fluc - (r^2)^2, as defined in the question
      if (d < 0) d = 0           # guard against rounding just below zero
      print i" "r" "f" "sqrt(d)
    } else
      # this is the normal handling
      print i" "r
  }
}' *.data | sort -n -k 1 # that is why we sort here: first column "-k 1" and numerically "-n"