Linux: average columns by header name


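I have files with columns like the ones below. The sample input shown here is only partial.

Please check the link to the main file below. Each file has only two rows.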

Gene    0.4%    0.7%    1.1%    1.4%    1.8%    2.2%    2.5%    2.9%    3.3%    3.6%    4.0%    4.3%    4.7%    5.1%    5.4%    5.8%    6.2%    6.5%    6.9%    7.2%    7.6%    8.0%    8.3%    8.7%    9.1%    9.4%    9.8%    10.1%   10.5%   10.9%   11.2%   11.6%   12.0%   12.3%   12.7%   13.0%   13.4%   13.8%   14.1%   14.5%   14.9%   15.2%   15.6%   15.9%   16.3%   16.7%   17.0%   17.4%   17.8%   18.1%   18.5%   18.8%   19.2%   19.6%   19.9%   20.3%   20.7%   21.0%   21.4%   21.7%   22.1%   22.5%   22.8%   23.2%   23.6%   23.9%   24.3%   24.6%   25.0%   25.4%   25.7%   26.1%   26.4%   26.8%   27.2%   27.5%   27.9%   28.3%   28.6%   29.0%   29.3%   29.7%   30.1%   30.4%   30.8%   31.2%   31.5%   31.9%   32.2%   32.6%   33.0%   33.3%   33.7%   34.1%   34.4%   34.8%   35.1%   35.5%   35.9%   36.2%   36.6%   37.0%   37.3%   37.7%   38.0%   38.4%   38.8%   39.1%   39.5%   39.9%   40.2%   40.6%   40.9%   41.3%   41.7%   42.0%   42.4%   42.8%   43.1%   43.5%   43.8%   44.2%   44.6%   44.9%   45.3%   45.7%   46.0%   46.4%   46.7%   47.1%   47.5%   47.8%   48.2%   48.6%   48.9%   49.3%   49.6%   50.0%   50.4%   50.7%   51.1%   51.4%   51.8%   52.2%   52.5%   52.9%   53.3%   53.6%   54.0%   54.3%   54.7%   55.1%   55.4%   55.8%   56.2%   56.5%   56.9%   57.2%   57.6%   58.0%   58.3%   58.7%   59.1%   59.4%   59.8%   60.1%   60.5%   60.9%   61.2%   61.6%   62.0%   62.3%   62.7%   63.0%   63.4%   63.8%   64.1%   64.5%   64.9%   65.2%   65.6%   65.9%   66.3%   66.7%   67.0%   67.4%   67.8%   68.1%   68.5%   68.8%   69.2%   69.6%   69.9%   70.3%   70.7%   71.0%   71.4%   71.7%   72.1%   72.5%   72.8%   73.2%   73.6%   73.9%   74.3%   74.6%   75.0%   75.4%   75.7%   76.1%   76.4%   76.8%   77.2%   77.5%   77.9%   78.3%   78.6%   79.0%   79.3%   79.7%   80.1%   80.4%   80.8%   81.2%   81.5%   81.9%   82.2%   82.6%   83.0%   83.3%   83.7%   84.1%   84.4%   84.8%   85.1%   85.5%   85.9%   86.2%   86.6%   87.0%   87.3%   87.7%   88.0%   88.4%   88.8%   89.1%   89.5%   89.9%   90.2%   
90.6%   90.9%   91.3%   91.7%   92.0%   92.4%   92.8%   93.1%   93.5%   93.8%   94.2%   94.6%   94.9%   95.3%   95.7%   96.0%   96.4%   96.7%   97.1%   97.5%   97.8%   98.2%   98.6%   98.9%   99.3%   99.6%   100.0%  0.4%    0.7%    1.1%    1.4%    1.8%    2.2%    2.5%    2.9%    3.3%    3.6%    4.0%    4.3%    4.7%    5.1%    5.4%    5.8%    6.2%    6.5%    6.9%    7.2%    7.6%    8.0%    8.3%    8.7%    9.1%    9.4%    9.8%    10.1%   10.5%   10.9%   11.2%   11.6%   12.0%   12.3%   12.7%   13.0%   13.4%   13.8%   14.1%   14.5%   14.9%   15.2%   15.6%   15.9%   16.3%   16.7%   17.0%   17.4%   17.8%   18.1%   18.5%   18.8%   19.2%   19.6%   19.9%   20.3%   20.7%   21.0%   21.4%   21.7%   22.1%   22.5%   22.8%   23.2%   23.6%   23.9%   24.3%   24.6%   25.0%   25.4%   25.7%   26.1%   26.4%   26.8%   27.2%   27.5%   27.9%   28.3%   28.6%   29.0%   29.3%   29.7%   30.1%   30.4%   30.8%   31.2%   31.5%   31.9%   32.2%   32.6%   33.0%   33.3%   33.7%   34.1%   34.4%   34.8%   35.1%   35.5%   35.9%   36.2%   36.6%   37.0%   37.3%   37.7%   38.0%   38.4%   38.8%   39.1%   39.5%   39.9%   40.2%   40.6%   40.9%   41.3%   41.7%   42.0%   42.4%   42.8%   43.1%   43.5%   43.8%   44.2%   44.6%   44.9%   45.3%   45.7%   46.0%   46.4%   46.7%   47.1%   47.5%   47.8%   48.2%   48.6%   48.9%   49.3%   49.6%   50.0%   50.4%   50.7%   51.1%   51.4%   51.8%   52.2%   52.5%   52.9%   53.3%   53.6%   54.0%   54.3%   54.7%   55.1%   55.4%   55.8%   56.2%   56.5%   56.9%   57.2%   57.6%   58.0%   58.3%   58.7%   59.1%   59.4%   59.8%   60.1%   60.5%   60.9%   61.2%   61.6%   62.0%   62.3%   62.7%   63.0%   63.4%   63.8%   64.1%   64.5%   64.9%   65.2%   65.6%   65.9%   66.3%   66.7%   67.0%   67.4%   67.8%   68.1%   68.5%   68.8%   69.2%   69.6%   69.9%   70.3%   70.7%   71.0%   71.4%   71.7%   72.1%   72.5%   72.8%   73.2%   73.6%   73.9%   74.3%   74.6%   75.0%   75.4%   75.7%   76.1%   76.4%   76.8%   77.2%   77.5%   77.9%   78.3%   78.6%   79.0%   79.3%   79.7%   80.1%   80.4%   80.8%   
81.2%   81.5%   81.9%   82.2%   82.6%   83.0%   83.3%   83.7%   84.1%   84.4%   84.8%   85.1%   85.5%   85.9%   86.2%   86.6%   87.0%   87.3%   87.7%   88.0%   88.4%   88.8%   89.1%   89.5%   89.9%   90.2%   90.6%   90.9%   91.3%   91.7%   92.0%   92.4%   92.8%   93.1%   93.5%   93.8%   94.2%   94.6%   94.9%   95.3%   95.7%   96.0%   96.4%   96.7%   97.1%   97.5%   97.8%   98.2%   98.6%   98.9%   99.3%   99.6%   100.0%
Basically, this is what I need to do:

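a. Start from the second column, which here is 0.4%.

b. Continue until you hit "10" in the header name. If the header is exactly 10.0%, include that column too. If not, include only the columns before it. In this example, since we have 10.1% (column 29), we include the columns from 0.4% (column 2) through 9.8% (column 28). If column 29 were 10.0%, it would be included as well.

c. Average the second-row values across those columns (the data are not shown here; please follow this link for the full data set). In this example, that is from 0.4% (column 2) to 9.8% (column 28).

d. In the output, print the first column, "Gene", followed by the average, with the column header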

Gene Average_10%
e. Then start from 10.1% (column 29) and check until you hit "20" in the header name. Repeat steps b to d, and print the output as

Gene Average_10% Average_20%
Repeat this step until you are done:

Gene Average_10% Average_20% Average_30% Average_40% Average_50% Average_60% Average_70% Average_80% Average_90% Average_100%
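f. Once 100% is reached, one data set is complete.

g. If you look closely at my column headers, there are more 0.4%-100% columns after the first 100%. The input file at the link above will contain 13 of these 0.4%-100% blocks.

h. I have multiple files, and the headers can be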

1% 2% 3%....100%
1.5% 2.5% 3.5%....100%

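They vary from file to file, but the averaging logic (closing a bin when you hit "10", "20", and so on) is always the same. The number of samples, 13, is also the same, which means each file contains 100% thirteen times.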
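The binning rule in steps a to d amounts to rounding each header value up to the next multiple of 10, so that 10.0% stays in the "10" bin while 10.1% moves to the "20" bin. A minimal sketch of that rule, assuming the headers are plain numbers followed by a percent sign; the six header values and the variable name `bins` are illustrative and not taken from the real data set:

```shell
# Illustrative header values, one per line (not the real data set).
bins=$(printf '%s\n' 0.4% 9.8% 10.0% 10.1% 50.0% 100.0% |
  awk '{
    v = $1; sub(/%$/, "", v); v += 0     # strip the percent sign, force a number
    b = int(v / 10) * 10                 # round down to a multiple of 10
    if (b < v) b += 10                   # anything above that goes to the next bin
    print $1, "-> Average_" b "%"
  }')
printf '%s\n' "$bins"
```

With these values, 10.0% is reported under Average_10% and 10.1% under Average_20%.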

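I should say that this is a horrible format for this task. I don't expect anyone to come up with the final solution for you, but I would start with something like this: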

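awk 'NR == 1 {
    gsub("%","");                 # strip the percent signs so headers compare as numbers
    for (f=2; f<=NF; f++) {
      # print the last column whose header is below each multiple of 10
      # (a header exactly equal to 10*i, e.g. 50.0, is not treated as a bin end here)
      for (i=1; i<10; i++)
          if ($f<10*i && $(f+1)>=10*i) print f, $f
      if ($f==100) print f, $f    # 100 closes one sample
    }}' file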

28 9.8
56 19.9
83 29.7
111 39.9
138 49.6
166 59.8
194 69.9
221 79.7
249 89.9
277 100.0
304 9.8
332 19.9
359 29.7
387 39.9
414 49.6
442 59.8
470 69.9
497 79.7
525 89.9
553 100.0
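The boundary list above can be taken one step further to compute the averages themselves. The following is only a sketch under stated assumptions, not a finished solution: it assumes whitespace-separated fields, plain numeric values in the data row, and a new sample whenever the header values wrap around after 100%. Unlike the boundary scan above, it keeps a header that is exactly 10.0%, 20.0%, and so on inside its own bin, as step b asks. The sample file, its headers, and its values are made up for illustration.

```shell
# Made-up input: two samples with headers 4% 8% 10% 15% 100% each.
tmp=$(mktemp)
printf 'Gene\t4%%\t8%%\t10%%\t15%%\t100%%\t4%%\t8%%\t10%%\t15%%\t100%%\n' > "$tmp"
printf 'g1\t1\t2\t3\t4\t5\t10\t20\t30\t40\t50\n' >> "$tmp"

result=$(awk '
NR == 1 {
    n = 0; prev = -1; lastbin = -1
    for (f = 2; f <= NF; f++) {
        v = $f; sub(/%$/, "", v); v += 0           # header as a number
        b = int(v / 10) * 10; if (b < v) b += 10   # round up to a multiple of 10
        if (n == 0 || b != lastbin || v < prev) {  # new bin, or headers wrapped
            n++
            label[n] = "Average_" b "%"
        }
        group[f] = n; size[n]++                    # column f belongs to group n
        lastbin = b; prev = v
    }
    out = $1
    for (g = 1; g <= n; g++) out = out " " label[g]
    print out
    next
}
{
    for (g = 1; g <= n; g++) sum[g] = 0
    for (f = 2; f <= NF; f++) sum[group[f]] += $f  # add each value to its group
    out = $1
    for (g = 1; g <= n; g++) out = out " " (sum[g] / size[g])
    print out
}' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

For the made-up input this prints a header row of Average_10%, Average_20% and Average_100% for each of the two samples, followed by `g1 2 4 5 20 40 50`. Bins with no columns are simply omitted, which may or may not be what the real files need.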
Thank you, @karakfa