Plot selected rows with average and standard deviation (gnuplot)

select gnuplot

I have a CSV file with experimental results that looks like this:

64  4   8   1   1   2   1   ttt 62391   4055430 333 0.0001  10  161 108 288 0
64  4   8   1   1   2   1   ttt 60966   3962810 322 0.0001  10  164 112 295 0
64  4   8   1   1   2   1   ttt 61530   3999475 325 0.0001  10  162 112 291 0
64  4   8   1   1   2   1   ttt 61430   4054428 332 0.0001  10  158 110 286 0
64  4   8   1   1   2   1   ttt 63891   4152938 339 0.0001  9   149 109 274 0
64  4   32  1   1   2   1   ttt 63699   4204182 345 0.0001  4   43  179 240 0
64  4   32  1   1   2   1   ttt 63326   4116218 336 0.0001  4   45  183 248 0
64  4   32  1   1   2   1   ttt 62654   4135211 340 0.0001  4   48  178 248 0
64  4   32  1   1   2   1   ttt 63192   4107506 339 0.0001  4   49  175 245 0
64  4   32  1   1   2   1   ttt 62707   4138666 345 0.0001  4   46  179 245 0
64  4   64  1   1   2   1   ttt 60968   3962929 323 0.0001  4   46  191 256 0
64  4   64  1   1   2   1   ttt 58765   3819787 305 0.0001  4   50  196 267 0
64  4   64  1   1   2   1   ttt 58946   3831499 308 0.0001  5   52  187 260 0
64  4   64  1   1   2   1   ttt 60646   3942047 321 0.0001  4   47  187 254 0
64  4   64  1   1   2   1   ttt 59723   3882044 311 0.0001  4   46  201 269 0
64  8   8   1   1   2   1   ttt 63414   4185382 382 0.0001  33  517 109 643 0
64  8   8   1   1   2   1   ttt 62429   4057899 372 0.0001  33  538 110 667 0
64  8   8   1   1   2   1   ttt 60622   3940452 384 0.0001  33  556 115 689 0
64  8   8   1   1   2   1   ttt 64433   4188192 369 0.0001  33  519 110 644 0

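My goal is to be able to plot various combinations of the columns before "ttt" (selecting one of them per chart) against the average and standard deviation of a column after "ttt" (again selecting one), grouping by the columns before "ttt".

Is this possible in gnuplot? If so, how? If not, do you have any other suggestions for my problem?

I don't think gnuplot can produce what you are asking for in a single plot command. I'll show you two alternatives, in the hope that one or both are a useful starting point.

Alternative 1: standard boxplot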

spacing = 1.0
width = 0.25
unset key
set xlabel "Column 3"
set ylabel "Column 9"
plot 'data' using (spacing):9:(width):3 with boxplot lw 2

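This collects the points according to the value in column 3 and produces one boxplot for each such value. It is a widely used way to show the distribution of point values within categories, but it is a quartile analysis rather than a display of mean + standard deviation.

Alternative 2: calculate mean and standard deviation for categories known in advance

The advantage of the boxplot analysis is that you don't need to know ahead of time which values may occur in column 3. Gnuplot can calculate the mean and standard deviation for a given value of column 3, but that value has to be specified beforehand. Below is a set of commands tailored to the specific example file you showed. It calculates, but does not plot, the requested per-category mean and standard deviation. You could use these numbers to construct a plot, but that requires additional commands. For example, you could save the values for each category in a new file, an array, or a datablock, and then go back and plot them together.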

col3entry = "8 32 64"
do for [i in col3entry] {
    stats "data" using ($3 == real(i) ? $9 : NaN) name "Condition".i nooutput
    print  i, ": ", value("Condition".i."_mean"), value("Condition".i."_stddev")

}

Output:

8: 62345.1111111111 1259.34784220021
32: 63115.6 392.552977316438
64: 59809.6 881.583711283279
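
The "save the values and plot them together" step mentioned above could look roughly like the following. This is only a sketch: the datablock name $Stats and the styling are assumptions made for illustration, and the categories of column 3 still have to be listed by hand.

```gnuplot
# Sketch: collect per-category mean and stddev into a datablock, then plot them.
# Assumes the file is called 'data' and the values of column 3 are known in advance.
col3entry = "8 32 64"

set print $Stats            # $Stats is a made-up name for the summary datablock
do for [i in col3entry] {
    stats "data" using ($3 == real(i) ? $9 : NaN) name "Condition".i nooutput
    # one line per category: label, mean, standard deviation
    print sprintf("%s %g %g", i, value("Condition".i."_mean"), value("Condition".i."_stddev"))
}
set print                   # switch printing back to the default destination

unset key
set xlabel "Column 3"
set ylabel "Column 9 (mean +/- stddev)"
set offsets 0.5, 0.5, 0, 0  # leave some room left and right of the points

# column 0 is the row index, column 1 is used as the tic label
plot $Stats using 0:2:3:xtic(1) with yerrorbars pt 7 lw 2
```

Note that the _stddev value reported by stats is the population standard deviation, which matches the numbers in the output above; the sample standard deviation is available under the _ssd suffix instead.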

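This is a completely revised and more general version.

Since you want to filter by 3 columns, you need 3 properties to distinguish the data in the plot, for example color, x-position and pointtype. What the script basically does:

Generate some random data for testing (use your file instead). $Data looks like this: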

  8  64  57773  0
  4  32  64721  2
  8  32  56757  1
  4  16  56226  2
  8   8  56055  1
  8  64  59874  0
  8  32  58733  0
  4  16  55525  2
  8  32  58869  0
  8  64  64470  0
  4  32  60930  1
  8  64  57073  2
  ...

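The variables ColX, ColC, ColP and ColS define the columns for x-position, color, pointtype and statistics.

Find the unique values of ColX, ColC and ColP (check help smooth frequency) and put them into the datablocks $ColX, $ColC and $ColP.

Put the unique values into the arrays ArrX, ArrC and ArrP.

Loop over all possible combinations, do statistics on ColS, and write the results into $Data2. Three columns for color, x-position and pointtype are added at the beginning. $Data2 looks like this: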

  1  1  1  0  8  4  61639.4  2788.4
  1  1  2  0  8  8  59282.1  2740.2
  1  2  1  0 16  4  59372.3  2808.6
  1  2  2  0 16  8  60502.3  2825.0
  1  3  1  0 32  4  59850.7  2603.8
  1  3  2  0 32  8  60617.7  1979.8
  1  4  1  0 64  4  60399.4  3273.6
  1  4  2  0 64  8  59930.7  2919.8


  2  1  1  1  8  4  59172.6  2288.2
  2  1  2  1  8  8  58992.2  2888.0
  2  2  1  1 16  4  59350.1  2364.6
  2  2  2  1 16  8  61034.0  2368.5
  2  3  1  1 32  4  59920.8  2867.6
  2  3  2  1 32  8  59711.9  3464.2
  2  4  1  1 64  4  60936.7  3439.7
  2  4  2  1 64  8  61078.7  2349.3


  3  1  1  2  8  4  58976.0  2376.3
  3  1  2  2  8  8  61731.5  1635.7
  3  2  1  2 16  4  58276.0  2101.7
  3  2  2  2 16  8  58594.5  3358.5
  3  3  1  2 32  4  60471.5  3737.6
  3  3  2  2 32  8  59909.1  2024.0
  3  4  1  2 64  4  62044.2  1446.7
  3  4  2  2 64  8  60454.0  3215.1

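Finally, plot the data. I couldn't figure out how the plotting style yerror works properly together with variable pointtype, so I split it into two plot elements, one with vectors and one with points. The third element, keyentry, is only there to get an empty line in the legend, and the fourth is to get the pointtypes into the legend.

I hope you can figure out all the other details and adapt the script to your data.

Code: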
### grouped statistics on filtered (unsorted) data 
reset session
set colorsequence classic

# generate some random test data
rand1(n) = 2**(int(rand(0)*2)+2)    # values 4,8
rand2(n) = 2**(int(rand(0)*4)+3)    # values 8,16,32,64
rand3(n) = int(rand(0)*10000)+55000 # values 55000 to 65000
rand4(n) = int(rand(0)*3)           # values 0,1,2
set print $Data
do for [i=1:200] {
    print sprintf("% 3d% 4d% 7d% 3d", rand1(0), rand2(0), rand3(0), rand4(0))
}
set print
print $Data    # (just for test purpose)

ColX = 2   # column for x
ColC = 4   # column for color
ColP = 1   # column for pointtype
ColS = 3   # column for statistics

# get unique values of the columns
set table $ColX
    plot $Data u (column(ColX)) smooth freq
unset table

set table $ColC
    plot $Data u (column(ColC)) smooth freq
unset table

set table $ColP
    plot $Data u (column(ColP)) smooth freq
unset table

# put unique values into arrays
set table $Dummy
    array ArrX[|$ColX|-6]   # gnuplot creates 6 extra lines
    array ArrC[|$ColC|-6]
    array ArrP[|$ColP|-6]
    plot $ColX u (ArrX[$0+1]=$1)
    plot $ColC u (ArrC[$0+1]=$1)
    plot $ColP u (ArrP[$0+1]=$1)
unset table
print ArrX, ArrC, ArrP    # just for test purpose

# define filter function
Filter(c,x,p) = ArrX[x]==column(ColX) && ArrC[c]==column(ColC) && \
               ArrP[p]==column(ColP) ? column(ColS) : NaN

# loop all values and do statistics, write data into $Data2
set print $Data2
    do for [c=1:|ArrC|] {
        do for [x=1:|ArrX|] {
            do for [p=1:|ArrP|] {
                undef var STATS*
                stats $Data u (Filter(c,x,p)) nooutput
                if (exists('STATS_mean') && exists('STATS_stddev')) {
                    print sprintf("% 3d% 3d% 3d% 3d% 3d% 3d% 9.1f % 7.1f", c, x, p, ArrC[c], ArrX[x], ArrP[p], STATS_mean, STATS_stddev)
                }
            }
        }
    print "";  print ""
    }  
set print
# print $Data2  # just for testing purpose

set xlabel sprintf("Column %d", ColX)
set ylabel sprintf("Column %d", ColS)
set xrange[0.5:|ArrX|+1]
set xtics ()    # remove all xtics
do for [x=1:|ArrX|] { set xtics add (sprintf("%d",ArrX[x]) x)}   # set xtics "manually"

# function for x position and offsets, 
# actually not dependent on 'n' but to shorten plot command
# columns in $Data2: 1=color, 2=x, 3=pointtype
width = 0.5   # float number!
step = width/(|ArrC|-1)
PosX(n) = column(2) - width/2.0 + step*(column(1)-1) + (column(3)-1)*step*0.3

plot \
     for [c=1:|ArrC|] $Data2 u (PosX(0)):($7-$8):(0):(2*$8) index c-1 w vectors \
     heads size 0.04,90 lw 2 lc c ti sprintf("%g",ArrC[c]),\
     for [c=1:|ArrC|] '' u (PosX(0)):7:($3*2+4):(c) index c-1 w p ps 1.5 pt var lc var not, \
     keyentry w p ps 0 ti "\n", \
     for [p=1:|ArrP|] '' u (0):(NaN) w p pt p*2+4 ps 1.5 lc rgb "black" ti sprintf("%g",ArrP[p])

### end of code

Result: (plot image not included)

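Please clarify: do you mean, for example, selecting row N based on some combination of columns 1-7 and then plotting the average of columns 14, 15, 16 in that same row? Or do you mean plotting the average of column 13 over all rows with column 2 == 8? Or something else?

The second. Something like SELECT $3, AVERAGE($9), STD_DEV($9) WHERE $2=4 GROUP BY $3, in a (very) loose SQL format.

I am still not clear on what is to be plotted. Where does the x coordinate come from? Which y coordinate do you want to plot at that x?

x is taken from the "before" column, $3 in the example above. y is the average and standard deviation (as the error) of the "after" column, grouped as in the example above: AVERAGE($9), STD_DEV($9).

This works great. Thanks a lot. One more question: do you know how to remove the tab after each value in the label table, so that there is no large gap between the value and the symbol in the legend?

Before the plot command you can add the line set key Left to bring them closer together. Note: it is Left, not left. Check help key.

This doesn't work. The tab creates a large whitespace after the letters, so even if I reverse the symbol and the letters, the box is still much bigger than needed. The only solution would be to remove the tab from the values.

OK, I see... A quick workaround is to replace $Label[i+1] with word($Label[i+1],1). I have changed it in the code.

I am trying to edit your script, but without any luck... I would like to be able to add a fourth parameter, say the last column (only 0 here, but suppose it can also take 1 and 2), according to which the columns for the statistics are summed, e.g. sum ColS, ColF and colu_NEW for the same value (and calculate the error accordingly). Is this possible?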
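
The two legend tweaks from the comments, sketched in isolation. The label string and the plotted function are made up for illustration, since the $Label datablock from the earlier version of the script is not shown here.

```gnuplot
# Hypothetical label with a trailing tab, standing in for an entry of $Label
label = "8\t"

# 'Left' (capital L) left-justifies the text of the legend entries
set key Left

# word(label, 1) keeps only the first whitespace-delimited token,
# which should drop the trailing tab from the legend title
plot x title word(label, 1)
```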