Shell: histogram of events from different data files

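My simulation program produces several data files. The first column indicates success (=0) or error (=1), and the second column is the simulation time in seconds.

An example of these two columns: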

1 185.48736852299064
1 199.44533672989186
1 207.35654106612733
1 213.5214031236177 
1 215.50576147950017
0 219.62444310777695
0 222.26750248416354
0 236.1402270910635 
1 238.5124609287994 
0 246.4538392581228 
.   .
.   .
.   .
1 307.482605596962
1 329.16494123373445
0 329.6454558227778 
1 330.52804695995303
0 332.0673690346546 
0 358.3001385706268 
0 359.82271742496414
1 400.8162129871805 
0 404.88783391725985
1 411.27012219170393

I can combine the data of one file into a frequency plot (histogram) of the errors (1):

set encoding iso_8859_1
set key left top 
set ylabel "P_{error}" 
set xlabel "Time [s]" 
set size 1.4, 1.2
set terminal postscript eps enhanced color "Helvetica" 16 
set grid ytics
set key spacing 1.5
set style fill transparent solid 0.3

`grep '^ 1' lookup-ratio-50-0.0034-50-7-20-10-3-1.txt | awk '{print $2}' > t7.dat`

stats 't7.dat' u 1
set output "t7.eps"
binwidth=2000
bin(x,width)=width*floor(x/width)
plot 't7.dat' using (bin($1,binwidth)):(1.0/STATS_records) smooth freq with boxes lc rgb "midnight-blue" title "7x7_P_error"

Result: [histogram plot]

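I would like to improve the gnuplot script above so that it also includes the remaining data files lookup-..-.txt and their error samples, and combines them all into the same frequency plot.

I would also like to avoid intermediate files such as t7.dat.

In addition, I would like to draw a horizontal line at the mean error probability.

How can I plot all the sample data in the same plot?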


Regards

You can pipe both the data and the plot commands to gnuplot, so no temporary file is needed.

For example:

$ awk 'BEGIN{print "plot \"-\" using ($1):($2)"; 
             while(i++<20) print i,rand()*20; print "e"}' | gnuplot -p
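
Applied to the data files from the question, the same idea might look like the sketch below. It assumes files in the question's two-column format; the helper name build_plot_script, the bin width of 20 and the demo file are illustrative and not part of the original post. Because awk splits on whitespace, a leading space before the flag does not matter. The script only builds the gnuplot input; the actual pipe to gnuplot is shown in the last comment.

```shell
#!/bin/sh
# Sketch: write a complete gnuplot input (commands + inline data) to stdout.
# build_plot_script is a hypothetical helper name, not from the original post.
build_plot_script() {
    width=$1; shift                      # $1 = bin width, rest = data files
    awk -v w="$width" '
        $1 == 1 { t[++n] = $2 }          # keep the times of error rows only
        END {
            if (n == 0) { print "no error rows found" > "/dev/stderr"; exit 1 }
            printf "bin(x) = %s*floor(x/%s)\n", w, w
            # the error count is computed here because inline data ("-")
            # can only be read once, so a separate stats pass is not possible
            printf "plot \"-\" using (bin($1)):(1.0/%d) smooth freq with boxes title \"P_error\"\n", n
            for (i = 1; i <= n; i++) print t[i]
            print "e"                    # terminates the inline data
        }' "$@"
}

# Demo with a tiny made-up sample in the format of the question
demo=$(mktemp)
printf '1 185.4\n0 219.6\n1 238.5\n0 246.4\n' > "$demo"
script=$(build_plot_script 20 "$demo")
rm -f "$demo"
printf '%s\n' "$script"

# To actually plot all files:
#   build_plot_script 20 lookup-*.txt | gnuplot -p
```

Further settings (terminal, labels, grid) would have to be printed before the plot line in the same way.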

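If I understand correctly, you need a histogram over several files, so basically you have to concatenate multiple data files. Of course, you could do this with an external program (awk etc.) or with shell commands. Below is a possible solution using only gnuplot and a system command, without temporary files. The system command is for Windows, but you can probably translate it to Linux easily. You may need to check that the NaN values do not mess up your binning and histogram results.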

### start code
reset session
# create some dummy data files
do for [i=1:5] {
    set table sprintf("lookup-blahblah_%d.txt", i)
    set samples 50
    plot '+' u (int(rand(0)+0.5)):(rand(0)*0.9+0.1) w table
    unset table
}
# end creating dummy data files

FILELIST = system("dir /B lookup*.txt")   # this is for Windows; on Linux, system("ls lookup*.txt") should do
print FILELIST

undefine $AllDataWithError
set table $AllDataWithError append
do for [i=1:words(FILELIST)] {
    plot word(FILELIST,i) u ($1==1? $1 : NaN):($1==1? $2 : NaN) w table
}
unset table

print $AllDataWithError

# ... do your binning and plotting
### end of code
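
For comparison, the awk route mentioned above could be sketched as follows. It concatenates the error rows of several files and computes the relative frequency per bin, which is what "smooth freq" with a y value of 1.0/N produces. The helper name error_histogram, the bin width of 100 and the two demo files are assumptions made for illustration only.

```shell
#!/bin/sh
# Sketch: relative frequency of error times per bin, over several files.
# error_histogram is a hypothetical helper name, not from the original post.
error_histogram() {
    width=$1; shift                       # $1 = bin width, rest = data files
    awk -v w="$width" '
        $1 == 1 {
            b = w * int($2 / w)           # equals width*floor(x/width) for x >= 0
            count[b]++
            n++                           # total number of error rows
        }
        END {
            # one line per bin: left bin edge, fraction of all errors
            for (b in count) printf "%d %.4f\n", b, count[b] / n
        }' "$@" | sort -n
}

# Demo with two tiny made-up files in the format of the question
dir=$(mktemp -d)
printf '1 185.4\n1 199.4\n0 219.6\n' > "$dir/lookup-a.txt"
printf '1 207.3\n0 222.2\n1 330.5\n' > "$dir/lookup-b.txt"
hist=$(error_histogram 100 "$dir"/lookup-*.txt)
rm -rf "$dir"
printf '%s\n' "$hist"
```

Since only rows with flag 1 are selected, no NaN values or empty lines reach the binning step, which can serve as a cross-check for the gnuplot result.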
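
Edit:

Apparently, NaN values and/or empty lines seem to mess up "smooth freq" and/or the binning?! So we have to extract only the lines with an error (=1). With the code above you can merge several files into one datablock. The code below starts directly with a datablock that resembles your data.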

### start of code
reset session

# create some dummy datablock with some distribution (with no negative values)
Height =3000
Pos = 6000
set table $Data
    set samples 1000
    plot '+' u (int(rand(0)+0.3)):(abs(invnorm(rand(0))*Height+Pos)) w table
unset table
# end creating dummy data

stats $Data nooutput
Datapoints = STATS_records

# get only the error lines
# plot $Data into the table $Dummy.
# If $1==1 (=Error) write the line number $0 into column 1 and value into column 2
# else write NaN into column 1 and column 2.
# Since $0 is the line number which is unique 
# 'smooth frequency' will keep these lines "as is"
# but change the NaN lines to empty lines.
Error = 1
Success = 0
set table $Dummy
    plot $Data u ($1==Error ? $0 : NaN):($1==Error ? $2 : NaN) smooth freq
unset table
# get rid of empty lines in $Dummy
# Since empty lines seem to also mess up binning you need to remove them
# by writing $Dummy into the dataset $Error via "plot ... with table".
set table $Error
   plot $Dummy u 1:2 with table
unset table

bin(x) = binwidth*floor(x/binwidth)
stats $Error nooutput
ErrorCount = STATS_records

set multiplot layout 3,1
set key outside
set label 1 sprintf("Datapoints: %g\nSuccess: %g\nError: %g",\
    Datapoints, Datapoints-ErrorCount,ErrorCount) at graph 1.02, first 0
plot $Data u 0:($1 == Success ? $2 : NaN) w impulses lc rgb "web-green" t "Success",\
    $Data u 0:($1 == Error ? -$2 : NaN) w impulses lc rgb "red" t "Error"

unset label 1
set key inside
binwidth = 1000
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "blue"

binwidth=100
set xrange[GPVAL_X_MIN:GPVAL_X_MAX] # use same xrange as graph before
plot $Error using (bin($2)):(1.0/STATS_records) smooth freq with boxes t sprintf("binwidth: %d",binwidth) lc rgb "magenta"

unset multiplot
### end of code

The result: [multiplot with three panels]

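Comments:

Simple improvement: drop grep and use
awk '$1==1{print $2}' lookup-*.txt
which extracts the values from all matching files. I think you can pipe the output directly to gnuplot, but you use it in more than one place.

@karakfa Thanks. It merges all the second columns into one column. How can I pass the output of awk to the stats and plot commands, so that I do not need the intermediate file?

I can't post long code here, please see the answer.

Thanks. I tried this awk code, and it gives an error when I pipe it to gnuplot -p. I also don't know how to get the mean error probability over all bin heights.

If 0.12 is the mean frequency value (I don't know which value it is), you can draw a horizontal line with
set arrow from graph 0, first 0.12 to graph 1, first 0.12 nohead lc rgb "#000000" front
You have to put your instructions into the print statement. The final e, which I have just added, may also have been missing.

I know you suggest doing the plot from the shell, but I need the rest of gnuplot to customize my plot.

Thanks. That is what I intend to do. I also want to plot the mean error, so I should count the errors over the combined files and divide by the number of data records (error and success samples). Do you know how to compute this and add it to the histogram plot? Also, in my plot I use the count of all error samples (the STATS_records variable) for the normalization. I assume I can use $AllDataWithError in the stats command.

Your code works with the stats command. The problem with the error mean I mentioned, and with plotting it in the histogram, is that it seems to vary with the bin width of the histogram. I would have to compute the error probability of the samples within each bin and take the mean, but I don't know how to do this calculation.

Actually, you can use stats $AllDataWithError; then STATS_records is the number of error (=1) entries and STATS_invalid is the number of success (=0, now NaN) entries. This may differ if you have comment lines or empty lines; I did not check.

Sorry, when I run the binning and plot commands
binwidth=1000
bin(x,width)=width*floor(x/width)
using (bin($1,binwidth)):(1.0/STATS_records) smooth freq with boxes
I get an error in the final plot command. Can I use $AllDataWithError in the last plot?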
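
One way to get a number for the horizontal line discussed in the comments is sketched below: count the error rows over all files, divide by the number of all valid rows (errors plus successes), and print the matching set arrow command. This is only one possible definition of the mean error probability. As noted in the comments, the bar heights of the histogram are normalized by the error count and change with the bin width, so whether this overall fraction is the right value for that particular plot is not settled in the thread. The helper name mean_error_line and the demo data are assumptions.

```shell
#!/bin/sh
# Sketch: overall error fraction over all files, printed as a gnuplot command.
# mean_error_line is a hypothetical helper name, not from the original post.
mean_error_line() {
    awk '
        # NF >= 2 skips empty lines, which would otherwise compare equal to 0
        NF >= 2 && ($1 == 0 || $1 == 1) {
            total++
            if ($1 == 1) err++
        }
        END {
            if (total == 0) exit 1        # no valid rows at all
            p = err / total               # errors / (errors + successes)
            printf "set arrow from graph 0, first %.4f to graph 1, first %.4f nohead lc rgb \"#000000\" front\n", p, p
        }' "$@"
}

# Demo with a tiny made-up sample: 3 errors in 5 valid rows, one empty line
demo=$(mktemp)
printf '1 185.4\n1 199.4\n\n0 219.6\n0 222.2\n1 238.5\n' > "$demo"
line=$(mean_error_line "$demo")
rm -f "$demo"
printf '%s\n' "$line"
```

The printed command can be placed before the plot command in the gnuplot script, or printed by awk ahead of the plot line when everything is piped to gnuplot.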