Normalizing histogram bins in gnuplot

I'm trying to plot a histogram whose bins are normalized by the number of elements in each bin.

I'm using the following

binwidth=5
bin(x,width)=width*floor(x/width) + binwidth/2.0
plot 'file' using (bin($2, binwidth)):($4) smooth freq with boxes

to get a basic histogram, but I want the value of each bin divided by the size of the bin. How can I do this in gnuplot, or with external tools if necessary?

Here is how I would do it, with n=500 random Gaussian variates generated from R with the following command:

Rscript -e 'cat(rnorm(500), sep="\\n")' > rnd.dat

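I use much the same idea as yours to define a normalized histogram, where each point contributes y = 1/(binwidth*n), except that I use int() instead of floor() and I do not recenter at the bin value. In short, this is a quick adaptation of a demo script, and a similar approach is described in Janert's textbook (p. 257, freely available). You can replace my sample data file with the random-points file from the demo folder that ships with Gnuplot. Note that we need to specify the number of points ourselves, as Gnuplot has no facility for counting the records in a file.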

bw1=0.1
bw2=0.3
n=500
bin(x,width)=width*int(x/width)
set xrange [-3:3]
set yrange [0:1]
tstr(n)=sprintf("Binwidth = %1.1f\n", n) 
set multiplot layout 1,2
set boxwidth bw1
plot 'rnd.dat' using (bin($1,bw1)):(1./(bw1*n)) smooth frequency with boxes t tstr(bw1)
set boxwidth bw2
plot 'rnd.dat' using (bin($1,bw2)):(1./(bw2*n)) smooth frequency with boxes t tstr(bw2)

Here is the result for the two bin widths.
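One caveat with the int-based bin() above: int() truncates toward zero, so samples on either side of zero share a single bin of double width, whereas floor() keeps every bin the same width. The shell sketch below counts how many samples land in the bin with index 0 under each scheme. The helper name zero_bin_counts and the four sample values are made up for illustration; awk's int() truncates toward zero in the same way.

```shell
#!/bin/sh
# Hypothetical helper: count samples whose bin index is 0 under
# truncation (int) versus a true floor. $1 = bin width, samples on stdin.
zero_bin_counts() {
    awk -v w="$1" '
        # awk has no floor(), so build one on top of int()
        function floor(v,   t) { t = int(v); return (t > v) ? t - 1 : t }
        int($1 / w) == 0   { n_int++ }
        floor($1 / w) == 0 { n_floor++ }
        END { printf "int=%d floor=%d\n", n_int, n_floor }'
}

# Two samples just below and above zero, two further out (illustrative values)
counts=$(printf '%s\n' -0.05 0.05 -0.15 0.15 | zero_bin_counts 0.1)
echo "$counts"   # int=2 floor=1
```

With int(), both -0.05 and 0.05 fall into the bin at 0, so that bar is inflated relative to its neighbors; the floor-based bin() used elsewhere in this thread avoids this.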

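Besides, this is really a crude approach to histograms, and more elaborate solutions are readily available in R. The real problem is how to define a good bin width, an issue that has already been discussed elsewhere: applying a binning rule should not be too difficult to implement, although you will need to compute the interquartile range.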
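As a rough sketch of such a rule, assuming the one meant here is Freedman-Diaconis (bin width = 2*IQR/n^(1/3)), the width can be computed outside gnuplot with sort and awk. The function name fd_binwidth and the choice of linearly interpolated quartiles are assumptions of this example, not something the answer specifies.

```shell
#!/bin/sh
# Hypothetical helper: Freedman-Diaconis bin width, 2*IQR/n^(1/3).
# Reads one sample per line on stdin; assumes no blank or comment lines.
fd_binwidth() {
    sort -n | awk '
        # linearly interpolated quantile of the sorted samples x[1..NR]
        function quantile(p,   pos, lo) {
            pos = 1 + p * (NR - 1); lo = int(pos)
            return (lo >= NR) ? x[NR] : x[lo] + (pos - lo) * (x[lo + 1] - x[lo])
        }
        { x[NR] = $1 }
        END { printf "%.4f\n", 2 * (quantile(0.75) - quantile(0.25)) / NR ^ (1 / 3) }'
}

# Eight evenly spaced samples: quartiles 2.75 and 6.25, IQR 3.5, n^(1/3) = 2
bw=$(printf '%s\n' 1 2 3 4 5 6 7 8 | fd_binwidth)
echo "$bw"   # 3.5000
```

The resulting value could then be handed to a gnuplot script as its binwidth, for example through gnuplot's -e command-line option.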

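Here is how R would handle the same data set, with the default option (Sturges' rule, since it makes no difference in this particular case) and with equally spaced bins like the ones used above.

The R code used is: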

rnd <- scan("rnd.dat")  # load the samples generated above
par(mfrow=c(1,2), las=1)
hist(rnd, main="Sturges", xlab="", ylab="", prob=TRUE)
hist(rnd, breaks=seq(-3.5,3.5,by=.1), main="Binwidth = 0.1", 
     xlab="", ylab="", prob=TRUE)

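You can even see how R does its job by inspecting the values returned by a call to hist().

That said, you can use the R results to process your data with Gnuplot if you like (although I would recommend using R directly :-).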

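In Gnuplot 4.4, functions gained a new property: they can execute several successive commands and then return a value. This means you can actually compute the number of points, n, within the gnuplot file without knowing it in advance. The code below runs on a file "out.dat" containing a single column: a list of n samples from a normal distribution: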

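binwidth = 0.1
set boxwidth binwidth
sum = 0

# s(x) increments sum as a side effect and returns 0
s(x)          = ((sum=sum+1), 0)
# use the width argument (not the global binwidth) so bin() works for any width
bin(x, width) = width*floor(x/width) + width/2.0

# first pass: count the points
plot "out.dat" u ($1):(s($1))
# second pass: each point contributes 1/(binwidth*sum) to its bin
plot "out.dat" u (bin($1, binwidth)):(1.0/(binwidth*sum)) smooth freq w boxes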

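The first plot statement reads through the data file and increments sum once for each point, plotting a zero.

The second plot statement actually uses the value of sum to normalize the histogram.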

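In gnuplot 4.6, you can count the number of points with the stats command, which is faster than plot. You do not actually need a trick like s(x)=((sum=sum+1),0); instead, after running stats 'out.dat' u 1, read the count directly from the variable STATS_records.

Another way to count the number of data points in a file is to use a system command. This is useful if you are plotting multiple files and do not know the number of points beforehand. I used: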
countpoints(file) = system( sprintf("grep -v '^#' %s| wc -l", file) )
file1count = countpoints (file1)
file2count = countpoints (file2)
file3count = countpoints (file3)
...

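The countpoints function avoids counting lines that start with '#'. You would then use the functions mentioned earlier to plot the normalized histogram.

Here is a complete example: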
n=100
xmin=-50.
xmax=50.
binwidth=(xmax-xmin)/n

bin(x,width)=width*floor(x/width)+width/2.0
countpoints(file) = system( sprintf("grep -v '^#' %s| wc -l", file) )

file1count = countpoints (file1)
file2count = countpoints (file2)
file3count = countpoints (file3)

plot file1 using (bin(($1),binwidth)):(1.0/(binwidth*file1count)) smooth freq with boxes,\
     file2 using (bin(($1),binwidth)):(1.0/(binwidth*file2count)) smooth freq with boxes,\
     file3 using (bin(($1),binwidth)):(1.0/(binwidth*file3count)) smooth freq with boxes
...
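The countpoints helper simply shells out to grep and wc, so its behavior is easy to check in a shell. In the sketch below, the sample file and the names countpoints_strict and sample are made up for illustration. Note that the grep pipeline also counts blank lines, which gnuplot does not treat as data points; the stricter awk variant skips them.

```shell
#!/bin/sh
# Same pipeline as the gnuplot countpoints() helper above.
# tr strips the padding some wc implementations put before the count.
countpoints() {
    grep -v '^#' "$1" | wc -l | tr -d ' '
}

# Hypothetical stricter variant: also skip blank lines.
countpoints_strict() {
    awk 'NF && $1 !~ /^#/ { n++ } END { print n + 0 }' "$1"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' '# header' '1.0' '2.5' '' '# comment' '-0.3' > "$sample"

loose=$(countpoints "$sample")
strict=$(countpoints_strict "$sample")
echo "grep/wc: $loose, awk: $strict"   # grep/wc: 4, awk: 3
```

If a data file uses blank lines to separate blocks, the inflated count would make every bar slightly too low, so the stricter count may be the safer normalizer.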

A comment on the counting trick above: you can improve it further by setting the second value of s(x) to NaN and adding notitle to the first plot command. The counting pass then leaves no visible trace in the plot, since gnuplot ignores NaN values when plotting =)

Simply:

plot 'file' using (bin($2, binwidth)):($4/$4) smooth freq with boxes

Comment: Please add some context around the answer. This helps both the asker and other readers.

Reply: What is there to say? This is a straightforward answer to the asker's question. Using ($4/$4) or (1) instead of just ($4) gives the count of elements in each bin rather than the sum of the elements.
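The ($4/$4) variant gives per-bin counts, while the original ($4) gives per-bin sums; dividing one by the other yields the per-bin mean, which is one reading of what the question asks for. As a sketch of the external-tool route, assuming whitespace-separated data with x in column 2 and y in column 4, awk can do both accumulations in one pass. The helper name bin_means and the sample rows are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: per-bin mean of column 4, binned on column 2.
# $1 = bin width; data rows on stdin. Prints "bin_center mean" pairs.
bin_means() {
    awk -v w="$1" '
        function floor(v,   t) { t = int(v); return (t > v) ? t - 1 : t }
        /^#/ || NF < 4 { next }                 # skip comments and short rows
        { k = floor($2 / w); sum[k] += $4; cnt[k]++ }
        END { for (k in sum) printf "%.1f %.3f\n", w * k + w / 2, sum[k] / cnt[k] }
    ' | sort -n
}

# Three illustrative rows: x = 1 and 4 share the bin [0,5), x = 7 falls in [5,10)
means=$(printf '%s\n' 'a 1 b 10' 'a 4 b 20' 'a 7 b 6' | bin_means 5)
printf '%s\n' "$means"
# 2.5 15.000
# 7.5 6.000
```

The output could be redirected to a file and drawn in gnuplot with a plain plot ... using 1:2 with boxes, with no smooth freq needed since the binning has already been done.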