Plotting histograms in Python using Matplotlib or Pandas


python pandas matplotlib


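I have gone through various posts on this forum, but I could not find an answer to the behaviour I am seeing.

I have a CSV file whose header has many entries, each with 300 points. For each field (column of the CSV file), I want to plot a histogram. The x-axis contains the elements of that column, and the y-axis should show the number of samples falling into each bin. Since I have 300 points, the sample counts of all bins should add up to 300, so the y-axis should run from 0 to, say, 50 (just an example). However, the values are huge (400e8), which makes no sense.

Sample point | mydata
1 | 250.23e-9
2 | 250.123e-9
... | ...
300 | 251.34e-9

Please check my code below. I am using pandas to open the CSV, and Matplotlib for plotting.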

df=pd.read_csv("/home/pcardoso/raw_data/myData.csv")

# Figure parameters
figPath='/home/pcardoso/scripts/python/matplotlib/figures/'
figPrefix='hist_'       # Prefix to the name of the file.
figSuffix='_something'  # Suffix to the name of the file.
figString=''            # Full string passed as the figure name to be saved
precision=3
num_bins = 50

columns=list(df)

for fieldName in columns:
    vectorData=df[fieldName]

    # statistical data
    mu = np.mean(vectorData)    # mean of distribution
    sigma = np.std(vectorData)  # standard deviation of distribution

    # Create plot instance
    fig, ax = plt.subplots()

    # Histogram
    n, bins, patches = ax.hist(vectorData, num_bins, density='True',alpha=0.75,rwidth=0.9, label=fieldName)
    ax.legend()

    # Best-fit curve
    y=mlab.normpdf(bins, mu, sigma)
    ax.plot(bins, y, '--')

    # Setting axis names, grid and title
    ax.set_xlabel(fieldName)
    ax.set_ylabel('Number of points')
    ax.set_title(fieldName + ': $\mu=$' + eng_notation(mu,precision) + ', $\sigma=$' + eng_notation(sigma,precision))
    ax.grid(True, alpha=0.2)
    fig.tight_layout()  # Tweak spacing to prevent clipping of ylabel

    # Saving figure
    figString=figPrefix + fieldName +figSuffix
    fig.savefig(figPath + figString)

    plt.show()
    plt.close(fig)

In summary, I would like to know how to get the y-axis values right.

Edit: 6 July 2020

Edit of 8 June 2020: I would like the density estimator to follow a figure like the one below:

Thanks in advance. Best regards,

Pedro

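Do not use density='True'. With that option, the value shown for each bin is the number of members in the bin divided by the total number of points and by the bin width. If that width is small (as in your case, where the x-values are tiny), the values become large.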
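To make the effect concrete, here is a minimal sketch using only NumPy. The sample data is a made-up stand-in for one CSV column (300 values around 250e-9, with an assumed spread of 0.5e-9); the real file is not available here, so the exact numbers will differ.

```python
import numpy as np

# Hypothetical stand-in for one CSV column: 300 samples around 250e-9.
rng = np.random.default_rng(0)
samples = rng.normal(loc=250e-9, scale=0.5e-9, size=300)

# Plain counts per bin, and the same histogram as a density.
counts, edges = np.histogram(samples, bins=50)
density, _ = np.histogram(samples, bins=50, density=True)
widths = np.diff(edges)

print(counts.sum())              # all 300 samples are accounted for
print((density * widths).sum())  # area under the density histogram
print(density.max())             # huge, because each bin is tiny on the x-axis
```

The counts stay in the expected 0 to a-few-dozen range, while the density values are the counts divided by 300 and by a bin width of roughly 1e-10, which is where the enormous y-axis values come from.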

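Edit: OK, to un-normalize the normalized curve, you need to multiply it by the number of points and by the width of one bin. I made a more reduced example: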

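from numpy.random import normal
from scipy.stats import norm
import pylab

N = 300       # number of samples
sigma = 10.0  # standard deviation of the test data
B = 30        # number of bins

def main():
    # Draw N normally distributed samples with mean 0
    x = normal(0, sigma, N)

    # Plain counts per bin (no density option)
    h, bins, _ = pylab.hist(x, bins=B, rwidth=0.8)
    bin_width = bins[1] - bins[0]

    # Scale the normal pdf to counts: pdf * number of points * bin width
    h_n = norm.pdf(bins[:-1], 0, sigma) * N * bin_width
    pylab.plot(bins[:-1], h_n)
    pylab.show()

if __name__ == "__main__":
    main()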
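The same scaling can be applied to data on the scale used in the question. The following is only a sketch under assumptions: the helper name count_scaled_normal is invented for illustration, the data is synthetic, and the normal pdf is written out by hand so that the block needs nothing but NumPy.

```python
import numpy as np

def count_scaled_normal(data, num_bins):
    """Return bin edges, counts, bin centers and a normal curve in count units."""
    data = np.asarray(data, dtype=float)
    counts, edges = np.histogram(data, bins=num_bins)
    bin_width = edges[1] - edges[0]
    mu, sigma = data.mean(), data.std()
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Normal pdf evaluated at the bin centers
    pdf = np.exp(-0.5 * ((centers - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Un-normalize: pdf * number of points * bin width
    curve = pdf * data.size * bin_width
    return edges, counts, centers, curve

# Synthetic stand-in for one column of the CSV file (assumed values).
rng = np.random.default_rng(1)
data = rng.normal(250e-9, 0.5e-9, 300)
edges, counts, centers, curve = count_scaled_normal(data, 50)

print(counts.max(), curve.max())  # both live on the same count scale
```

With Matplotlib, the result could then be drawn with ax.hist(data, bins=edges) followed by ax.plot(centers, curve, '--'). Evaluating the curve at the bin centers rather than at the left edges bins[:-1] is a choice made here so that the curve lines up with the middle of each bar.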

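Does this answer your question?

Thanks for the question. I had already seen that post, but it does not do what I want. Thanks anyway. :-)

Thanks for your answer, that solved the problem. Strangely, setting density to 'False' does nothing. But now, how can I plot the density curve on top of the histogram? The way I do it in my code, it uses the same huge scale. How can I force the histogram and the density plot to use the same scale?

Haha, that is a bug in itself: density='True' only works by coincidence, because the string 'True' is not empty and is therefore converted to the boolean True, but the same holds for 'False' or density='Bazinga'. Try density=False. The density is defined such that the area under the histogram is one. The only way to bring the density to a similar scale is to normalize the x-axis, i.e. to divide all x-values by the interval max(x)-min(x).

Hi, I think I was expecting more of an envelope than a density plot.

I have now edited the answer to show how to un-normalize the normalized curve. The scaling method stays the same for any other distribution. However, if you fit your own elephant to the data, you do not need any normalization at all. You can fit any curve to the histogram data bins[:-1] -> h and plot it like any other function.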
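The point made in the comments about density='True' comes down to how Python converts strings to booleans. A small check in plain Python (the list of candidate values is chosen here purely for illustration):

```python
# Any non-empty string is truthy, regardless of what it spells.
candidates = ['True', 'False', 'Bazinga', '', True, False]
results = {repr(value): bool(value) for value in candidates}

for name, truth in results.items():
    print(name, '->', truth)
```

Only the empty string and the real boolean False come out as False, which is why passing the string 'False' does not switch the density option off.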