Python: getting the mean/std of a matplotlib bar chart

python pandas dataframe matplotlib

I have a bar chart and I want to find its mean. I tried pairing each stroke count (x) with its relative frequency, as a kind of "function" of x:

fx_temp = df.iloc[:, [0,3]].copy()
fx = list(zip(fx_temp["Stroke Count"],fx_temp["Relative Frequency"]))
fx
[(8, 11.762452145035647),
 (6, 9.873249013337801),
...

df is structured like this:

    Stroke Count  Most Common Char  Total Frequency  Relative Frequency
2              8                物          2387272           11.762452
1              6                字          2003845            9.873249
13             9                音          1812762            8.931754
6              5                用          1697177            8.362249
10            10                家          1604956            7.907862
...

Code that generates the plot:

import matplotlib.pyplot as plt

# Note: my_colors and d are not defined in the code shown in the question.
x = df["Stroke Count"]
y = df["Relative Frequency"]
fig, ax = plt.subplots(figsize=(20, 10))
rects = ax.bar(x=x, height=y, color=my_colors, width=0.8, tick_label=x)
ax.tick_params(labelrotation=45)
ax.set_ylabel("Relative Frequency (%)")
ax.set_xlabel("Stroke Count (int)")
ax.set_title("Stroke Counts and Relative Frequency")

def autolabel(rects):
    # Put a text label just above the top of each bar.
    for rect in rects:
        height = rect.get_height()
        ax.annotate('{}'.format(d[height]),
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3),  # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')

autolabel(rects)
plt.margins(0.05, 0.2)

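Let me know if you need any other information. I also tried .mean() on x and on y. From eyeballing the chart, I think the mean should be somewhere around 8-9, but I can't get a value anywhere near that. I would also like to get the standard deviation of this distribution.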

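You are trying to find the mean and standard deviation of a histogram. If you had the raw data, with one row per word and its stroke count, you could do this with: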

df['Stroke Count'].mean()

and

df['Stroke Count'].std()

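However, you have an aggregated table, which is fine for plotting a histogram but less straightforward for computing statistics. It can still be done by weighting each bar/category by its frequency, summing the results, and dividing by the total number of words to get the mean: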

total_strokes = df['Stroke Count'].multiply(df['Total Frequency'], axis=0).sum()
total_words = df['Total Frequency'].sum()
average_strokes = 1.*total_strokes / total_words

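To compute the standard deviation, you can do it the old-fashioned way: take the square root of the mean squared deviation from the mean: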

sum_of_squares = (df['Stroke Count'] - average_strokes).pow(2).multiply(df['Total Frequency'], axis=0).sum()
standard_deviation = (sum_of_squares  / total_words)**0.5
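The same weighted calculation can also be sketched with numpy's `np.average`, which accepts a `weights` argument. This is only an illustration: the frame below contains just the five rows visible in the question (the real table has more), so the printed numbers are not the asker's actual result.

```python
import numpy as np
import pandas as pd

# Only the five rows shown in the question; the full table is longer,
# so these results are illustrative rather than the real answer.
df = pd.DataFrame({
    "Stroke Count": [8, 6, 9, 5, 10],
    "Total Frequency": [2387272, 2003845, 1812762, 1697177, 1604956],
})

x = df["Stroke Count"].to_numpy()
w = df["Total Frequency"].to_numpy()

# Weighted mean: sum(x * w) / sum(w)
weighted_mean = np.average(x, weights=w)

# Weighted (population) variance, then its square root
weighted_var = np.average((x - weighted_mean) ** 2, weights=w)
weighted_std = np.sqrt(weighted_var)

print(weighted_mean, weighted_std)
```

Because `np.average` normalizes the weights, passing the "Relative Frequency" column as weights instead should give essentially the same mean, up to rounding in the percentages.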

Good luck!

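Are you looking for the average stroke count or the average frequency?

What I want is the mean of the discrete distribution, where x = stroke count and fx = frequency. In general I know how to compute the mean of such a distribution, it should be sigma(x*fx)/len(x), but I was wondering whether there is a better pandas/matplotlib way.

Great! Thanks, I wasn't sure how to apply the formula with df, but this seems to work well. So does sum_of_squares return a Series? Should I add up all the values in that Series, then divide the sum by the total words and take the square root to get the standard deviation?

sum_of_squares returns a single value, equal to the sum of the squared differences between each value and the mean (each weighted by its frequency). We can divide that number by the total number of words to get the mean squared difference, i.e. the variance. To get the standard deviation, we take the square root of that value.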
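As a quick check of the point made in the comments, the sketch below uses a small made-up table (hypothetical values, not the asker's data) to show that sum_of_squares is a single number, and that the weighted results match what you get by expanding the table back into one row per word. Note that `ddof=0` is passed to `.std()` here, since the formula in the answer divides by the total word count rather than by one less.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table: three stroke counts and how many words have each.
agg = pd.DataFrame({"Stroke Count": [5, 8, 10], "Total Frequency": [2, 5, 3]})

total_words = agg["Total Frequency"].sum()
mean_agg = (agg["Stroke Count"] * agg["Total Frequency"]).sum() / total_words

# .sum() collapses the weighted squared differences into one scalar
sum_of_squares = ((agg["Stroke Count"] - mean_agg) ** 2
                  * agg["Total Frequency"]).sum()
std_agg = (sum_of_squares / total_words) ** 0.5

# Expand the aggregated table back to one entry per word
raw = pd.Series(np.repeat(agg["Stroke Count"].to_numpy(),
                          agg["Total Frequency"].to_numpy()))

print(np.ndim(sum_of_squares))      # number of dimensions of the result
print(mean_agg, raw.mean())         # weighted mean vs. mean of raw data
print(std_agg, raw.std(ddof=0))     # weighted std vs. population std of raw data
```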