Python: sns bar plot of frequency and percentage for unevenly sized groups


python python-3.x pandas matplotlib


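I am trying to show both the relative percentage within each group and the total frequency in an sns bar plot. The two groups I am comparing differ greatly in size, which is why the function below shows percentages by group.

Here is the code for a sample data frame I created. Its relative group sizes ('groups') across the target categorical variable ('item') are similar to those in my data. 'rand' is just a variable used to generate the df.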

# import pandas, seaborn and numpy
import pandas as pd
import seaborn as sns
import numpy as np

# create dataframe; 'rand' is only used to derive the groups
foobar = pd.DataFrame({'rand': np.random.randn(100)})

# get relative group sizes: roughly 9 in 10 rows fall into group A
foobar['groups'] = np.where(foobar['rand'] > -1.2, 'A', 'B')

# assign the categories that I am comparing graphically (20 rows each)
foobar['item'] = np.repeat(['Z', 'Y', 'X', 'W', 'V'], 20)

The function call and the resulting plot look like this:

percent_categorical('item', df=foobar, grouper='groups')

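This works well because it lets me show the relative percentages by group. However, I would also like to show the absolute number for each group, preferably in the legend. In this example, I would want it to show that group A has 89 members in total and group B has 11.

Thanks in advance for your help.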

I solved this by splitting the groupby operation in two: one to get your percentages, and another to count the number of objects in each group.
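As a minimal sketch of that split, here are the two operations on their own, applied to a small made-up frame (the `toy` data and the variable names are illustrative assumptions, not part of the original post):

```python
import pandas as pd

# Hypothetical toy data: group A has 4 rows, group B has 2
toy = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'A', 'B', 'B'],
    'item':   ['Z', 'Z', 'Z', 'Y', 'Z', 'Y'],
})

grouped = toy.groupby('groups')['item']

# operation 1: absolute number of rows in each group
group_sizes = grouped.count()

# operation 2: share of each item within its own group, in percent
group_percent = (grouped.value_counts(normalize=True)
                 .rename('percentage')
                 .mul(100)
                 .reset_index())

print(group_sizes)
print(group_percent)
```

The first result feeds the legend labels and the second feeds the bar heights, so neither computation has to be derived from the other.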

I adapted your percent_categorical function as follows:

import matplotlib.pyplot as plt

def percent_categorical(item, df, grouper='Active Status'):
    # plot categorical responses to an item ('column name')
    # by percent by group ('diff column name w categorical data')
    # df is the data frame to plot (the original default, IA, is not defined here)
    # 'Active Status' is default grouper

    # create groupby of item grouped by status
    groupbase = df.groupby(grouper)[item]
    # count the number of occurrences per group
    groupcount = groupbase.count()
    # convert to percentage by group rather than total count
    groupper = (groupbase.value_counts(normalize=True)
                # rename column
                .rename('percentage')
                # multiply by 100 for easier interpretation
                .mul(100)
                # change order from value to name
                .reset_index()
                .sort_values(item))

    # create plot
    fig, ax = plt.subplots()
    brplt = sns.barplot(x=item,
                        y='percentage',
                        hue=grouper,
                        data=groupper,
                        palette='RdBu',
                        ax=ax)
    # rotate the x tick labels; the bars already follow the sorted order of item
    ax.tick_params(axis='x', labelrotation=90)
    # get the handles and the labels of the legend
    # these are the bars and the corresponding text in the legend
    thehandles, thelabels = ax.get_legend_handles_labels()
    # for each label, add the total number of occurrences
    # you can get this from groupcount, as the legend labels have
    # the same names as the values in the grouper column of your df
    for counter, label in enumerate(thelabels):
        # the new label looks like this (dummy name and value)
        # 'XYZ (42)'
        thelabels[counter] = label + ' ({})'.format(groupcount[label])
    # add the new legend to the figure
    ax.legend(thehandles, thelabels)
    # return the figure, the axes and the bar plot
    return fig, ax, brplt

To get your figure:

fig, ax, brplt = percent_categorical('item', df=foobar, grouper='groups')

The resulting figure looks like this:

You can change the look of this legend however you like; I only added the parentheses as an example.
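To illustrate how the label format could be varied, here is a small sketch. The helper name `relabel` and its `template` parameter are hypothetical, introduced only for this example; the counts are the group sizes mentioned in the question.

```python
def relabel(labels, counts, template='{label} ({count})'):
    """Return new legend labels with each group's count filled into template."""
    return [template.format(label=label, count=counts[label]) for label in labels]

# group sizes from the question's example; a pandas Series indexed by
# group name (such as groupcount in the function) could be passed instead
counts = {'A': 89, 'B': 11}
labels = ['A', 'B']

default_labels = relabel(labels, counts)
verbose_labels = relabel(labels, counts, template='{label}, n = {count}')

print(default_labels)
print(verbose_labels)
```

Inside the plotting function, the loop over the labels could then be replaced by something like ax.legend(thehandles, relabel(thelabels, groupcount, '{label}, n = {count}')).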

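Thanks! In the "brplt =" command, both instances of "grouped" should be switched to "groupper". With that change, this works perfectly.

Thanks for the heads-up, @Andrew!

You can use Dexplot to normalize the counts without needing that large function: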

dxp.aggplot(agg='item', data=foobar, hue='groups', normalize='groups')
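If installing an extra package is not an option, the same per-group normalization can also be sketched with plain pandas. This is not how Dexplot does it internally, just an equivalent table built with pd.crosstab on a made-up frame (the `toy` data is an assumption for illustration):

```python
import pandas as pd

# Hypothetical toy data: group A has 4 rows, group B has 2
toy = pd.DataFrame({
    'groups': ['A', 'A', 'A', 'A', 'B', 'B'],
    'item':   ['Z', 'Z', 'Z', 'Y', 'Z', 'Y'],
})

# rows are groups, columns are items;
# normalize='index' scales each row (each group) to sum to 1
percent_table = pd.crosstab(toy['groups'], toy['item'],
                            normalize='index').mul(100)

print(percent_table)
```

Transposing the table and calling percent_table.T.plot(kind='bar') would draw one cluster of bars per item with one bar per group, though without the counts in the legend.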
