Plotting a DataFrame in Python based on counts and categories

python matplotlib

I have the following data in a DataFrame:

Customer_ID | Customer_status | store_ID | date_of_transaction
12352423    | active          | 65       | 2018/10/1
12352425    | inactive        | 70       | 2018/10/1
12352425    | inactive        | 65       | 2018/10/1
12352426    | active          | 75       | 2018/10/1

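Goal: see the distribution (or mean) of inactive versus active customers for each store, in order to find out whether some stores have more inactive customers than others.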

I used the following code to create an extra column containing the row count for each store:

df_new['Counts'] = df_customer.groupby('store_ID')['store_ID'].transform('count')

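So now I have an extra column holding the count for each unique store ID. For example, every row with store_ID = 65 shows 32 in the Counts column, because store 65 appears 32 times in the whole dataset.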

I don't know how to plot this so that I can visually compare the inactive and active customer status for each unique store.

Thanks

To get the mean inactive rate for each store ID, you can use:

(df['Customer_status'] == 'inactive').groupby(df['store_ID']).mean()

Output:

store_ID
65    0.5
70    1.0
75    0.0
Name: Customer_status, dtype: float64

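This first creates a boolean Series that is True where Customer_status equals 'inactive', then groups that Series by store_ID and takes the mean. Because True counts as 1 and False as 0, the mean is the fraction of inactive rows for each store.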
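As a minimal sketch of the step described above, here is the same calculation run on the four sample rows from the question. The variable names `is_inactive` and `inactive_share` are only illustrative, and the DataFrame is rebuilt by hand, so the real data may differ in dtypes.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Customer_ID": [12352423, 12352425, 12352425, 12352426],
    "Customer_status": ["active", "inactive", "inactive", "active"],
    "store_ID": [65, 70, 65, 75],
    "date_of_transaction": ["2018/10/1"] * 4,
})

# True where the customer is inactive, False otherwise
is_inactive = df["Customer_status"] == "inactive"

# The mean of a boolean Series is the fraction of True values,
# so grouping by store gives the inactive share per store
inactive_share = is_inactive.groupby(df["store_ID"]).mean()
print(inactive_share)
```

Store 65 has one active and one inactive row, which is why its share comes out as 0.5.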

To plot it:

(df['Customer_status'] == 'inactive').groupby(df['store_ID']).mean().plot.bar(title='Average Inactive Customers Status by Store ID')

Output: a bar chart with one bar per store_ID, showing the inactive share (0.5, 1.0 and 0.0 for stores 65, 70 and 75).

Update in response to the comment: yes, reshape the DataFrame slightly and plot:

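# Count rows per (store, status), then pivot the statuses into columns
df_out = df.groupby(['store_ID', 'Customer_status'])['Customer_ID'].count().unstack()
# Divide each row by its total so the bars show shares rather than raw counts
df_out.div(df_out.sum(axis=1), axis=0).plot.bar(title='Average Customer Status by Store ID')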

Output: a grouped bar chart showing, for each store_ID, the share of active and the share of inactive customers.
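An alternative way to build the same per-store shares, offered only as a sketch: `pd.crosstab` with `normalize="index"` does the counting and the row-wise division in one call. This is a swapped-in technique rather than the approach used in the answer, and the name `shares` is illustrative. One difference is that a store with no rows for a given status gets 0.0 here, whereas the groupby/unstack version leaves NaN.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Customer_ID": [12352423, 12352425, 12352425, 12352426],
    "Customer_status": ["active", "inactive", "inactive", "active"],
    "store_ID": [65, 70, 65, 75],
    "date_of_transaction": ["2018/10/1"] * 4,
})

# Rows are stores, columns are statuses;
# normalize="index" scales every row so that it sums to 1
shares = pd.crosstab(df["store_ID"], df["Customer_status"], normalize="index")
print(shares)

# With matplotlib installed, shares.plot.bar(stacked=True) would draw
# one stacked bar per store, split into active and inactive shares.
```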

Why not:

df.groupby(['store_ID', 'Customer_status']).mean(numeric_only=True)

Then repeat this for any other statistics you want and merge the resulting DataFrames.
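If several statistics are wanted, one possible shortcut is to compute them in a single groupby with named aggregation, which avoids the separate merge step. This is a sketch under assumptions: the helper column `is_inactive` and the output column names `n_rows`, `n_customers` and `inactive_share` are made up for illustration, and the statistics chosen are only examples.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Customer_ID": [12352423, 12352425, 12352425, 12352426],
    "Customer_status": ["active", "inactive", "inactive", "active"],
    "store_ID": [65, 70, 65, 75],
    "date_of_transaction": ["2018/10/1"] * 4,
})

# Helper column (hypothetical name): 1.0 for inactive rows, 0.0 for active rows
df["is_inactive"] = (df["Customer_status"] == "inactive").astype(float)

# One pass over the groups, one output column per statistic
summary = df.groupby("store_ID").agg(
    n_rows=("Customer_ID", "count"),          # transactions per store
    n_customers=("Customer_ID", "nunique"),   # distinct customers per store
    inactive_share=("is_inactive", "mean"),   # fraction of inactive rows
)
print(summary)
```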

Awesome, thanks! Is there a way to see both inactive and active in the same plot?