Group by and count to get a ratio in pandas


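The following is another question about a data frame, asked elsewhere, that would benefit from a pandas solution. The question is:

I want to count, for each country, the number of times the status is open and the number of times it is closed, and then calculate the closed ratio per country.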

Data:
  customer country   closeday status
1        1      BE 2017-08-23 closed
2        2      NL 2017-08-05   open
3        3      NL 2017-08-22 closed
4        4      NL 2017-08-26 closed
5        5      BE 2017-08-25 closed
6        6      NL 2017-08-13   open
7        7      BE 2017-08-30 closed
8        8      BE 2017-08-05   open
9        9      NL 2017-08-23 closed
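
To make the snippets in the answers reproducible, the sample data can be rebuilt as a DataFrame. This is a minimal sketch: keeping closeday as plain strings and starting the index at 1 are assumptions made to match the printout, not something the question states.

```python
import pandas as pd

# Rebuild the nine sample rows from the question.
# Assumption: closeday is kept as plain strings; the index starts at 1 to match the printout.
df = pd.DataFrame(
    {
        "customer": list(range(1, 10)),
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
        "closeday": [
            "2017-08-23", "2017-08-05", "2017-08-22",
            "2017-08-26", "2017-08-25", "2017-08-13",
            "2017-08-30", "2017-08-05", "2017-08-23",
        ],
        "status": [
            "closed", "open", "closed",
            "closed", "closed", "open",
            "closed", "open", "closed",
        ],
    },
    index=range(1, 10),
)
print(df)
```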

The goal is an output that shows the number of open and closed statuses per country, together with the closed ratio. This is the desired output:
country   closed  open  closed_ratio                         
BE            3     1          0.75
NL            3     2          0.60

Looking forward to your suggestions.
The solution below is included as an answer. Other solutions are welcome.
df

   customer country    closeday  status
1         1      BE  2017-08-23  closed
2         2      NL  2017-08-05    open
3         3      NL  2017-08-22  closed
4         4      NL  2017-08-26  closed
5         5      BE  2017-08-25  closed
6         6      NL  2017-08-13    open
7         7      BE  2017-08-30  closed
8         8      BE  2017-08-05    open
9         9      NL  2017-08-23  closed


Apply groupby, count each group with size, then unstack the status level (level=1) into columns:
df2 = df.groupby(['country', 'status']).status.size().unstack(level=1)
df2

status   closed  open
country              
BE            3     1
NL            3     2

Now compute closed_ratio by dividing the closed count by the row total:
df2['closed_ratio'] = df2.closed / df2.sum(axis=1)
df2

status   closed  open  closed_ratio
country                            
BE            3     1          0.75
NL            3     2          0.60
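
One caveat with the groupby/size/unstack approach: if a country has no rows for one of the statuses, unstack leaves that cell as NaN and the count columns become floats. The sketch below uses hypothetical data (a country "DE" with only closed rows, which is not in the question) and passes fill_value=0 to unstack so the counts stay integers.

```python
import pandas as pd

# Hypothetical data: "DE" has no "open" rows, so the (DE, open) group does not exist.
df = pd.DataFrame({
    "country": ["BE", "BE", "DE", "DE"],
    "status": ["closed", "open", "closed", "closed"],
})

# fill_value=0 replaces missing (country, status) combinations with a count of 0.
counts = df.groupby(["country", "status"]).size().unstack(fill_value=0)

# Divide the closed count by the row total (closed + open) for each country.
counts["closed_ratio"] = counts["closed"] / counts[["closed", "open"]].sum(axis=1)
print(counts)
```

Summing only the closed and open columns also keeps the ratio correct if the statement is run again after closed_ratio has already been added.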

Here are a few approaches.
1)
In [420]: (df.groupby(['country', 'status']).size().unstack()
             .assign(closed_ratio=lambda x: x.closed / x.sum(1)))
Out[420]:
status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60

2)
In [422]: (pd.crosstab(df.country, df.status)
             .assign(closed_ratio=lambda x: x.closed/x.sum(1)))
Out[422]:
status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60
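
If only the ratios are needed rather than the raw counts, pd.crosstab can normalize each row itself through its normalize parameter. A minimal sketch, with the country and status columns of the question's data retyped inline so it runs on its own:

```python
import pandas as pd

# The country and status columns from the question's nine rows.
df = pd.DataFrame({
    "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
    "status": ["closed", "open", "closed", "closed", "closed",
               "open", "closed", "open", "closed"],
})

# normalize="index" divides every row by its row total, so each row sums to 1.
ratios = pd.crosstab(df["country"], df["status"], normalize="index")

# The "closed" column is then the closed ratio per country.
print(ratios["closed"])
```

This drops the counts, so it suits cases where the closed and open columns of the desired output are not needed.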

3)
In [424]: (df.pivot_table(index='country', columns='status', aggfunc='size')
             .assign(closed_ratio=lambda x: x.closed/x.sum(1)))
Out[424]:
status   closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60

4) Borrowed from piRSquared
In [430]: (df.set_index('country').status.str.get_dummies().groupby(level=0).sum()
             .assign(closed_ratio=lambda x: x.closed/x.sum(1)))
Out[430]:
         closed  open  closed_ratio
country
BE            3     1          0.75
NL            3     2          0.60
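
Since the question welcomes other solutions, here is one more possible sketch, not taken from the answers above: the closed ratio is the mean of a boolean "status equals closed" flag within each country. The name closed_ratio for the resulting Series is chosen here to match the desired output.

```python
import pandas as pd

# The country and status columns from the question's nine rows.
df = pd.DataFrame({
    "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
    "status": ["closed", "open", "closed", "closed", "closed",
               "open", "closed", "open", "closed"],
})

# True where the row is closed; the mean of booleans is the share of True values.
is_closed = df["status"].eq("closed")

# Group the flag by country and average it to get the ratio directly.
closed_ratio = is_closed.groupby(df["country"]).mean().rename("closed_ratio")
print(closed_ratio)
```

This skips the intermediate count table, so it is shorter when only the ratio matters.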