Group by and count to get a ratio in pandas
The following is another question about DataFrames, asked elsewhere, that would benefit from a pandas solution. The question is:
I want to count, for each country, the number of times the status is open and the number of times the status is closed. Then calculate the closing ratio per country.
Data:
customer country closeday status
1 1 BE 2017-08-23 closed
2 2 NL 2017-08-05 open
3 3 NL 2017-08-22 closed
4 4 NL 2017-08-26 closed
5 5 BE 2017-08-25 closed
6 6 NL 2017-08-13 open
7 7 BE 2017-08-30 closed
8 8 BE 2017-08-05 open
9 9 NL 2017-08-23 closed
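To make the examples reproducible, the table above can be rebuilt as a DataFrame. This is only a sketch: parsing closeday as datetimes and using a 1-based index are assumptions read off the printed table, not something the question states.

```python
import pandas as pd

# Rebuild the sample data shown in the question.
# Assumption: closeday is a datetime column and the index runs from 1 to 9.
df = pd.DataFrame(
    {
        "customer": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
        "closeday": pd.to_datetime(
            [
                "2017-08-23", "2017-08-05", "2017-08-22",
                "2017-08-26", "2017-08-25", "2017-08-13",
                "2017-08-30", "2017-08-05", "2017-08-23",
            ]
        ),
        "status": [
            "closed", "open", "closed",
            "closed", "closed", "open",
            "closed", "open", "closed",
        ],
    },
    index=range(1, 10),
)

print(df)
```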
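The goal is to get an output that shows the number of open and closed statuses per country, together with the closed ratio. This is the desired output: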
country closed open closed_ratio
BE 3 1 0.75
NL 3 2 0.60
Looking forward to your suggestions.
One solution is included in the answers below. Other solutions are welcome.
df
customer country closeday status
1 1 BE 2017-08-23 closed
2 2 NL 2017-08-05 open
3 3 NL 2017-08-22 closed
4 4 NL 2017-08-26 closed
5 5 BE 2017-08-25 closed
6 6 NL 2017-08-13 open
7 7 BE 2017-08-30 closed
8 8 BE 2017-08-05 open
9 9 NL 2017-08-23 closed
Apply groupby, count each group with size, then unstack level 1 (status) so that each status becomes a column:
df2 = df.groupby(['country', 'status']).status.size().unstack(level=1)
df2
status closed open
country
BE 3 1
NL 3 2
Now, calculate the closed ratio:
df2['closed_ratio'] = df2.closed / df2.sum(1)
df2
status closed open closed_ratio
country
BE 3 1 0.75
NL 3 2 0.60
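The two steps above can be put together in one self-contained sketch. The fill_value=0 argument is an addition that is not in the answer: it is assumed to be useful when some country never has one of the statuses, where unstack would otherwise leave NaN in that cell. Only the two columns needed for the count are rebuilt here.

```python
import pandas as pd

# Minimal copy of the question's data (country and status only).
df = pd.DataFrame(
    {
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
        "status": [
            "closed", "open", "closed",
            "closed", "closed", "open",
            "closed", "open", "closed",
        ],
    }
)

# Count rows per (country, status) pair, then pivot status into columns.
# fill_value=0 (an assumption, see above) replaces missing combinations with 0.
counts = (
    df.groupby(["country", "status"])
    .size()
    .unstack(level="status", fill_value=0)
)

# Share of closed rows among all rows of each country.
counts["closed_ratio"] = counts["closed"] / counts[["closed", "open"]].sum(axis=1)

print(counts)
```

Summing only the closed and open columns keeps the ratio correct even if the cell is re-run after closed_ratio has already been added.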
Here are a few ways:
1)
In [420]: (df.groupby(['country', 'status']).size().unstack()
.assign(closed_ratio=lambda x: x.closed / x.sum(1)))
Out[420]:
status closed open closed_ratio
country
BE 3 1 0.75
NL 3 2 0.60
2)
In [422]: (pd.crosstab(df.country, df.status)
.assign(closed_ratio=lambda x: x.closed/x.sum(1)))
Out[422]:
status closed open closed_ratio
country
BE 3 1 0.75
NL 3 2 0.60
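If only the ratio is needed and not the raw counts, crosstab can also normalize each row itself. The following is a sketch of that variant, not part of the original answer; it uses the normalize="index" option of pd.crosstab and a two-column copy of the question's data.

```python
import pandas as pd

# Minimal copy of the question's data (country and status only).
df = pd.DataFrame(
    {
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
        "status": [
            "closed", "open", "closed",
            "closed", "closed", "open",
            "closed", "open", "closed",
        ],
    }
)

# normalize="index" divides every row by its row total,
# so each cell becomes the share of that status within the country.
shares = pd.crosstab(df["country"], df["status"], normalize="index")

# The "closed" column is then the closed ratio per country.
closed_ratio = shares["closed"].rename("closed_ratio")

print(closed_ratio)
```

The trade-off is that the counts themselves are lost, so this fits best when the closed and open columns of the desired output are not required.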
3)
In [424]: (df.pivot_table(index='country', columns='status', aggfunc='size')
.assign(closed_ratio=lambda x: x.closed/x.sum(1)))
Out[424]:
status closed open closed_ratio
country
BE 3 1 0.75
NL 3 2 0.60
4) Borrowed from piRSquared
In [430]: (df.set_index('country').status.str.get_dummies().groupby(level=0).sum()
.assign(closed_ratio=lambda x: x.closed/x.sum(1)))
Out[430]:
closed open closed_ratio
country
BE 3 1 0.75
NL 3 2 0.60
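As a cross-check on the approaches above, the ratio can also be computed without building the count table at all, by averaging a boolean column per country. This is a different technique from the ones in the answers and is offered only as a sketch. The comparison against the crosstab counts is there to show that both routes agree on the sample data.

```python
import pandas as pd

# Minimal copy of the question's data (country and status only).
df = pd.DataFrame(
    {
        "country": ["BE", "NL", "NL", "NL", "BE", "NL", "BE", "BE", "NL"],
        "status": [
            "closed", "open", "closed",
            "closed", "closed", "open",
            "closed", "open", "closed",
        ],
    }
)

# The mean of a boolean Series is the fraction of True values,
# so grouping "status == closed" by country yields the closed ratio.
closed_ratio = (
    df["status"].eq("closed")
    .groupby(df["country"])
    .mean()
    .rename("closed_ratio")
)

# Reference value computed from the count table, as in the answers.
counts = pd.crosstab(df["country"], df["status"])
expected = counts["closed"] / counts.sum(axis=1)

# Both routes should give the same ratio for every country.
agrees = bool(((closed_ratio - expected).abs() < 1e-12).all())
print(closed_ratio)
print("matches count-based ratio:", agrees)
```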