Python: compute the order percentage within each column value of a DataFrame
My data looks like this:
import pandas as pd

d = {
    'date': ['2011-01-01', '2011-01-15', '2011-08-14', '2012-01-01', '2012-06-06',
             '2013-01-01', '2013-02-01', '2013-03-01', '2013-04-01', '2013-08-25'],
    'year': ['2011', '2011', '2011', '2012', '2012', '2013', '2013', '2013', '2013', '2013'],
}
df = pd.DataFrame(d)
df['date'] = pd.to_datetime(df['date'])
df.sort_values('date', inplace=True)
        date  year
0 2011-01-01  2011
1 2011-01-15  2011
2 2011-08-14  2011
3 2012-01-01  2012
4 2012-06-06  2012
5 2013-01-01  2013
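How can I create an order percentage for each year, where the first occurrence within a year is 0.0 and the last occurrence is 1.0?
The output needs to look like this: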
        date  year  percent
0 2011-01-01  2011     0.00
1 2011-01-15  2011     0.50
2 2011-08-14  2011     1.00
3 2012-01-01  2012     0.00
4 2012-06-06  2012     1.00
5 2013-01-01  2013     0.00
6 2013-02-01  2013     0.25
7 2013-03-01  2013     0.50
8 2013-04-01  2013     0.75
9 2013-08-25  2013     1.00
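I can do this by creating a separate DataFrame for each year and applying a function that divides each index by len(serie), but that seems inefficient given the number of DataFrames created.

You need to use groupby and compute (1) cumcount and (2) size, then divide the first by the second minus one, so that the last row of each year gets 1.0: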
grp = df.groupby('year')
# cumcount: 0-based position within each year; size - 1: position of the last row
df['percent'] = grp.cumcount() / (grp['year'].transform('size') - 1)
df
        date  year  percent
0 2011-01-01  2011     0.00
1 2011-01-15  2011     0.50
2 2011-08-14  2011     1.00
3 2012-01-01  2012     0.00
4 2012-06-06  2012     1.00
5 2013-01-01  2013     0.00
6 2013-02-01  2013     0.25
7 2013-03-01  2013     0.50
8 2013-04-01  2013     0.75
9 2013-08-25  2013     1.00
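
One case the sample data does not cover is a year with a single row: there size - 1 is 0, so the division yields 0/0, which is NaN. Below is a minimal sketch of the same cumcount / (size - 1) idea wrapped in a helper that handles that case. The function name order_percent, its parameters, and the choice to assign 0.0 to single-row groups are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def order_percent(df, key="year", sort_by="date"):
    """Return a sorted copy of df with a 'percent' column in [0, 1] per group.

    Hypothetical helper: the name, arguments, and the 0.0 fallback for
    single-row groups are illustrative assumptions.
    """
    out = df.sort_values(sort_by).copy()
    grp = out.groupby(key)
    position = grp.cumcount()                # 0, 1, 2, ... within each group
    last = grp[key].transform("size") - 1    # position of the last row in each group
    # Mask last == 0 (single-row groups) to NaN so we never compute 0/0,
    # then fill those rows with 0.0 (an assumed convention).
    out["percent"] = (position / last.where(last > 0)).fillna(0.0)
    return out


df = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-01-01", "2011-01-15", "2011-08-14",
        "2012-01-01", "2012-06-06",
        "2014-05-05",                        # a year with only one row
    ]),
    "year": ["2011", "2011", "2011", "2012", "2012", "2014"],
})
result = order_percent(df)
print(result)
```

If single-row years should be left as NaN instead, dropping the fillna(0.0) call is enough.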