Python: how to use groupby to get the previous month's count?
I have a DataFrame that looks like this:
import pandas as pd

dff = pd.DataFrame({'month': ['1','1','1','1','2','2','2','2','3','3'],
                    'sym': ['abc','pqr','xyz','lmn','abc','pqr','xyz','lmn','aaa','bbb'],
                    'count': ['10','14','25','20','34','23','43','34','10','20']})
dff = dff[['sym','month','count']]
print(dff)
   sym month count
0  abc     1    10
1  pqr     1    14
2  xyz     1    25
3  lmn     1    20
4  abc     2    34
5  pqr     2    23
6  xyz     2    43
7  lmn     2    34
8  aaa     3    10
9  bbb     3    20
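I want to add a new column named "prev_count" to this DataFrame. The column should follow these rules:
- If the previous month is not available within a given sym group, prev_count should be 0.
- If the previous month is available within the group, prev_count should be that previous month's count.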
def f(df):
    print(df)
    return ""

dff['prev_count'] = dff.groupby('sym').apply(f)
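But I can't work out how to keep track of the previous month's count. Is there a way to do this kind of operation on the data?
Expected output: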
   sym month count prev_count
0  abc     1    10          0
1  pqr     1    14          0
2  xyz     1    25          0
3  lmn     1    20          0
4  abc     2    34         10
5  pqr     2    23         14
6  xyz     2    43         25
7  lmn     2    34         20
8  aaa     3    10          0
9  bbb     3    20          0
Since month is already sorted, use shift() and fillna(0) on the sym groups:
In [2878]: dff['prev_count'] = dff.groupby('sym')['count'].shift().fillna(0)
In [2879]: dff
Out[2879]:
   sym month count prev_count
0  abc     1    10          0
1  pqr     1    14          0
2  xyz     1    25          0
3  lmn     1    20          0
4  abc     2    34         10
5  pqr     2    23         14
6  xyz     2    43         25
7  lmn     2    34         20
8  aaa     3    10          0
9  bbb     3    20          0
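Note that the one-liner above depends on the rows already being in month order within each sym, and it leaves count as strings, so prev_count ends up mixing the integer 0 with string values. A minimal sketch of a more defensive variant, assuming month and count are meant to be integers (that conversion is an assumption, not part of the original answer):

```python
import pandas as pd

dff = pd.DataFrame({'month': ['1','1','1','1','2','2','2','2','3','3'],
                    'sym': ['abc','pqr','xyz','lmn','abc','pqr','xyz','lmn','aaa','bbb'],
                    'count': ['10','14','25','20','34','23','43','34','10','20']})
dff = dff[['sym', 'month', 'count']]

# Assumption: month and count should be numeric rather than strings.
dff['month'] = dff['month'].astype(int)
dff['count'] = dff['count'].astype(int)

# Sort explicitly so "previous row in the group" means "earlier month".
dff = dff.sort_values(['sym', 'month'])

# shift() moves each group's values down one row; the first row of
# every group has nothing before it and becomes NaN, which we turn into 0.
dff['prev_count'] = dff.groupby('sym')['count'].shift().fillna(0).astype(int)

# Restore the original row order.
dff = dff.sort_index()
print(dff)
```

With numeric columns, prev_count can be used directly in arithmetic, for example `dff['count'] - dff['prev_count']`.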
Alternatively, use transform:
In [2880]: dff.groupby('sym')['count'].transform(lambda x: x.shift(1)).fillna(0)
Out[2880]:
0 0
1 0
2 0
3 0
4 10
5 14
6 25
7 20
8 0
9 0
Name: count, dtype: object
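Both versions above take the previous row within each sym group, which is not necessarily the previous calendar month. If a sym could skip a month, one possible approach is a self-merge on month + 1. The sketch below uses hypothetical data (pqr has no row for month 2) and assumes integer months, at most one row per (sym, month), and no wrap-around across years:

```python
import pandas as pd

# Hypothetical data: pqr appears in months 1 and 3 but not in month 2.
dff = pd.DataFrame({'sym': ['abc', 'abc', 'pqr', 'pqr'],
                    'month': [1, 2, 1, 3],
                    'count': [10, 34, 14, 23]})

# Lookup table: each row's count is the prev_count for the following month.
prev = dff[['sym', 'month', 'count']].rename(columns={'count': 'prev_count'})
prev['month'] = prev['month'] + 1

# A left merge keeps every original row; rows without a match get NaN, then 0.
out = dff.merge(prev, on=['sym', 'month'], how='left')
out['prev_count'] = out['prev_count'].fillna(0).astype(int)
print(out)
```

Here the pqr row for month 3 gets 0 because month 2 is missing, whereas a plain groupby shift would have given it 14.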
@JohnGalt - OK, thanks.
@JohnGalt - Could you explain in detail how the shift takes the month into account? I didn't fully understand it.
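On the last question, as far as the code in the answer goes, shift() never looks at the month values. It only moves each group's values down by one position, so it relies on the rows already being in month order. A small illustration with made-up data:

```python
import pandas as pd

# Made-up data with the two groups interleaved; there is no month column at all.
s = pd.DataFrame({'sym': ['abc', 'pqr', 'abc', 'pqr', 'abc'],
                  'count': [10, 14, 34, 23, 50]})

# Within each sym, every row receives the count of the row before it;
# the first row of each group has no predecessor and becomes NaN.
shifted = s.groupby('sym')['count'].shift()
print(shifted.tolist())  # [nan, nan, 10.0, 14.0, 34.0]
```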