Pandas: select all records from the previous 6 months relative to a specific value in a column
Whenever a customer completes a specific transaction, I want to select all of that customer's records from the preceding 6 months. The data looks like this:
Cust_ID Transaction_Date Amount Description
1 08/01/2017 12 Moved
1 03/01/2017 15 X
1 01/01/2017 8 Y
2 10/01/2018 6 Moved
2 02/01/2018 12 Z
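Here, I want to look for the description "Moved" and then, for each customer ID, select all records from the 6 months before it.
The output should look like this: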
Cust_ID Transaction_Date Amount Description
1 08/01/2017 12 Moved
1 03/01/2017 15 X
2 10/01/2018 6 Moved
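For anyone who wants to reproduce the question, here is a minimal sketch that builds the sample table as a DataFrame. It assumes the dates are written month/day/year, which is the reading under which the expected output above is consistent; the variable name `df` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Cust_ID": [1, 1, 1, 2, 2],
    "Transaction_Date": ["08/01/2017", "03/01/2017", "01/01/2017",
                         "10/01/2018", "02/01/2018"],
    "Amount": [12, 15, 8, 6, 12],
    "Description": ["Moved", "X", "Y", "Moved", "Z"],
})

# Assumption: month/day/year. An explicit format avoids ambiguous parsing.
df["Transaction_Date"] = pd.to_datetime(df["Transaction_Date"], format="%m/%d/%Y")
print(df)
```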
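I want to do this in Python. Please help.
The idea is to build a Series of datetimes from the rows filtered by Moved, shift it back by a 6-month offset, and finally keep the rows whose dates are later than that shifted value, as below:
EDIT: To get all datetimes for each Moved value: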
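import pandas as pd
# parse the dates and sort the rows within each customer
df['Transaction_Date'] = pd.to_datetime(df['Transaction_Date'])
df = df.sort_values(['Cust_ID','Transaction_Date'])
# number each run of rows that ends in a Moved row (cumulative count from the bottom)
df['g'] = df['Description'].iloc[::-1].eq('Moved').cumsum()
# start of the 6-month window for every Moved row
# (pd.DateOffset is used because pd.offsets.MonthOffset was removed from recent pandas releases)
s = (df[df['Description'].eq('Moved')]
       .set_index(['Cust_ID','g'])['Transaction_Date'] - pd.DateOffset(months=6))
# keep the rows dated after the window start of the Moved row they lead up to
mask = df.join(s.rename('a'), on=['Cust_ID','g'])['a'] < df['Transaction_Date']
df1 = df[mask].drop('g', axis=1)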
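To check the EDIT approach end to end, the following is a minimal self-contained sketch that wraps the same steps in a function and runs them on the sample with several Moved rows. The function name `rows_before_moved` and its `months` and `event` parameters are hypothetical, and `pd.DateOffset(months=...)` stands in for `MonthOffset`.

```python
import pandas as pd

def rows_before_moved(df, months=6, event="Moved"):
    """Hypothetical helper: rows within `months` before each `event` row."""
    out = df.sort_values(["Cust_ID", "Transaction_Date"]).copy()
    # Counting event flags from the bottom up gives each run of rows that
    # ends in an event row its own number.
    out["g"] = out["Description"].iloc[::-1].eq(event).cumsum()
    # Window start for every event row, keyed by (customer, run number).
    starts = (out[out["Description"].eq(event)]
              .set_index(["Cust_ID", "g"])["Transaction_Date"]
              - pd.DateOffset(months=months))
    window_start = out.join(starts.rename("a"), on=["Cust_ID", "g"])["a"]
    # Rows with no following event get NaT, and NaT comparisons are False.
    return out[window_start < out["Transaction_Date"]].drop(columns="g")

sample = pd.DataFrame({
    "Cust_ID": [1, 1, 1, 1, 2, 2],
    "Transaction_Date": ["10/01/2017", "01/23/2017", "03/01/2017",
                         "08/08/2017", "10/01/2018", "02/01/2018"],
    "Amount": [12, 15, 8, 12, 6, 12],
    "Description": ["X", "Moved", "Y", "Moved", "Moved", "Z"],
})
sample["Transaction_Date"] = pd.to_datetime(sample["Transaction_Date"],
                                            format="%m/%d/%Y")

result = rows_before_moved(sample)
print(result)
```

On this sample the 2017-10-01 row for customer 1 is dropped because no Moved row follows it, and the 2018-02-01 row for customer 2 is dropped because it is more than 6 months before the Moved row.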
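So in 08/01/2017, 08 is the day and 01 is the month, right? Why does the expected result for the description "Moved" contain a row with the description "X"? Where is your attempt?
Is there only one Moved per group?
@anky_91 Yes, right.
It's a nice attempt, but it will fail in some cases where there are multiple Moved events. I've updated my example to show when this approach doesn't work (see the update). In my code example, it would return the 2017-03-01 event even though it doesn't fall within the 6-month window before any Moved record (the next Moved is on 2017-10-01).
@perl Yes, it depends on what the OP needs. Unfortunately, filtering per group is possible, but slow...
I think I've found a way that is both fast (I just fill in the grouped data) and handles these cases correctly. I've updated my answer with that.
@jezrael Hey, one last question... In the scenario above, if I have to extract the data for the following 6 months, what do I need to change in the "EDIT" code? Thanks in advance.
@AnkitaPatnaik I think you only need to change - pd.offsets.MonthOffset(6) to + pd.offsets.MonthOffset(6).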
print (df)
Cust_ID Transaction_Date Amount Description
0 1 10/01/2017 12 X
1 1 01/23/2017 15 Moved
2 1 03/01/2017 8 Y
3 1 08/08/2017 12 Moved
4 2 10/01/2018 6 Moved
5 2 02/01/2018 12 Z
# convert the date column to datetimes
df['Transaction_Date'] = pd.to_datetime(df['Transaction_Date'])
# mask selecting the Moved rows
mask = df['Description'].eq('Moved')
# filter these rows and sort them
df1 = df[mask].sort_values(['Cust_ID','Transaction_Date'])
print (df1)
Cust_ID Transaction_Date Amount Description
1 1 2017-01-23 15 Moved
3 1 2017-08-08 12 Moved
4 2 2018-10-01 6 Moved
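# flag every Moved row after the first one for each customer
mask = df1.duplicated('Cust_ID')
# Series mapping each customer to (first Moved date - 6 months)
# (pd.DateOffset is used because pd.offsets.MonthOffset was removed from recent pandas releases)
s = df1[~mask].set_index('Cust_ID')['Transaction_Date'] - pd.DateOffset(months=6)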
print (s)
Cust_ID
1 2016-07-23
2 2018-04-01
Name: Transaction_Date, dtype: datetime64[ns]
# mask that filters out the other Moved rows (keeps only the first one per customer)
m2 = ~mask.reindex(df.index, fill_value=False)
# keep rows dated after the window start; note there is no upper bound on the date here
df1 = df[(df['Cust_ID'].map(s) < df['Transaction_Date']) & m2]
print (df1)
Cust_ID Transaction_Date Amount Description
0 1 2017-10-01 12 X
1 1 2017-01-23 15 Moved
2 1 2017-03-01 8 Y
4 2 2018-10-01 6 Moved
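The output above still contains rows dated after the first Moved event (2017-10-01 and 2017-03-01 for customer 1), because the filter only checks the start of the window. If those rows are unwanted, one possible variant is to bound the window on both sides with `Series.between`. This is a sketch under that assumption, not the answer's own code; the names `first_moved`, `start`, `end` and `bounded` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Cust_ID": [1, 1, 1, 1, 2, 2],
    "Transaction_Date": ["10/01/2017", "01/23/2017", "03/01/2017",
                         "08/08/2017", "10/01/2018", "02/01/2018"],
    "Amount": [12, 15, 8, 12, 6, 12],
    "Description": ["X", "Moved", "Y", "Moved", "Moved", "Z"],
})
df["Transaction_Date"] = pd.to_datetime(df["Transaction_Date"], format="%m/%d/%Y")

# Date of the first Moved event for each customer.
first_moved = (df[df["Description"].eq("Moved")]
               .groupby("Cust_ID")["Transaction_Date"].min())

# Window [first Moved - 6 months, first Moved], aligned to every row of df.
end = df["Cust_ID"].map(first_moved)
start = df["Cust_ID"].map(first_moved - pd.DateOffset(months=6))

# between() is inclusive on both ends by default, so the Moved row is kept.
bounded = df[df["Transaction_Date"].between(start, end)]
print(bounded)
```

With this sample only the first Moved row of each customer remains, since no other transaction falls inside the 6 months before it.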
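# next 6 months after the last Moved of each customer:
# sort df and rebuild the Moved rows first, because df1 was overwritten above
df = df.sort_values(['Cust_ID','Transaction_Date'])
df1 = df[df['Description'].eq('Moved')]
# flag every Moved row except the last one for each customer
mask = df1.duplicated('Cust_ID', keep='last')
# Series mapping each customer to the date of the last Moved
s = df1[~mask].set_index('Cust_ID')['Transaction_Date']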
print (s)
Cust_ID
1 2017-08-08
2 2018-10-01
Name: Transaction_Date, dtype: datetime64[ns]
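# mask that filters out the earlier Moved rows (keeps only the last one per customer)
m2 = ~mask.reindex(df.index, fill_value=False)
# keep rows dated between the last Moved and 6 months after it
# (pd.DateOffset is used because pd.offsets.MonthOffset was removed from recent pandas releases)
df3 = df[df['Transaction_Date'].between(df['Cust_ID'].map(s), df['Cust_ID'].map(s + pd.DateOffset(months=6))) & m2]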
print (df3)
Cust_ID Transaction_Date Amount Description
3 1 2017-08-08 12 Moved
0 1 2017-10-01 12 X
4 2 2018-10-01 6 Moved
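The snippet above looks forward only from the last Moved row of each customer. If every Moved event should open its own 6-month window, one alternative is to pair each transaction with all Moved dates of the same customer through a merge. This is a different technique from the one in the answer, shown only as a sketch; the column name `Moved_Date` and the variable names are hypothetical, and the pairing can grow large when customers have many Moved rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Cust_ID": [1, 1, 1, 1, 2, 2],
    "Transaction_Date": ["10/01/2017", "01/23/2017", "03/01/2017",
                         "08/08/2017", "10/01/2018", "02/01/2018"],
    "Amount": [12, 15, 8, 12, 6, 12],
    "Description": ["X", "Moved", "Y", "Moved", "Moved", "Z"],
})
df["Transaction_Date"] = pd.to_datetime(df["Transaction_Date"], format="%m/%d/%Y")

# One row per Moved event: customer and event date.
moved = df.loc[df["Description"].eq("Moved"), ["Cust_ID", "Transaction_Date"]]
moved = moved.rename(columns={"Transaction_Date": "Moved_Date"})

# Pair every transaction with every Moved event of the same customer.
# reset_index() keeps the original row labels in a column named "index".
pairs = df.reset_index().merge(moved, on="Cust_ID")

# A pair qualifies when the transaction falls in [Moved, Moved + 6 months].
in_window = pairs["Transaction_Date"].between(
    pairs["Moved_Date"], pairs["Moved_Date"] + pd.DateOffset(months=6))

# Keep each original row that qualifies for at least one Moved event.
keep = pairs.loc[in_window, "index"].unique()
result = df.loc[df.index.isin(keep)]
print(result)
```

Swapping the bounds to `Moved_Date - pd.DateOffset(months=6)` and `Moved_Date` would give the backward-looking window from the original question.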