Pandas: filter a DataFrame based on dates and groupby

I have the following DataFrame:

Date    group   File1   File2   Begin Date  End Date
4/28/2014   A   CC2015H CC2015K 5/1/2014    2/2/2015
4/29/2014   A   CC2015H CC2015K 5/1/2014    2/2/2015
4/30/2014   A   CC2015H CC2015K 5/1/2014    2/2/2015
5/1/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
5/2/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
1/22/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/23/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/26/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/27/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/28/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/29/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/30/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
2/2/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/3/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/4/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/5/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/6/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
8/25/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/26/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/27/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/28/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/29/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
9/2/2014    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/7/2015    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/10/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/11/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/12/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/13/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/14/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/17/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/18/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/19/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/20/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
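It is actually a much larger DataFrame with many more groups; I shortened it here for display. I tried to filter the DataFrame on the Date column as follows: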

df = df.loc[df.groupby(['group', 'File1', 'File2'])['Date'] >= df.groupby(['group', 'File1', 'File2'])['Begin Date']]

The output should look like this:

Date    group   File1   File2   Begin Date  End Date
5/1/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
5/2/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
1/22/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/23/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/26/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/27/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/28/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/29/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/30/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
2/2/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/3/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/4/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/5/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/6/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
8/29/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
9/2/2014    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/7/2015    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/10/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/11/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/12/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/13/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/14/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/17/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/18/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/19/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/20/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
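Bonus question: I would also like to filter on both the begin and end dates, i.e. keep the rows in each group that meet this criterion: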

(df['Date'] >= df['Begin Date']) & (df['Date'] <= df['End Date'])
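
A minimal sketch of that criterion, using a few group A rows copied from the table above and assuming the three date columns are first parsed as month/day/year datetimes. Each comparison sits in its own parentheses because in Python `&` binds more tightly than `>=` and `<=`.

```python
import pandas as pd

# A few group A rows copied from the question's table
df = pd.DataFrame({
    "Date": ["4/30/2014", "5/1/2014", "2/2/2015", "2/3/2015"],
    "group": ["A"] * 4,
    "Begin Date": ["5/1/2014"] * 4,
    "End Date": ["2/2/2015"] * 4,
})

# Assumption: the dates are month/day/year strings, as they appear in the question
for col in ["Date", "Begin Date", "End Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# Parenthesize each comparison: & has higher precedence than >= and <=
mask = (df["Date"] >= df["Begin Date"]) & (df["Date"] <= df["End Date"])
result = df[mask]
print(result)
```

Only the rows dated 5/1/2014 and 2/2/2015 survive, since both bounds are inclusive.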

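I don't think groupby is needed here, since you are not aggregating anything (min, max, sum, count, etc.) from each group: every row already carries its own Begin Date and End Date, so the dates can be compared row by row.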
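
A minimal sketch of that point on a few rows copied from the question, assuming the dates are parsed as month/day/year. It also shows what a groupby version would have to look like: `transform("first")` broadcasts each group's Begin Date back onto its rows, which only matters if the begin date were not already stored on every row. Both approaches select the same rows here.

```python
import pandas as pd

# Rows copied from the question's table (groups A and B)
df = pd.DataFrame({
    "Date": ["4/29/2014", "5/1/2014", "5/2/2014", "8/28/2014", "8/29/2014", "9/2/2014"],
    "group": ["A", "A", "A", "B", "B", "B"],
    "File1": ["CC2015H"] * 3 + ["ZC2015U"] * 3,
    "File2": ["CC2015K"] * 3 + ["ZC2015Z"] * 3,
    "Begin Date": ["5/1/2014"] * 3 + ["8/29/2014"] * 3,
})
for col in ["Date", "Begin Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# Row-wise comparison: each row already carries its group's Begin Date
row_wise = df[df["Date"] >= df["Begin Date"]]

# Groupby version: transform() returns a Series aligned with df's index,
# so it can be compared element-wise with the Date column
keys = ["group", "File1", "File2"]
group_begin = df.groupby(keys)["Begin Date"].transform("first")
grouped = df[df["Date"] >= group_begin]

print(row_wise.equals(grouped))  # True
```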

Series.between is what you need:

df[df['Date'].between(df['Begin Date'], df['End Date'])]
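
A minimal sketch of `between` on a few group B rows copied from the question, assuming the date columns have been converted to datetimes. `between` includes both endpoints by default, so rows dated exactly on Begin Date or End Date are kept; the sketch checks that it matches the explicit two-sided mask.

```python
import pandas as pd

# A few group B rows copied from the question's table
df = pd.DataFrame({
    "Date": ["8/28/2014", "8/29/2014", "8/14/2015", "8/17/2015"],
    "group": ["B"] * 4,
    "Begin Date": ["8/29/2014"] * 4,
    "End Date": ["8/14/2015"] * 4,
})
for col in ["Date", "Begin Date", "End Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# between() is inclusive on both ends by default; the bounds are Series,
# so each row is compared against its own Begin Date and End Date
in_window = df[df["Date"].between(df["Begin Date"], df["End Date"])]

# Same result as the explicit mask
explicit = df[(df["Date"] >= df["Begin Date"]) & (df["Date"] <= df["End Date"])]
print(in_window.equals(explicit))  # True
```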

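First, convert the date columns to datetime objects, for example:

df['Date'] = pd.to_datetime(df['Date'])

(and likewise for 'Begin Date' and 'End Date'). That opens up a lot of options. Also, it would be easier to help if you posted the DataFrame as text rather than as an image.

I have posted it as text as requested. Thanks for your help. I tried converting the columns to datetime objects, but I get the following error: TypeError: '>=' not supported between instances of 'SeriesGroupBy' and 'SeriesGroupBy'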
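
A small sketch of why the conversion matters, using dates from the question and assuming they are month/day/year strings. Compared as plain strings, the dates are ordered character by character, so "1/22/2015" counts as earlier than "5/1/2014". After `pd.to_datetime` they compare chronologically. (The TypeError above is a separate issue: it comes from comparing two groupby objects, not from the dtype.)

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["4/30/2014", "5/2/2014", "1/22/2015"],
    "Begin Date": ["5/1/2014"] * 3,
})

# As plain strings, >= is lexicographic (character by character),
# so "1/22/2015" sorts before "5/1/2014" even though it is a later date
as_strings = (df["Date"] >= df["Begin Date"]).tolist()

# Parse with the month/day/year format the question's dates appear to use
for col in ["Date", "Begin Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")
as_dates = (df["Date"] >= df["Begin Date"]).tolist()

print(as_strings)  # [False, True, False]
print(as_dates)    # [False, True, True]
```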