How to count the days in a date range that meet certain conditions using pandas in Python
I currently have a pandas DataFrame in which each row holds a date range, and I want to count the days in that range that meet certain conditions:
Item | Date Start | Date End
----------------------
A | 02/01/2019 | 03/02/2019
B | 04/02/2019 | 08/02/2019
For example, the number of days in Jan-2019, the number of days in 2019, or the number of days in Q1-2019:
Item | Date Start | Date End | Days in Jan-2019 | Days in 2019 | Days in Q1 - 2019
------------------------------------------------------------------------------------
A | 02/01/2019 | 03/02/2019 | 30 | 33 | 33
B | 04/04/2019 | 08/04/2019 | 0 | 5 | 0
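The expected counts for row A include both endpoints: 2 to 31 January is 30 days, and 2 January to 3 February is 33 days. A minimal check of that arithmetic with the standard datetime module, assuming the dates are dd/mm/yyyy; inclusive_days is a hypothetical helper name:

```python
from datetime import date


def inclusive_days(first: date, last: date) -> int:
    """Days from first to last, counting both endpoints; 0 if last < first."""
    return max(0, (last - first).days + 1)


# Row A, assuming dd/mm/yyyy: 02/01/2019 -> 2 Jan, 03/02/2019 -> 3 Feb
start, end = date(2019, 1, 2), date(2019, 2, 3)

# Clip the row's range to each period, then count the overlapping days
jan = inclusive_days(max(start, date(2019, 1, 1)), min(end, date(2019, 1, 31)))
year = inclusive_days(max(start, date(2019, 1, 1)), min(end, date(2019, 12, 31)))
q1 = inclusive_days(max(start, date(2019, 1, 1)), min(end, date(2019, 3, 31)))
print(jan, year, q1)  # 30 33 33
```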
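Ideally I'd like to avoid creating a row for every date in the range just to do this calculation, and to keep the row structure unchanged, but I can't work out how to do that, or what the most efficient approach would be.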
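Thanks
It is necessary to create a date range for each start/end pair; if there are only a few possible conditions, count the matching days in each range with sum: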
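import pandas as pd
df = pd.DataFrame({'Item': ['A', 'B'],
                   'Date Start': ['02/01/2019', '04/02/2019'],
                   'Date End': ['03/02/2019', '08/02/2019']})
# the dates are dd/mm/yyyy, hence dayfirst=True
df['Date Start'] = pd.to_datetime(df['Date Start'], dayfirst=True)
df['Date End'] = pd.to_datetime(df['Date End'], dayfirst=True)
# one daily DatetimeIndex per row, both endpoints included
s = df.apply(lambda x: pd.date_range(x['Date Start'], x['Date End']), axis=1)
# count the days in each range that satisfy each condition
df['Days in Jan-2019'] = s.apply(lambda x: ((x.year == 2019) & (x.month == 1)).sum())
df['Days in 2019'] = s.apply(lambda x: (x.year == 2019).sum())
df['Days in Q1 2019'] = s.apply(lambda x: ((x.year == 2019) & (x.quarter == 1)).sum())
print(df)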
Item Date Start Date End Days in Jan-2019 Days in 2019 Days in Q1 2019
0 A 2019-01-02 2019-02-03 30 33 33
1 B 2019-02-04 2019-02-08 0 5 5
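If there are more than a few conditions, the per-row approach above can be generalized so that each date range is built once and reused for every condition. A minimal sketch, assuming a hypothetical helper add_day_counts and a conditions dict that maps each output column name to a function taking a DatetimeIndex and returning a boolean array; it swaps df.apply(axis=1) for a plain list comprehension over the two date columns:

```python
import pandas as pd

# Sample data from the question; dates are dd/mm/yyyy.
df = pd.DataFrame({'Item': ['A', 'B'],
                   'Date Start': ['02/01/2019', '04/02/2019'],
                   'Date End': ['03/02/2019', '08/02/2019']})
df['Date Start'] = pd.to_datetime(df['Date Start'], dayfirst=True)
df['Date End'] = pd.to_datetime(df['Date End'], dayfirst=True)

# Hypothetical mapping: output column name -> boolean test on a DatetimeIndex.
conditions = {
    'Days in Jan-2019': lambda d: (d.year == 2019) & (d.month == 1),
    'Days in 2019': lambda d: d.year == 2019,
    'Days in Q1 2019': lambda d: (d.year == 2019) & (d.quarter == 1),
}


def add_day_counts(frame: pd.DataFrame, conditions: dict) -> pd.DataFrame:
    """Return a copy of frame with one day-count column per condition."""
    out = frame.copy()
    # Build each row's daily range (both endpoints included) once.
    ranges = [pd.date_range(s, e) for s, e in zip(frame['Date Start'], frame['Date End'])]
    for name, test in conditions.items():
        # Summing a boolean array counts the days where the test is True.
        out[name] = [int(test(r).sum()) for r in ranges]
    return out


print(add_day_counts(df, conditions))
```

Adding a condition then means adding one dict entry. Like the answer's code, this still materializes every day of every range, so the cost grows with the total number of days rather than the number of rows.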
Another idea is to flatten the date ranges with explode, and then aggregate with sum per index level:
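import pandas as pd
df = pd.DataFrame({'Item': ['A', 'B'],
                   'Date Start': ['02/01/2019', '04/02/2019'],
                   'Date End': ['03/02/2019', '08/02/2019']})
df['Date Start'] = pd.to_datetime(df['Date Start'], dayfirst=True)
df['Date End'] = pd.to_datetime(df['Date End'], dayfirst=True)
df['r'] = df.apply(lambda x: pd.date_range(x['Date Start'], x['Date End']), axis=1)
# one row per day (DataFrame.explode needs pandas >= 0.25); the index repeats per item
df1 = df.explode('r')
# explode leaves object dtype, so convert back to datetime64 before using .dt
df1['r'] = pd.to_datetime(df1['r'])
df1['Days in Jan-2019'] = (df1['r'].dt.year == 2019) & (df1['r'].dt.month == 1)
df1['Days in 2019'] = df1['r'].dt.year == 2019
df1['Days in Q1 2019'] = (df1['r'].dt.year == 2019) & (df1['r'].dt.quarter == 1)
# sum the booleans per original row; DataFrame.sum(level=0) was removed in pandas 2.0
cols = ['Days in Jan-2019', 'Days in 2019', 'Days in Q1 2019']
df = df.drop('r', axis=1).join(df1.groupby(level=0)[cols].sum())
print(df)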
Item Date Start Date End Days in Jan-2019 Days in 2019 Days in Q1 2019
0 A 2019-01-02 2019-02-03 30 33 33
1 B 2019-02-04 2019-02-08 0 5 5
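The question asks whether the counts can be computed without one row per date. When every condition is a single calendar period (a month, a quarter, a year), one alternative, not part of the answer above, is to clip each row's range to the period and count the overlap arithmetically, which is vectorized over whole columns. A sketch under that assumption; days_in_period is a hypothetical helper name, and it assumes the dates carry no time-of-day component:

```python
import pandas as pd

# Sample data from the question; dates are dd/mm/yyyy.
df = pd.DataFrame({'Item': ['A', 'B'],
                   'Date Start': ['02/01/2019', '04/02/2019'],
                   'Date End': ['03/02/2019', '08/02/2019']})
df['Date Start'] = pd.to_datetime(df['Date Start'], dayfirst=True)
df['Date End'] = pd.to_datetime(df['Date End'], dayfirst=True)


def days_in_period(start: pd.Series, end: pd.Series, period: pd.Period) -> pd.Series:
    """Count the days of each [start, end] range (both ends inclusive) inside period."""
    p_start = period.start_time.normalize()  # first day of the period at 00:00
    p_end = period.end_time.normalize()      # last day of the period at 00:00
    # Clip each range to the period: the later of the starts, the earlier of the ends.
    lo = start.where(start >= p_start, p_start)
    hi = end.where(end <= p_end, p_end)
    # A negative span means the range misses the period, so floor the count at zero.
    return ((hi - lo).dt.days + 1).clip(lower=0)


df['Days in Jan-2019'] = days_in_period(df['Date Start'], df['Date End'], pd.Period('2019-01', freq='M'))
df['Days in 2019'] = days_in_period(df['Date Start'], df['Date End'], pd.Period('2019', freq='Y'))
df['Days in Q1 2019'] = days_in_period(df['Date Start'], df['Date End'], pd.Period('2019Q1', freq='Q'))
print(df)
```

This keeps one row per item and never builds the daily ranges, but it only covers conditions that form one contiguous interval; anything non-contiguous still needs one of the per-day approaches.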