Python: Creating date interval fields from a list of dates
I have a list of arbitrarily chosen dates:
dates = ['01/01/2017','01/30/2017','2/28/2017'] etc.
I also have a pandas DataFrame with a 'transaction_dt' field:
'customer_id','transaction_dt','product','price','units'
1,2004-01-02,thing1,25,47
1,2004-01-17,thing2,150,8
2,2004-01-29,thing2,150,25
3,2017-07-15,thing3,55,17
3,2016-05-12,thing3,55,47
4,2012-02-23,thing2,150,22
4,2009-10-10,thing1,25,12
4,2014-04-04,thing2,150,2
5,2008-07-09,thing2,150,43
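What I'd like to do is create a function/apply or lambda that compares the transaction_dt field values against the values in the date list, and then creates two new fields named 'begin_dt' and 'end_dt' for the interval that 'transaction_dt' falls in.
Edit:
Actually, I think supplying two separate lists for this would make things simpler and more flexible: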
start_dates = ['2004-01-01','2004-01-31','2004-03-01','2004-03-31']
end_dates = ['2004-01-30','2004-02-29','2004-03-30','2004-4-29']
Using pandas and itertools (this will produce NaT when no true interval is found):
To address your edit, you would want to make the following changes:
import pandas as pd

# Don't need to define `window` or import islice
# Consider using `pd.date_range()` here
# Also may want to confirm that no dates overlap
# and establish some logic on what constitutes "falls in."
# ("On or after" versus "after")
start_dates = pd.to_datetime(start_dates)
end_dates = pd.to_datetime(end_dates)

def _between_dates(tgt, date1, date2):
    # Better form to define with `def` than `lambda` here
    return date1 < tgt < date2

def between_dates(tgt, start_dates, end_dates):
    res = [pd.NaT] * 2  # default when no interval contains tgt
    for d1, d2 in zip(start_dates, end_dates):
        if _between_dates(tgt, d1, d2):
            res = [d1, d2]
    return res

# Assumes df.transaction_dt is already datetime64 (e.g. via pd.to_datetime)
between = pd.DataFrame(df.transaction_dt.apply(
    lambda x: between_dates(x, start_dates, end_dates)).values.tolist(),
    columns=['begin_dt', 'end_dt'])
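As the comments above suggest, the "no overlap" and "falls in" questions matter, and once they are settled the row-wise `apply` can be swapped for a vectorized lookup. The following is only a sketch of that alternative, not the method used above: it assumes the intervals do not overlap, uses `closed='neither'` to mirror the strict `date1 < tgt < date2` test, and runs on a small made-up `df` with zero-padded date strings.

```python
import pandas as pd

# Illustrative data; zero-padded ISO strings keep the parsing unambiguous.
start_dates = pd.to_datetime(['2004-01-01', '2004-01-31', '2004-03-01', '2004-03-31'])
end_dates = pd.to_datetime(['2004-01-30', '2004-02-29', '2004-03-30', '2004-04-29'])

df = pd.DataFrame({
    'customer_id': [1, 1, 2, 3],
    'transaction_dt': pd.to_datetime(
        ['2004-01-02', '2004-01-17', '2004-01-29', '2017-07-15']),
})

# closed='neither' mirrors the strict `date1 < tgt < date2` comparison.
intervals = pd.IntervalIndex.from_arrays(start_dates, end_dates, closed='neither')

# get_indexer gives the position of the containing interval, or -1 if none.
# It only works when the intervals do not overlap.
pos = intervals.get_indexer(pd.DatetimeIndex(df['transaction_dt']))
found = pos >= 0

# A position of -1 would wrap around to the last interval, so mask it to NaT.
df['begin_dt'] = pd.Series(intervals.left[pos], index=df.index).where(found)
df['end_dt'] = pd.Series(intervals.right[pos], index=df.index).where(found)
```

Switching `closed` to `'left'`, `'right'`, or `'both'` is one way to express the "on or after" versus "after" choice explicitly.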
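Thank you very much. This is great. I've made an edit to my question that I think will make this easier and more flexible. Would you mind taking a look?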
   customer_id transaction_dt product  price  units   begin_dt     end_dt
0            1     2004-01-02  thing1     25     47 2004-01-01 2004-01-30
1            1     2004-01-17  thing2    150      8 2004-01-01 2004-01-30
2            2     2004-01-29  thing2    150     25 2004-01-01 2004-01-30
3            3     2017-07-15  thing3     55     17        NaT        NaT
4            3     2016-05-12  thing3     55     47        NaT        NaT
5            4     2012-02-23  thing2    150     22        NaT        NaT
6            4     2009-10-10  thing1     25     12        NaT        NaT
7            4     2014-04-04  thing2    150      2        NaT        NaT
8            5     2008-07-09  thing2    150     43        NaT        NaT
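For the original form of the question, where there is a single list of dates rather than separate start and end lists, consecutive entries can be treated as interval boundaries. This is a rough sketch under a few assumptions: the list is sorted before use, a transaction "falls in" an interval when it is on or after the beginning and strictly before the end, and the small `df` here is made up for illustration. It uses `np.searchsorted` in place of an itertools window.

```python
import numpy as np
import pandas as pd

# Boundary dates from the question, zero-padded and sorted.
dates = pd.to_datetime(
    ['01/01/2017', '01/30/2017', '02/28/2017'], format='%m/%d/%Y').sort_values()

# Hypothetical transactions for illustration.
df = pd.DataFrame({
    'transaction_dt': pd.to_datetime(
        ['2017-01-15', '2017-02-01', '2016-05-12', '2017-07-15']),
})

# Insertion position of each transaction within the sorted boundaries.
# side='right' makes the interval "on or after begin, strictly before end".
pos = np.searchsorted(dates.values, df['transaction_dt'].values, side='right')

# Positions 0 and len(dates) fall outside every interval.
inside = (pos > 0) & (pos < len(dates))
safe = np.clip(pos, 1, len(dates) - 1)  # keep indices valid; masked below

df['begin_dt'] = pd.Series(dates[safe - 1], index=df.index).where(inside)
df['end_dt'] = pd.Series(dates[safe], index=df.index).where(inside)
```

Transactions before the first boundary or on or after the last one get NaT in both columns.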