Python pandas: how to recalculate the estimated time of arrival with updated events
I have a dataframe containing the following events:
ID m1 m2 m3 m4
1 xxxx/xxxxx.0183683234 2019-10-28 2019-11-28 2019-11-30 NaT
2 xxxx/xxxxx.0183679721 2019-11-28 2019-11-28 NaT NaT
4 xxxx/xxxxx.0183888975 2019-11-20 2019-12-10 NaT NaT
These events happen in chronological order. This means that m1 comes first, followed by m2, m3 and m4.
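As a sanity check on the ordering described above, the following is a minimal sketch that verifies the known dates in each row never go backwards. The helper name `is_chronological` and the variable `events` are hypothetical, and the sketch assumes that a missing date (NaT) simply means the event has not happened yet.

```python
import pandas as pd

# Sample rows copied from the question; NaT marks an event that has not happened yet.
events = pd.DataFrame({
    "ID": ["xxxx/xxxxx.0183683234", "xxxx/xxxxx.0183679721", "xxxx/xxxxx.0183888975"],
    "m1": pd.to_datetime(["2019-10-28", "2019-11-28", "2019-11-20"]),
    "m2": pd.to_datetime(["2019-11-28", "2019-11-28", "2019-12-10"]),
    "m3": pd.to_datetime(["2019-11-30", None, None]),
    "m4": pd.to_datetime([None, None, None]),
})
milestones = ["m1", "m2", "m3", "m4"]


def is_chronological(frame, cols):
    """Return a boolean Series: True where adjacent known dates are in order.

    Only neighbouring milestones are compared, which is enough as long as
    a later event is never recorded before an earlier one.
    """
    ok = pd.Series(True, index=frame.index)
    for earlier, later in zip(cols, cols[1:]):
        both_known = frame[earlier].notna() & frame[later].notna()
        # A pair passes if one side is still unknown or the dates are ordered.
        ok &= ~both_known | (frame[earlier] <= frame[later])
    return ok


print(is_chronological(events, milestones))
```

Rows that fail the check could be inspected before any estimate is computed, since an estimate built on an out-of-order date would be misleading.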
import pandas as pd

# convert every column except the ID to datetime; unparsable values become NaT
for col in df.columns:
    if col != 'ID':
        df[col] = pd.to_datetime(df[col], errors='coerce')
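The `errors='coerce'` argument in the loop above is what turns anything that cannot be parsed as a date into NaT instead of raising an exception. A small sketch of that behaviour follows; the frame `raw`, its IDs and the placeholder text `"not yet"` are made up for illustration.

```python
import pandas as pd

# Hypothetical raw input where a missing event is stored as free text.
raw = pd.DataFrame({
    "ID": ["a", "b"],
    "m2": ["2019-11-28", "2019-12-10"],
    "m3": ["2019-11-30", "not yet"],
})

for col in raw.columns:
    if col != "ID":
        # Unparsable values become NaT rather than raising an exception.
        raw[col] = pd.to_datetime(raw[col], errors="coerce")

print(raw.dtypes)
print(raw)
```

After the conversion both date columns have a datetime dtype, so date arithmetic and `isna()` checks behave consistently.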
I still don't know what the master data looks like, but I assume it has the same number of rows as the original dataframe.
This is my master data:
import pandas as pd
from datetime import timedelta  # adds whole days to a timestamp in a simple way

master = pd.DataFrame([['xxxx/xxxxx.0183683234', '2 days', '9 days'],
                       ['xxxx/xxxxx.0183679721', '2 days', '6 days'],
                       ['xxxx/xxxxx.0183888975', '6 days', '1 day']],
                      columns=['ID', 'M2_M3', 'M3_M4'])
# the lead times are strings here; convert them so that .dt.days works
master['M2_M3'] = pd.to_timedelta(master['M2_M3'])
master['M3_M4'] = pd.to_timedelta(master['M3_M4'])

out = master.merge(df, on='ID')  # this will hold the expected output

# lists that become the new columns
m3_estimated = []
m4_estimated = []

for li, m2_v in zip(out['M2_M3'].dt.days.astype('int16'), out['m2']):
    if pd.notna(m2_v):
        m3_estimated.append(m2_v + timedelta(days=int(li)))
    else:
        m3_estimated.append(None)

for li, m3_v in zip(out['M3_M4'].dt.days.astype('int16'), out['m3']):
    if pd.notna(m3_v):
        m4_estimated.append(m3_v + timedelta(days=int(li)))
    else:
        m4_estimated.append(None)

out['m3_estimated'] = m3_estimated
out['m4_estimated'] = m4_estimated
print(out)
ID M2_M3 M3_M4 ... m4 m3_estimated m4_estimated
0 xxxx/xxxxx.0183683234 2 days 9 days ... NaT 2019-11-30 2019-12-09
1 xxxx/xxxxx.0183679721 2 days 6 days ... NaT 2019-11-30 NaT
2 xxxx/xxxxx.0183888975 6 days 1 day ... NaT 2019-12-16 NaT
One possible solution:
df
xxxxxxxxxxID m1 m2 m3 m4 M2_M3 M3_M4
1 xxxx/xxxxx.0183683234 2019-10-28 2019-11-28 2019-11-30 NaT 2 days 9 days
2 xxxx/xxxxx.0183679721 2019-11-28 2019-11-28 NaT NaT 2 days 6 days
4 xxxx/xxxxx.0183888975 2019-11-20 2019-12-10 NaT NaT 6 days 1 days
df.dtypes
xxxxxxxxxxID object
m1 datetime64[ns]
m2 datetime64[ns]
m3 datetime64[ns]
m4 datetime64[ns]
M2_M3 timedelta64[ns]
M3_M4 timedelta64[ns]
dtype: object
# These two lines can be put in a timed loop:
df["m3_estimated"]=df.m3.where(~df.m3.isna(), df.m2.add(df.M2_M3))
df["m4_estimated"]=df.m4.where(~df.m4.isna(), df.m3_estimated.add(df.M3_M4))
df
xxxxxxxxxxID m1 m2 m3 m4 M2_M3 M3_M4 m3_estimated m4_estimated
1 xxxx/xxxxx.0183683234 2019-10-28 2019-11-28 2019-11-30 NaT 2 days 9 days 2019-11-30 2019-12-09
2 xxxx/xxxxx.0183679721 2019-11-28 2019-11-28 NaT NaT 2 days 6 days 2019-11-30 2019-12-06
4 xxxx/xxxxx.0183888975 2019-11-20 2019-12-10 NaT NaT 6 days 1 days 2019-12-16 2019-12-17
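The same chaining can be written with `fillna` and a loop over the milestones, which may be easier to extend if more milestones are added later. This is only a sketch: the frame is rebuilt from the sample data so that it runs on its own, the plain `ID` column name and the `lead_times` mapping are assumptions, and `fillna` is swapped in for the `where(~isna(), ...)` form used in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["xxxx/xxxxx.0183683234", "xxxx/xxxxx.0183679721", "xxxx/xxxxx.0183888975"],
    "m2": pd.to_datetime(["2019-11-28", "2019-11-28", "2019-12-10"]),
    "m3": pd.to_datetime(["2019-11-30", None, None]),
    "m4": pd.to_datetime([None, None, None]),
    "M2_M3": pd.to_timedelta(["2 days", "2 days", "6 days"]),
    "M3_M4": pd.to_timedelta(["9 days", "6 days", "1 days"]),
})

# Hypothetical mapping: milestone -> column holding the lead time from the previous one.
lead_times = {"m3": "M2_M3", "m4": "M3_M4"}

previous = df["m2"]  # last milestone assumed to be known for every row
for name, lead_col in lead_times.items():
    # Keep the actual date when it exists, otherwise use previous + lead time.
    df[f"{name}_estimated"] = df[name].fillna(previous + df[lead_col])
    # The next estimate builds on this one, actual or estimated.
    previous = df[f"{name}_estimated"]

print(df[["ID", "m3_estimated", "m4_estimated"]])
```

Because each estimate feeds the next, a row with no actual m3 still gets an m4 estimate, which is the difference from the loop-based answer above.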
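df.m4.where(…) picks the value of m4 if it is already fixed, and otherwise computes it from m3_estimated and M3_M4.

Are your datetime columns always datetime? Are there other datetime columns in your dataframe? You should add the pd.Timedelta column to the datetime column. That is, pd.to_datetime(df['m2']) + pd.to_timedelta(df['M2_M3'], unit='D') if the latter is an integer; if it is not, you should convert it to an integer.

Have you checked this Stack Overflow post?

Indeed, the timedelta function is meant for this. But my question is how to calculate the estimates when the actual value is null.

@powerPixie Yes, I checked, but it is not what I am looking for. When I use it I get this error: AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas. It comes from this line: for li,m2_v in zip(out['M2_M3'].str.split(' '),out['m2']):

import numpy as np

It must be something with the data itself. I used the data available in your example. There seems to be a problem with the data in the 'M2_M3' column. It is hard to help without knowing the cause of the conflict.

Is the dtype of M2_M3 timedelta64[ns], with null values?

I like this solution, it is working... However, I had to add .dt.normalize() to drop the time-of-day part of the timestamps. BTW, how do I put the lines in a timeloop?

What do you mean by that?

@Haalanam I mean periodically executing the code that contains the data re-reading part and these two lines.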
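To illustrate the last two comments, here is a rough sketch of wrapping the two lines in a function that can be called again whenever the data is re-read, with `.dt.normalize()` applied so that only the date remains. The function name `recompute_estimates`, the times of day and the updated m3 value are all made up for the example; how the function is scheduled (a loop with a sleep, a cron job, and so on) is left open.

```python
import pandas as pd


def recompute_estimates(frame):
    """Return a copy of frame with m3_estimated and m4_estimated refreshed."""
    out = frame.copy()
    m3_est = out["m3"].fillna(out["m2"] + out["M2_M3"])
    m4_est = out["m4"].fillna(m3_est + out["M3_M4"])
    # normalize() resets the time of day to midnight, keeping only the date.
    out["m3_estimated"] = m3_est.dt.normalize()
    out["m4_estimated"] = m4_est.dt.normalize()
    return out


# One row from the sample data, with a hypothetical time of day on m2.
df = pd.DataFrame({
    "ID": ["xxxx/xxxxx.0183888975"],
    "m2": pd.to_datetime(["2019-12-10 14:30"]),
    "m3": pd.to_datetime([None]),
    "m4": pd.to_datetime([None]),
    "M2_M3": pd.to_timedelta(["6 days"]),
    "M3_M4": pd.to_timedelta(["1 days"]),
})

first = recompute_estimates(df)

# Simulated update: the m3 event is now known (hypothetical value).
df.loc[0, "m3"] = pd.Timestamp("2019-12-18 09:15")
second = recompute_estimates(df)

print(first[["m3_estimated", "m4_estimated"]])
print(second[["m3_estimated", "m4_estimated"]])
```

Since the function always starts from the actual columns, running it repeatedly is safe: an estimate is replaced by the real date as soon as that date appears in the data.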