Python - Combining 2 databases based on 2 criteria (case and time series)
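I want to combine two data frames using the following databases (1) and (2). The expected result is shown in (3).

1) (input table, not available)
2) (input table, not available)
3) Final (expected result)

These are the criteria I'm looking for:
1. Match on "Case".
2. "Date_Y" must be <= "Date_X", and it must be the maximum such date for that "Case" in the data frame.
3. Show the value corresponding to that date.

I tried looking for similar code but couldn't find any. Thanks in advance for your help.

Here's what you want: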
import pandas as pd

df1 = pd.DataFrame({'case': [123, 123, 345, 456],
                    'date': ['2019-12-21', '2019-12-17', '2019-12-21', '2019-12-21']})
df2 = pd.DataFrame({'case': [123, 123, 123],
                    'date': ['2019-12-21', '2019-12-18', '2019-12-15'],
                    'value': [0.4, 0.5, 1.2]})
df1['date'] = pd.to_datetime(df1['date'])
df2['date'] = pd.to_datetime(df2['date'])

datey = []
val = []
for i in range(len(df1)):
    # rows of df2 with the same case
    tmp = df2[df2['case'] == df1.loc[i, 'case']]
    # keep only dates on or before df1's date
    tmp = tmp[tmp['date'] <= df1.loc[i, 'date']]
    if len(tmp) > 0:
        # the latest qualifying date for this case
        row = tmp.loc[tmp['date'].idxmax()]
        datey.append(row['date'])
        val.append(row['value'])
    else:
        # no match: pandas turns None into NaT / NaN
        datey.append(None)
        val.append(None)

df1['date_y'] = datey
df1['value'] = val
print(df1)
You can drop the NaN values.
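The loop can also be replaced by a single call to pandas' `merge_asof`, which performs this "latest date on or before" match within each case. This is a minimal sketch and not the answerer's code. It reuses the answer's sample data and converts the dates to datetimes up front. The names `left`, `right` and `result` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'case': [123, 123, 345, 456],
                    'date': pd.to_datetime(['2019-12-21', '2019-12-17',
                                            '2019-12-21', '2019-12-21'])})
df2 = pd.DataFrame({'case': [123, 123, 123],
                    'date': pd.to_datetime(['2019-12-21', '2019-12-18', '2019-12-15']),
                    'value': [0.4, 0.5, 1.2]})

# merge_asof needs both frames sorted by the "on" key; keep df1's
# original row labels in an 'index' column so the order can be restored.
left = df1.reset_index().sort_values('date')
# Copy df2's date so the matched date survives as its own column.
right = df2.assign(date_y=df2['date']).sort_values('date')

result = pd.merge_asof(
    left, right,
    on='date',             # compare dates...
    by='case',             # ...only within the same case
    direction='backward',  # latest df2 date <= df1 date (equal dates allowed)
)

# Restore df1's row order and pick the output columns.
result = result.sort_values('index').set_index('index')[['case', 'date', 'date_y', 'value']]
result.index.name = None
print(result)
```

`direction='backward'` with the default `allow_exact_matches=True` includes equal dates, which satisfies criterion 2. Cases with no earlier or equal date (345 and 456 here) get NaT and NaN, just as in the loop version.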
Comments:
Are you using pandas? Do you have to do it this way? How big is your dataset? What have you tried so far?
Check the solution I uploaded.
Expected result (3) from the question (rows without a match leave Date_Y and Value empty):

Case  Date_X      Date_Y      Value
123   2019-12-21  2019-12-20  0.4
123   2019-12-16  2019-12-14  1.2
234   2019-12-21
345   2019-12-21
Output of the answer's code:

   case       date     date_y  value
0   123 2019-12-21 2019-12-21    0.4
1   123 2019-12-17 2019-12-15    1.2
2   345 2019-12-21        NaT    NaN
3   456 2019-12-21        NaT    NaN
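The answer suggests dropping the NaN values. A minimal sketch of that step follows. It rebuilds the printed output above by hand (the name `out` is illustrative) and keeps only the rows that found a matching date.

```python
import pandas as pd

# The printed result above, rebuilt by hand for this sketch.
out = pd.DataFrame({
    'case': [123, 123, 345, 456],
    'date': pd.to_datetime(['2019-12-21', '2019-12-17', '2019-12-21', '2019-12-21']),
    'date_y': pd.to_datetime(['2019-12-21', '2019-12-15', None, None]),
    'value': [0.4, 1.2, None, None],
})

# Drop rows where no earlier-or-equal date was found (NaT / NaN).
matched = out.dropna(subset=['date_y', 'value'])
print(matched)
```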