Python: merge two DataFrame columns into one by matching dates
Tags: python, pandas, numpy, dataframe, merge
I have two DataFrames, df and df1. Where df1['data'] is not NaN, I want to replace the data in df with the value from df1['data'].
df1:
Date data
0 2020-01-04 NaN
1 2020-01-07 NaN
2 2020-01-08 19.0
3 2020-01-09 NaN
4 2020-01-11 NaN
5 2020-01-12 NaN
6 2020-01-16 NaN
7 2020-01-17 NaN
8 2020-01-24 18.5
There is a similar question, but its situation is not exactly the same as mine.
Expected result:
Id timestamp data Date
30424 30665 2020-01-04 19:40:23.827 17.5 2020-01-04
31054 31295 2020-01-05 22:26:39.860 17.0 2020-01-05
32150 32391 2020-01-06 23:00:14.607 18.0 2020-01-06
33236 33477 2020-01-07 22:52:56.757 18.0 2020-01-07
34314 34555 2020-01-08 20:45:48.927 19.0 2020-01-08 # This row changed
35592 35833 2020-01-09 20:56:21.320 18.0 2020-01-09
36528 36769 2020-01-10 20:41:36.323 19.5 2020-01-10
37054 37295 2020-01-11 19:35:50.553 18.5 2020-01-11
37652 37893 2020-01-12 19:28:22.823 17.0 2020-01-12
38828 39069 2020-01-13 23:48:12.533 21.5 2020-01-13
40004 40245 2020-01-14 22:50:56.873 18.5 2020-01-14
I tried:
pd.merge(df, df1, how='left', on='Date')
It returns:
Id timestamp data_x Date data_y
0 30665 2020-01-04 19:40:23.827 17.5 2020-01-04 NaN
1 31295 2020-01-05 22:26:39.860 17.0 2020-01-05 NaN
2 32391 2020-01-06 23:00:14.607 18.0 2020-01-06 NaN
3 33477 2020-01-07 22:52:56.757 18.0 2020-01-07 NaN
4 34555 2020-01-08 20:45:48.927 18.0 2020-01-08 19.0
5 35833 2020-01-09 20:56:21.320 18.0 2020-01-09 NaN
6 36769 2020-01-10 20:41:36.323 19.5 2020-01-10 NaN
7 37295 2020-01-11 19:35:50.553 18.5 2020-01-11 NaN
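The merge output above already carries both pieces of information: data_y holds the values from df1 and data_x the originals. One possible way to finish the merge approach is sketched below on a hypothetical three-row sample built from values in the question; the variable name merged is an assumption for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of df and df1 (values copied from the question).
df = pd.DataFrame({
    "Id": [33477, 34555, 35833],
    "data": [18.0, 18.0, 18.0],
    "Date": pd.to_datetime(["2020-01-07", "2020-01-08", "2020-01-09"]),
})
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-07", "2020-01-08", "2020-01-09"]),
    "data": [float("nan"), 19.0, float("nan")],
})

# The left merge keeps every row of df; the two data columns get _x/_y suffixes.
merged = pd.merge(df, df1, how="left", on="Date")

# Prefer df1's value (data_y) where it exists, otherwise keep the original (data_x).
merged["data"] = merged["data_y"].fillna(merged["data_x"])
merged = merged.drop(columns=["data_x", "data_y"])

print(merged)
```

Note that merge returns a fresh 0, 1, 2, ... index, and that duplicate dates in df1 would duplicate rows of df, so this sketch assumes each Date occurs once in df1.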
Update: I then tried the line below, but the resulting data column looks wrong:
df['data'] = df['Date'].map(df1.set_index('Date')['data']).fillna(df['Date'])
Use Series.map on the Date column first, then fill the missing values left by unmatched dates with the original data:
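# Look up each Date in df1 (indexed by Date); fall back to the original data where the lookup gives NaN
df['data'] = df['Date'].map(df1.set_index('Date')['data']).fillna(df['data'])
print(df)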
Id timestamp data Date
30424 30665 2020-01-04 19:40:23.827 17.5 2020-01-04
31054 31295 2020-01-05 22:26:39.860 17.0 2020-01-05
32150 32391 2020-01-06 23:00:14.607 18.0 2020-01-06
33236 33477 2020-01-07 22:52:56.757 18.0 2020-01-07
34314 34555 2020-01-08 20:45:48.927 19.0 2020-01-08
35592 35833 2020-01-09 20:56:21.320 18.0 2020-01-09
36528 36769 2020-01-10 20:41:36.323 19.5 2020-01-10
37054 37295 2020-01-11 19:35:50.553 18.5 2020-01-11
37652 37893 2020-01-12 19:28:22.823 17.0 2020-01-12
38828 39069 2020-01-13 23:48:12.533 21.5 2020-01-13
40004 40245 2020-01-14 22:50:56.873 18.5 2020-01-14
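For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same map and fillna idea. The sample frames are hypothetical cut-downs of the data in the question, and the names lookup and mapped are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row version of df, keeping the original non-default index.
df = pd.DataFrame({
    "Id": [33477, 34555, 35833],
    "data": [18.0, 18.0, 18.0],
    "Date": pd.to_datetime(["2020-01-07", "2020-01-08", "2020-01-09"]),
})
df.index = [33236, 34314, 35592]

# Hypothetical df1: one NaN, one real value, and one date that df does not contain.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-07", "2020-01-08", "2020-01-24"]),
    "data": [np.nan, 19.0, 18.5],
})

# Series keyed by Date, used like a dictionary by map.
lookup = df1.set_index("Date")["data"]

# NaN where the date is missing from df1 or where df1 itself holds NaN.
mapped = df["Date"].map(lookup)

# Keep the original value wherever the lookup produced NaN.
df["data"] = mapped.fillna(df["data"])

print(df)
```

Unlike merge, this keeps df's original index and column order. It assumes the dates in df1 are unique, since map needs a uniquely indexed Series to look values up.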
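Do you know what happened to the data column in the updated question?
@nilsinelabre: Sure, there was a typo: .fillna(df['Date']) needs to be .fillna(df['data']).
@nilsinelabre: The reason is that the datetimes were converted to their native numeric format, Unix timestamps, so the missing values were filled with datetimes in that format.
Why does the date always have to be set as the index (df1.set_index('Date'))? Does that mean many functions are index-based?
@nilsinelabore: The reason is that values have to be matched between the two DataFrames in the same way, and here they need to be matched by Date. map takes the index values of the Series you pass in, like the keys of a dictionary, and uses them to look up the new values. If no index is set, df1['data'] has the index 0, 1, 2, ..., and because those values do not appear in the Date column, df['Date'].map(df1['data']) returns NaN for every row.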
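The difference the index makes can be seen in a small sketch. The two-row sample and the names dates, no_index and with_index are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical sample: one date that exists in df1 and one that does not.
dates = pd.Series(pd.to_datetime(["2020-01-08", "2020-01-09"]))
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-08", "2020-01-24"]),
    "data": [19.0, 18.5],
})

# Without set_index the lookup keys are the default labels 0 and 1,
# so no date can ever match and every result is NaN.
no_index = dates.map(df1["data"])

# With Date as the index, the dates themselves are the lookup keys.
with_index = dates.map(df1.set_index("Date")["data"])

print(no_index)
print(with_index)
```

In this sketch only the second call finds 19.0 for 2020-01-08; the unmatched 2020-01-09 stays NaN in both.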
Output of the attempt with .fillna(df['Date']), where the data column is corrupted:
Id timestamp data Date
30424 30665 2020-01-04 19:40:23.827 1.578096e+18 2020-01-04
31054 31295 2020-01-05 22:26:39.860 1.578182e+18 2020-01-05
32150 32391 2020-01-06 23:00:14.607 1.578269e+18 2020-01-06
33236 33477 2020-01-07 22:52:56.757 1.578355e+18 2020-01-07
34314 34555 2020-01-08 20:45:48.927 1.900000e+01 2020-01-08
35592 35833 2020-01-09 20:56:21.320 1.578528e+18 2020-01-09
36528 36769 2020-01-10 20:41:36.323 1.578614e+18 2020-01-10
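The strange values in that data column are consistent with the explanation given in the comments. As a rough check, assuming the dates were stored at nanosecond resolution, the sketch below converts 2020-01-04 to its integer form with NumPy; the name ns is introduced only for illustration.

```python
import numpy as np

# 2020-01-04 00:00:00 expressed as nanoseconds since the Unix epoch.
ns = np.datetime64("2020-01-04", "ns").astype("int64")

print(ns)         # 1578096000000000000
print(float(ns))  # 1.578096e+18
```

This matches the 1.578096e+18 shown in the first row, while the 2020-01-08 row shows 1.900000e+01 because it received the real value 19.0 from df1.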
Details, the intermediate result of map before fillna is applied:
print(df['Date'].map(df1.set_index('Date')['data']))
30424 NaN
31054 NaN
32150 NaN
33236 NaN
34314 19.0
35592 NaN
36528 NaN
37054 NaN
37652 NaN
38828 NaN
40004 NaN
Name: Date, dtype: float64