Pandas: replace NaN in a DataFrame based on values in a second DataFrame
There seem to be many questions about conditioning one DataFrame on another, but I couldn't find one that fits my need. Both DataFrames below are small samples; the real ones are thousands of columns wide. I have a DataFrame (df1) that looks like this:
IBM BA CAT IBM EARN BA EARN CAT EARN
Date
1/22/2018 163.13 65.94 76.50 NaN NaN NaN
1/23/2018 163.17 65.94 76.51 NaN NaN NaN
1/24/2018 167.26 67.43 79.23 NaN NaN NaN
1/25/2018 166.28 67.77 80.57 NaN NaN NaN
1/26/2018 166.58 68.37 80.87 NaN NaN NaN
1/27/2018 166.77 68.87 81.07 NaN NaN NaN
1/28/2018 167.98 68.57 81.07 NaN NaN NaN
2/1/2018 167.98 68.77 81.59 NaN NaN NaN
2/2/2018 167.98 69.07 81.87 NaN NaN NaN
I have another DataFrame (df2) whose columns are the same as the last three columns of df1, but which holds specific dates:
IBM EARN BA EARN CAT EARN
0 1/22/2018 2/1/2018 1/26/2018
1 10/19/2017 10/26/2017 10/25/2017
2 7/20/2017 7/27/2017 7/26/2017
3 4/20/2017 4/27/2017 4/26/2017
4 1/23/2017 1/26/2017 1/27/2017
5 10/19/2016 10/27/2016 10/26/2016
6 7/20/2016 7/28/2016 7/27/2016
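To experiment with the two samples above, here is a minimal sketch that rebuilds them. It assumes the dates are stored as plain strings in month/day/year form, both in the index of df1 and in the cells of df2; the question does not state the actual dtypes.

```python
import numpy as np
import pandas as pd

# Assumption: dates are plain strings, exactly as printed in the question.
dates = ["1/22/2018", "1/23/2018", "1/24/2018", "1/25/2018", "1/26/2018",
         "1/27/2018", "1/28/2018", "2/1/2018", "2/2/2018"]

df1 = pd.DataFrame(
    {
        "IBM": [163.13, 163.17, 167.26, 166.28, 166.58, 166.77, 167.98, 167.98, 167.98],
        "BA": [65.94, 65.94, 67.43, 67.77, 68.37, 68.87, 68.57, 68.77, 69.07],
        "CAT": [76.50, 76.51, 79.23, 80.57, 80.87, 81.07, 81.07, 81.59, 81.87],
        # The three EARN columns start out entirely NaN.
        "IBM EARN": np.nan,
        "BA EARN": np.nan,
        "CAT EARN": np.nan,
    },
    index=pd.Index(dates, name="Date"),
)

df2 = pd.DataFrame(
    {
        "IBM EARN": ["1/22/2018", "10/19/2017", "7/20/2017", "4/20/2017",
                     "1/23/2017", "10/19/2016", "7/20/2016"],
        "BA EARN": ["2/1/2018", "10/26/2017", "7/27/2017", "4/27/2017",
                    "1/26/2017", "10/27/2016", "7/28/2016"],
        "CAT EARN": ["1/26/2018", "10/25/2017", "7/26/2017", "4/26/2017",
                     "1/27/2017", "10/26/2016", "7/27/2016"],
    }
)
```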
I want to put a 1 in df1 wherever there is a corresponding date in df2. The (partial) result would look like this, continuing for every date listed in df2:
IBM BA CAT IBM EARN BA EARN CAT EARN
Date
1/22/2018 163.13 65.94 76.50 **1** NaN NaN
1/23/2018 163.17 65.94 76.51 NaN NaN NaN
1/24/2018 167.26 67.43 79.23 NaN NaN NaN
1/25/2018 166.28 67.77 80.57 NaN NaN NaN
1/26/2018 166.58 68.37 80.87 NaN NaN **1**
1/27/2018 166.77 68.87 81.07 NaN NaN NaN
1/28/2018 167.98 68.57 81.07 NaN NaN NaN
2/1/2018 167.98 68.77 81.59 NaN **1** NaN
2/2/2018 167.98 69.07 81.87 NaN NaN NaN
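Please let me know if you can offer a solution.

You can try this approach, since the dates are your index (note that the columns here are named without spaces, e.g. IBMEARN):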
In [18]: df1['IBMEARN'] = np.where(df1.index.isin(df2.IBMEARN),1,0)
In [19]: df1['BAEARN'] = np.where(df1.index.isin(df2.BAEARN),1,0)
In [21]: df1['CATEARN'] = np.where(df1.index.isin(df2.CATEARN),1,0)
In [22]: df1
Out[22]:
IBM BA CAT IBMEARN BAEARN CATEARN
DATE
1/22/2018 163.13 65.94 76.50 1 0 0
1/23/2018 163.17 65.94 76.51 0 0 0
1/24/2018 167.26 67.43 79.23 0 0 0
1/25/2018 166.28 67.77 80.57 0 0 0
1/26/2018 166.58 68.37 80.87 0 0 1
1/27/2018 166.77 68.87 81.07 0 0 0
1/28/2018 167.98 68.57 81.07 0 0 0
2/1/2018 167.98 68.77 81.59 0 1 0
2/2/2018 167.98 69.07 81.87 0 0 0
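One point the session above leaves implicit: `isin` compares values, so it is safest when the index of df1 and the cells of df2 hold the same kind of object. The sketch below assumes a hypothetical setup in which df1 has a real DatetimeIndex while df2 stores month/day/year strings (the question does not state the dtypes), and parses the strings before testing membership.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data: df1 is indexed by timestamps,
# while df2 holds dates as strings.
df1 = pd.DataFrame(
    {"IBM": [163.13, 163.17, 167.26]},
    index=pd.to_datetime(["2018-01-22", "2018-01-23", "2018-01-24"]),
)
df2 = pd.DataFrame({"IBMEARN": ["1/22/2018", "10/19/2017"]})

# Parse the strings with an explicit format so both sides are datetime64.
earn_dates = pd.to_datetime(df2["IBMEARN"], format="%m/%d/%Y")

# 1 where the index date appears among the earnings dates, 0 otherwise.
df1["IBMEARN"] = np.where(df1.index.isin(earn_dates), 1, 0)
```

If both sides are already strings in an identical format, as the printed samples suggest, the conversion step is unnecessary.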
For each column of the second DataFrame, check membership of the index and replace the values:
for col in df2.columns:
    # 1 where the index date appears in this df2 column, NaN otherwise
    df1[col] = np.where(df1.index.isin(df2[col]), 1, np.nan)
print(df1)
IBM BA CAT IBM EARN BA EARN CAT EARN
Date
1/22/2018 163.13 65.94 76.50 1.0 NaN NaN
1/23/2018 163.17 65.94 76.51 NaN NaN NaN
1/24/2018 167.26 67.43 79.23 NaN NaN NaN
1/25/2018 166.28 67.77 80.57 NaN NaN NaN
1/26/2018 166.58 68.37 80.87 NaN NaN 1.0
1/27/2018 166.77 68.87 81.07 NaN NaN NaN
1/28/2018 167.98 68.57 81.07 NaN NaN NaN
2/1/2018 167.98 68.77 81.59 NaN 1.0 NaN
2/2/2018 167.98 69.07 81.87 NaN NaN NaN
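The loop above overwrites each target column completely, which is fine when the columns start out all NaN as in the question. If a column might already hold values that should survive, a variation can fill only the cells that are both NaN and matched. This is a sketch under that assumption; the pre-existing value 5.0 is hypothetical and only there to show that it is left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data; the 5.0 stands in for a value already present.
df1 = pd.DataFrame(
    {"IBM EARN": [np.nan, 5.0, np.nan]},
    index=pd.Index(["1/22/2018", "1/23/2018", "1/24/2018"], name="Date"),
)
df2 = pd.DataFrame({"IBM EARN": ["1/22/2018", "1/23/2018"]})

for col in df2.columns:
    hit = df1.index.isin(df2[col])   # boolean array: index date found in df2[col]
    fill = df1[col].isna() & hit     # only cells that are NaN and matched
    df1.loc[fill, col] = 1
```

Here the first row becomes 1.0, the second keeps 5.0 even though its date matches, and the third stays NaN.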
Edit:
A non-loop solution using isin with a dictionary of lists created from df2, then converting the boolean mask to integers:
# First build a DataFrame that repeats the index of df1 once per df2 column
# https://stackoverflow.com/a/45118399
arr = np.broadcast_to(df1.index.to_numpy()[:, None], (len(df1), len(df2.columns)))
df3 = pd.DataFrame(arr, columns=df2.columns, index=df1.index)
# isin with a dict checks each column against its own list of dates
df3 = df3.isin(df2.to_dict('list')).astype(int)
print (df3)
IBM EARN BA EARN CAT EARN
Date
1/22/2018 1 0 0
1/23/2018 0 0 0
1/24/2018 0 0 0
1/25/2018 0 0 0
1/26/2018 0 0 1
1/27/2018 0 0 0
1/28/2018 0 0 0
2/1/2018 0 1 0
2/2/2018 0 0 0
df1 = df1.drop(columns=df2.columns).join(df3)
print (df1)
IBM BA CAT IBM EARN BA EARN CAT EARN
Date
1/22/2018 163.13 65.94 76.50 1 0 0
1/23/2018 163.17 65.94 76.51 0 0 0
1/24/2018 167.26 67.43 79.23 0 0 0
1/25/2018 166.28 67.77 80.57 0 0 0
1/26/2018 166.58 68.37 80.87 0 0 1
1/27/2018 166.77 68.87 81.07 0 0 0
1/28/2018 167.98 68.57 81.07 0 0 0
2/1/2018 167.98 68.77 81.59 0 1 0
2/2/2018 167.98 69.07 81.87 0 0 0
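For comparison, another way to avoid an explicit loop is to reshape df2 to long form and count (date, column) pairs with `pd.crosstab`. This is an alternative sketch, not the answer's method; the miniature frames are hypothetical, dates are assumed to be strings on both sides, and whether it is faster for thousands of columns is untested here.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature versions of df1 and df2 from the question.
df1 = pd.DataFrame(
    {
        "IBM": [163.13, 163.17, 166.58, 167.98],
        "IBM EARN": np.nan,
        "BA EARN": np.nan,
        "CAT EARN": np.nan,
    },
    index=pd.Index(["1/22/2018", "1/23/2018", "1/26/2018", "2/1/2018"], name="Date"),
)
df2 = pd.DataFrame(
    {
        "IBM EARN": ["1/22/2018", "10/19/2017"],
        "BA EARN": ["2/1/2018", "10/26/2017"],
        "CAT EARN": ["1/26/2018", "10/25/2017"],
    }
)

# Long form: one row per (column name, date) pair in df2.
long = df2.melt(var_name="col", value_name="Date")

# Count occurrences of each pair, align to df1's dates and df2's column
# order, and cap at 1 so repeated dates still yield a 0/1 flag.
flags = (
    pd.crosstab(long["Date"], long["col"])
    .reindex(index=df1.index, columns=df2.columns, fill_value=0)
    .clip(upper=1)
)

# Swap the all-NaN placeholder columns for the computed flags.
result = df1.drop(columns=df2.columns).join(flags)
```

Dates in df2 that never occur in the index of df1 are simply dropped by the `reindex` step.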
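Please post the desired output, not a partial one.

Unfortunately I don't have the full output because it is over 1000 rows, but df1 has daily dates (as the index) and stock prices, and I want to put a 1 at every instance where there is a date in df2. So for IBM it would be 1/22/2018 (as shown), then 10/19/2017, then 7/20/2017, and so on. The same for the other two with dates

@JWestwood - If you're interested, I also added a non-loop solution.

Yes, this does work, so it is correct, but I have 3000+ columns (stocks). I suppose I could loop over them, but I'm wondering whether there is a more Pythonic way to handle all of these columns than looping. Let me know if you can think of anything. ty

I see the loop posted by @jezreal. That seems simple. I think both answers are correct. If I could mark 2 answers, I would.

@JWestwood I think jezreal's answer is better and you should accept it. Very nice solution without a loop.

@shivsn - Thanks.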