Pandas: replace NaN in a DataFrame based on values in a second DataFrame


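There seem to be many questions about conditioning one DataFrame on another, but I couldn't find one that fits my need. Both DataFrames below are small samples; the real ones are thousands of columns wide. I have a DataFrame (df1) that looks like this: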

             IBM    BA      CAT     IBM EARN    BA EARN   CAT EARN
Date
1/22/2018   163.13  65.94   76.50     NaN        NaN       NaN
1/23/2018   163.17  65.94   76.51     NaN        NaN       NaN
1/24/2018   167.26  67.43   79.23     NaN        NaN       NaN
1/25/2018   166.28  67.77   80.57     NaN        NaN       NaN
1/26/2018   166.58  68.37   80.87     NaN        NaN       NaN
1/27/2018   166.77  68.87   81.07     NaN        NaN       NaN
1/28/2018   167.98  68.57   81.07     NaN        NaN       NaN
2/1/2018    167.98  68.77   81.59     NaN        NaN       NaN
2/2/2018    167.98  69.07   81.87     NaN        NaN       NaN
I have another DataFrame (df2) whose columns match the last three columns of df1, but which holds specific dates:

    IBM EARN    BA EARN     CAT EARN
0   1/22/2018   2/1/2018    1/26/2018
1   10/19/2017  10/26/2017  10/25/2017
2   7/20/2017   7/27/2017   7/26/2017
3   4/20/2017   4/27/2017   4/26/2017
4   1/23/2017   1/26/2017   1/27/2017
5   10/19/2016  10/27/2016  10/26/2016
6   7/20/2016   7/28/2016   7/27/2016
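I want to put a 1 in df1 wherever df2 has a matching date in the same column. The (partial) result would look like this, continuing for every date listed in df2: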

             IBM     BA     CAT     IBM EARN    BA EARN   CAT EARN
Date
1/22/2018   163.13  65.94   76.50    **1**       NaN       NaN
1/23/2018   163.17  65.94   76.51     NaN        NaN       NaN
1/24/2018   167.26  67.43   79.23     NaN        NaN       NaN
1/25/2018   166.28  67.77   80.57     NaN        NaN       NaN
1/26/2018   166.58  68.37   80.87     NaN        NaN        **1**
1/27/2018   166.77  68.87   81.07     NaN        NaN       NaN
1/28/2018   167.98  68.57   81.07     NaN        NaN       NaN
2/1/2018    167.98  68.77   81.59     NaN        **1**     NaN
2/2/2018    167.98  69.07   81.87     NaN        NaN       NaN

Please let me know if you can offer a solution.

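You can try this, since the dates are your index (the column names in this session are written without spaces, e.g. IBMEARN):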

In [18]: df1['IBMEARN'] = np.where(df1.index.isin(df2.IBMEARN),1,0)

In [19]: df1['BAEARN'] = np.where(df1.index.isin(df2.BAEARN),1,0)

In [21]: df1['CATEARN'] = np.where(df1.index.isin(df2.CATEARN),1,0)
In [22]: df1
Out[22]: 
              IBM     BA    CAT  IBMEARN  BAEARN  CATEARN
DATE                                                     
1/22/2018  163.13  65.94  76.50        1       0        0
1/23/2018  163.17  65.94  76.51        0       0        0
1/24/2018  167.26  67.43  79.23        0       0        0
1/25/2018  166.28  67.77  80.57        0       0        0
1/26/2018  166.58  68.37  80.87        0       0        1
1/27/2018  166.77  68.87  81.07        0       0        0
1/28/2018  167.98  68.57  81.07        0       0        0
2/1/2018   167.98  68.77  81.59        0       1        0
2/2/2018   167.98  69.07  81.87        0       0        0
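For anyone who wants to run the idea above outside an interactive session, here is a minimal self-contained sketch. The tiny df1 and df2 are made-up stand-ins for the real data, and the column names keep the spaces from the question, so bracket indexing is used instead of attribute access.

```python
import numpy as np
import pandas as pd

# Small stand-in for df1: dates (as strings) in the index, prices in the columns.
df1 = pd.DataFrame(
    {"IBM": [163.13, 163.17, 166.58, 167.98],
     "BA": [65.94, 65.94, 68.37, 68.77]},
    index=pd.Index(["1/22/2018", "1/23/2018", "1/26/2018", "2/1/2018"], name="Date"),
)

# Stand-in for df2: one column of earnings dates per ticker.
df2 = pd.DataFrame({
    "IBM EARN": ["1/22/2018", "10/19/2017"],
    "BA EARN": ["2/1/2018", "10/26/2017"],
})

# 1 where the index date appears in the matching df2 column, 0 otherwise.
df1["IBM EARN"] = np.where(df1.index.isin(df2["IBM EARN"]), 1, 0)
df1["BA EARN"] = np.where(df1.index.isin(df2["BA EARN"]), 1, 0)
print(df1)
```

Writing one line per column is fine for three tickers but does not scale to thousands of columns.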

For each column of the second DataFrame, test the index of df1 for membership and replace the values accordingly:

for col in df2.columns:
    df1[col] = np.where(df1.index.isin(df2[col]),1,np.nan)
print (df1)
              IBM     BA    CAT  IBM EARN  BA EARN  CAT EARN
Date                                                        
1/22/2018  163.13  65.94  76.50       1.0      NaN       NaN
1/23/2018  163.17  65.94  76.51       NaN      NaN       NaN
1/24/2018  167.26  67.43  79.23       NaN      NaN       NaN
1/25/2018  166.28  67.77  80.57       NaN      NaN       NaN
1/26/2018  166.58  68.37  80.87       NaN      NaN       1.0
1/27/2018  166.77  68.87  81.07       NaN      NaN       NaN
1/28/2018  167.98  68.57  81.07       NaN      NaN       NaN
2/1/2018   167.98  68.77  81.59       NaN      1.0       NaN
2/2/2018   167.98  69.07  81.87       NaN      NaN       NaN
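One caveat with the loop above: `isin` compares the values as they are stored, so if the dates are strings, "1/22/2018" and "01/22/2018" will not match. The sketch below assumes the dates are month/day/year strings (an assumption, the question does not state the dtype) and parses both sides with `pd.to_datetime` before comparing. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {"IBM": [163.13, 166.58, 167.98]},
    index=pd.Index(["1/22/2018", "1/26/2018", "2/1/2018"], name="Date"),
)

# Note the zero-padded months: a plain string comparison would miss these.
df2 = pd.DataFrame({
    "IBM EARN": ["01/22/2018", "10/19/2017"],
    "CAT EARN": ["01/26/2018", "10/25/2017"],
})

# Parse the index once, then compare real datetimes column by column.
idx = pd.to_datetime(df1.index, format="%m/%d/%Y")
for col in df2.columns:
    dates = pd.to_datetime(df2[col], format="%m/%d/%Y")
    df1[col] = np.where(idx.isin(dates), 1, np.nan)
print(df1)
```

If the index is already a DatetimeIndex, only the df2 columns need parsing.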
Edit:

A non-loop solution: build a dictionary of lists from df2, test membership against it, and convert the boolean mask to integers:

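# First create a DataFrame by repeating the index of df1 once per df2 column
# https://stackoverflow.com/a/45118399
# (.to_numpy() is needed because recent pandas no longer allows index[:, None])
arr = np.broadcast_to(df1.index.to_numpy()[:, None], (len(df1), len(df2.columns)))
df3 = pd.DataFrame(arr, columns=df2.columns, index=df1.index)

# Per-column membership test against df2's values, then cast booleans to 0/1
# (recent pandas requires the full orient name 'list' instead of 'l')
df3 = df3.isin(df2.to_dict('list')).astype(int)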
print (df3)
           IBM EARN  BA EARN  CAT EARN
Date                                  
1/22/2018         1        0         0
1/23/2018         0        0         0
1/24/2018         0        0         0
1/25/2018         0        0         0
1/26/2018         0        0         1
1/27/2018         0        0         0
1/28/2018         0        0         0
2/1/2018          0        1         0
2/2/2018          0        0         0

df1 = df1.drop(columns=df2.columns).join(df3)
print (df1)
              IBM     BA    CAT  IBM EARN  BA EARN  CAT EARN
Date                                                        
1/22/2018  163.13  65.94  76.50         1        0         0
1/23/2018  163.17  65.94  76.51         0        0         0
1/24/2018  167.26  67.43  79.23         0        0         0
1/25/2018  166.28  67.77  80.57         0        0         0
1/26/2018  166.58  68.37  80.87         0        0         1
1/27/2018  166.77  68.87  81.07         0        0         0
1/28/2018  167.98  68.57  81.07         0        0         0
2/1/2018   167.98  68.77  81.59         0        1         0
2/2/2018   167.98  69.07  81.87         0        0         0
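As a sanity check, the following sketch runs the loop-style and non-loop versions side by side on toy data so the two can be compared. The sample values and the names `looped`, `flags`, and `result` are illustrative and not from the original answers; the calls are written for recent pandas versions.

```python
import numpy as np
import pandas as pd

dates = ["1/22/2018", "1/23/2018", "1/26/2018", "2/1/2018"]
df1 = pd.DataFrame(
    {"IBM": [163.13, 163.17, 166.58, 167.98],
     "IBM EARN": np.nan,
     "BA EARN": np.nan},
    index=pd.Index(dates, name="Date"),
)
df2 = pd.DataFrame({
    "IBM EARN": ["1/22/2018", "10/19/2017"],
    "BA EARN": ["2/1/2018", "10/26/2017"],
})

# Loop-style version: one membership test per df2 column.
looped = pd.DataFrame(
    {col: df1.index.isin(df2[col]) for col in df2.columns},
    index=df1.index,
).astype(int)

# Non-loop version: repeat the index across the columns, then let
# DataFrame.isin match each column against its own list of dates.
arr = np.broadcast_to(df1.index.to_numpy()[:, None], (len(df1), len(df2.columns)))
flags = pd.DataFrame(arr, columns=df2.columns, index=df1.index)
flags = flags.isin(df2.to_dict("list")).astype(int)

# Swap the empty placeholder columns in df1 for the computed flags.
result = df1.drop(columns=df2.columns).join(flags)
print(result)
```

Passing a dict to `DataFrame.isin` restricts each column to its own list of values, which is why a date in `BA EARN` does not flag the `IBM EARN` column.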

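Please post the desired output, not a partial output.

Unfortunately I don't have the full output since it is over 1000 rows, but df1 has daily dates (as the index) and stock prices, and I want to put a 1 at every instance where there is a date in df2. So for IBM it would be 1/22/2018 (as shown), then 10/19/2017, then 7/20/2017, and so on. Same for the other two with dates.

@JWestwood - If you're interested, I also added a non-loop solution.

Yes, this does work, so it is correct, but I have 3000+ columns (stocks), so I figured I could loop through them and do it. I'm wondering whether there is a more Pythonic way to handle all of these columns without looping. If you can think of anything, please let me know. ty

I see the loop @jezreal posted. That seems simple. I think both answers are correct. If I could accept 2 answers I would.

@JWestwood I think jezreal's answer is better and you should accept it. Very nice solution without loops.

@shivsn - Thanks.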