Python: conditionally update a DataFrame from another DataFrame

I have a main DataFrame with two sets of values:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id1': [1, 1, 2, 2],
                    'dir1': [True, False, True, False],
                    'value1': [55, 40, 84, 31],
                    'id2': [3, 3, 4, 4],
                    'dir2': [True, False, False, True],
                    'value2': [60, 30, 7, 15]})

   id1   dir1  value1  id2   dir2  value2
0    1   True      55    3   True      60
1    1  False      40    3  False      30
2    2   True      84    4  False       7
3    2  False      31    4   True      15

Then I have an update DataFrame that looks like this:

df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'value': [21, 22, 23, 24]})

   id  value
0   1     21
1   2     22
2   3     23
3   4     24

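I want to update df1 with the new values from df2, but only in rows where dirX is True. The data should then look like this (changed values are marked with *):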

   id1   dir1  value1  id2   dir2  value2
0    1   True     *21    3   True     *23
1    1  False      40    3  False      30
2    2   True     *22    4  False       7
3    2  False      31    4   True     *24

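Do you know whether something like this is possible? I tried looking at update, but couldn't get it to work. I'm fairly new to Python and only code at 23:00, so I may not be as sharp as I need to be.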

Try using the np.where function from numpy.

Maybe something like this:

df1['value1'] = np.where(df1['dir1'], df2['value'], df1['value1'])

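You will probably need some adjustments or a merge first, because the values from df2 have to be lined up with df1 by id rather than by row position, but I think this will help you find the solution.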
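One possible form of that adjustment, sketched here rather than taken from the answer itself: look up each row's replacement by id with Series.map, then let np.where choose between the looked-up value and the existing one. The name lookup is only illustrative, and the sketch assumes every id in df2 is unique and that every id in df1 also appears in df2.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id1': [1, 1, 2, 2],
                    'dir1': [True, False, True, False],
                    'value1': [55, 40, 84, 31],
                    'id2': [3, 3, 4, 4],
                    'dir2': [True, False, False, True],
                    'value2': [60, 30, 7, 15]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'value': [21, 22, 23, 24]})

# Series indexed by id, so replacements are found by id, not by row position.
# Assumption: ids in df2 are unique.
lookup = df2.set_index('id')['value']

# Where dirX is True take the looked-up value, otherwise keep the old one.
df1['value1'] = np.where(df1['dir1'], df1['id1'].map(lookup), df1['value1'])
df1['value2'] = np.where(df1['dir2'], df1['id2'].map(lookup), df1['value2'])

print(df1)
```

If an id in df1 were missing from df2, map would return NaN for it, so that case would need extra handling.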

I agree with Thales's answer. First, merge df2 into df1 on id1:

df = df1.merge(df2, left_on='id1', right_on='id')

Then replace value1 with value wherever dir1 is True:

df.value1 = np.where(df.dir1 == True, df.value, df.value1)

Then drop the extra columns:

df = df.drop(['id', 'value'],axis=1)

Then merge df2 in again, this time on id2:

df = df.merge(df2, left_on='id2', right_on='id')

Do the same replacement, but for value2:

df.value2 = np.where(df.dir2 == True, df.value, df.value2)

Then drop the extra columns:

df = df.drop(['id', 'value'],axis=1)

The resulting DataFrame looks like this:

   id1   dir1  value1  id2   dir2  value2
0    1   True      21    3   True      23
1    1  False      40    3  False      30
2    2   True      22    4  False       7
3    2  False      31    4   True      24
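The merge, replace, and drop steps above repeat once per column group, so they could be folded into a small helper. This is only a sketch: the function name update_where_true and its suffixes parameter are hypothetical, it assumes the columns follow the idX / dirX / valueX naming from the question, and it assumes ids in df2 are unique so that a left merge keeps df1's row order and row count.

```python
import numpy as np
import pandas as pd

def update_where_true(df1, df2, suffixes=('1', '2')):
    """Return a copy of df1 with valueX taken from df2 where dirX is True."""
    out = df1.copy()
    for s in suffixes:
        # Left merge keeps every row of df1 in its original order
        # (assuming ids in df2 are unique).
        merged = out.merge(df2, how='left', left_on=f'id{s}', right_on='id')
        # Only replace where dirX is True and a matching id was found.
        cond = merged[f'dir{s}'] & merged['value'].notna()
        out[f'value{s}'] = np.where(cond, merged['value'], merged[f'value{s}'])
    return out

df1 = pd.DataFrame({'id1': [1, 1, 2, 2],
                    'dir1': [True, False, True, False],
                    'value1': [55, 40, 84, 31],
                    'id2': [3, 3, 4, 4],
                    'dir2': [True, False, False, True],
                    'value2': [60, 30, 7, 15]})
df2 = pd.DataFrame({'id': [1, 2, 3, 4],
                    'value': [21, 22, 23, 24]})

result = update_where_true(df1, df2)
print(result)
```

Because the merged frame is only temporary, there are no extra id and value columns to drop afterwards, and df1 itself is left unchanged.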

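Hi, why don't you just loop through df1 and change the value when dirX is True?

I like loops, but what I've learned from pandas is that loops should be your last resort. They are usually slow and very inefficient.

Works like a charm! Thank you very much!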