Python: conditionally change the values of a Series based on the values of other columns


I am learning Python by experimenting with a DataFrame that has the following structure:

import numpy as np
import pandas as pd

df = pd.DataFrame({"left_color"  : ["red", "green", "blue", "black", "white", ""],
                   "right_color" : ["red", "gray", "", "black", "red", ""],
                   "flag"        : [1, 2, 3, 1, 2, 3]})
print(df)

  left_color right_color  flag
0        red         red     1
1      green        gray     2
2       blue                 3
3      black       black     1
4      white         red     2
5                            3
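
My goal is to conditionally change the values of the flag Series based on the values of the left_color and right_color columns. Specifically:

If left_color or right_color is missing, change the flag value to NumPy NaN.
If left_color differs from right_color, change the flag value to 0.

Here is my attempt: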

def myfunc(left_side, right_side, value):
    if (left_side == "") | (right_side == ""):
        value = np.nan
    if left_side != right_side:
        value = 0

As you can see, the result is not what I described above. Instead, I get None values everywhere. Here is the result I want:

  left_color right_color  flag
0        red         red     1
1      green        gray     0
2       blue               NaN
3      black       black     1
4      white         red     0
5                          NaN

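I would like to understand what my mistake is and how to correct it. I would also like to know whether there is a more Pythonic way to solve this problem that is also more computationally efficient.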

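You forgot to return the value from your function. As written, myfunc only rebinds the local name value and returns nothing, so every call yields None.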

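def myfunc(left_side, right_side, value):
    # A missing colour on either side takes priority: the flag becomes NaN.
    if left_side == "" or right_side == "":
        return np.nan
    # Both colours are present but differ: the flag becomes 0.
    elif left_side != right_side:
        return 0
    # The colours match: keep the original flag.
    else:
        return value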
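
The call that applies the function is not shown in this thread, so the following is only a minimal sketch. It assumes the function is applied row by row with DataFrame.apply and axis=1; the lambda wrapper is an illustration, not taken from the original post.

```python
import numpy as np
import pandas as pd


def myfunc(left_side, right_side, value):
    if left_side == "" or right_side == "":
        return np.nan
    elif left_side != right_side:
        return 0
    else:
        return value


df = pd.DataFrame({"left_color": ["red", "green", "blue", "black", "white", ""],
                   "right_color": ["red", "gray", "", "black", "red", ""],
                   "flag": [1, 2, 3, 1, 2, 3]})

# Assumption: axis=1 hands each row to the lambda, which unpacks the three columns.
df["flag"] = df.apply(
    lambda row: myfunc(row["left_color"], row["right_color"], row["flag"]),
    axis=1,
)
print(df)
```

Row-wise apply calls the Python function once per row, so it is easy to read but tends to be slow on large frames.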

You can use np.select, as shown below. I think this is likely to be faster than a custom function.

# Conditions are checked in order and the first match wins, so the missing-colour checks come first.
# default keeps the original flag where the colours match, instead of hard-coding 1.
df["flag"] = np.select([df["left_color"] == "", df["right_color"] == "", df["left_color"] != df["right_color"]],
                       [np.nan, np.nan, 0],
                       default=df["flag"])

Output:

  left_color right_color  flag
0        red         red   1.0
1      green        gray   0.0
2       blue               NaN
3      black       black   1.0
4      white         red   0.0
5                          NaN
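
For comparison, here is a hedged sketch of a pandas-only variant that chains Series.mask instead of calling np.select. It is not from the answers above; the helper names missing and different are made up for illustration, and the cast to float64 is there only because NaN cannot be stored in an integer column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"left_color": ["red", "green", "blue", "black", "white", ""],
                   "right_color": ["red", "gray", "", "black", "red", ""],
                   "flag": [1, 2, 3, 1, 2, 3]})

# Hypothetical helper masks, one boolean per row.
missing = (df["left_color"] == "") | (df["right_color"] == "")
different = df["left_color"] != df["right_color"]

# mask() replaces values where the condition is True.
# Writing 0 first and NaN second lets the missing-colour rule win.
df["flag"] = (
    df["flag"]
    .astype("float64")
    .mask(different, 0.0)
    .mask(missing, np.nan)
)
print(df)
```

Rows where the colours match and neither is missing are left untouched, so their original flag is kept.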