Python: add a new column to a dataframe with values based on multiple conditions


python pandas dataframe


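My dataframe is df1, and I need to fill a new column with values based on the following conditions:

1. If 'name' equals 'last_name', then 'salary' + 'other'
2. If 'last_name' is null, then 'salary' + 'other'
3. If 'name' does not equal 'last_name', then ('rate' * 'other') + 'salary'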

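I tried an if/else based on np.where, but the result is not correct. Here is my sample data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'salary': [2000, 5000, 7000, 3500, 8000], 'rate': [2, 4, 6.5, 7, 5], 'other': [4000, 2500, 4200, 5000, 3000],
                   'name': ['bob', 'sam', 'ram', 'jam', 'flu'], 'last_name': ['bob', 'gan', 'ram', np.nan, 'flu']})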

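You can handle each condition with DataFrame filtering. When you write something like df["name"] == df["last_name"], you create a boolean Series (called a "mask"), which you can then use to index into the DataFrame: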

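# Condition 1: name == last_name
name_equals_lastname = df["name"] == df["last_name"]  # first, create the boolean mask
df.loc[name_equals_lastname, "new_col"] = df["salary"] + df["other"]  # then use the mask to set values only in the matching rows
# Condition 2: last_name is null
last_name_is_null = df["last_name"].isnull()
df.loc[last_name_is_null, "new_col"] = df["salary"] + df["other"]
# Condition 3: name != last_name
# NaN compares unequal to everything, so exclude null last names; otherwise this step overwrites condition 2
name_not_equals_lastname = (df["name"] != df["last_name"]) & df["last_name"].notnull()
df.loc[name_not_equals_lastname, "new_col"] = (df["rate"] * df["other"]) + df["salary"]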
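A self-contained sketch of the mask approach on the question's sample data makes the NaN pitfall in condition 3 visible: "jam" != NaN evaluates to True, so an unrestricted condition-3 mask also matches the row whose last_name is missing. This sketch assumes a recent pandas, and it folds conditions 1 and 2 into the complement of the restricted condition-3 mask rather than spelling them out separately.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'salary': [2000, 5000, 7000, 3500, 8000],
                   'rate': [2, 4, 6.5, 7, 5],
                   'other': [4000, 2500, 4200, 5000, 3000],
                   'name': ['bob', 'sam', 'ram', 'jam', 'flu'],
                   'last_name': ['bob', 'gan', 'ram', np.nan, 'flu']})

# NaN never compares equal, so "jam" != NaN is True for row 3
naive_not_equal = df["name"] != df["last_name"]
print(naive_not_equal.tolist())  # [False, True, False, True, False]

# Restrict condition 3 to rows that actually have a last name
not_equal = naive_not_equal & df["last_name"].notnull()
print(not_equal.tolist())  # [False, True, False, False, False]

# Every other row satisfies condition 1 or condition 2
df.loc[~not_equal, "new_col"] = df["salary"] + df["other"]
df.loc[not_equal, "new_col"] = df["rate"] * df["other"] + df["salary"]
print(df["new_col"].tolist())  # expected values: 6000, 15000, 11200, 8500, 11000
```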

You can also use df.apply() with a custom function, like this:

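import pandas as pd

def my_logic(row):
    # Conditions 1 and 2: last_name is missing, or name matches last_name
    if pd.isnull(row["last_name"]) or row["name"] == row["last_name"]:
        return row["salary"] + row["other"]
    # Condition 3: name differs from last_name
    return row["rate"] * row["other"] + row["salary"]

df["new_col"] = df.apply(my_logic, axis=1)  # axis=1 passes each row (not each column) to my_logic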
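A runnable sketch of the apply() approach on the question's sample data. Checking pd.isnull before the equality test is a choice made in this sketch, so NaN never reaches the string comparison. Note that apply(axis=1) calls the Python function once per row, so on large frames it is generally slower than the vectorized mask approaches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'salary': [2000, 5000, 7000, 3500, 8000],
                   'rate': [2, 4, 6.5, 7, 5],
                   'other': [4000, 2500, 4200, 5000, 3000],
                   'name': ['bob', 'sam', 'ram', 'jam', 'flu'],
                   'last_name': ['bob', 'gan', 'ram', np.nan, 'flu']})

def my_logic(row):
    # row is a Series holding one row's values because of axis=1
    if pd.isnull(row["last_name"]) or row["name"] == row["last_name"]:
        return row["salary"] + row["other"]
    return row["rate"] * row["other"] + row["salary"]

df["new_col"] = df.apply(my_logic, axis=1)
print(df[["name", "last_name", "new_col"]])
```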

Given your conditions, you don't need if/else at all. Just use np.where with a combined boolean mask.

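# The attempt from the question (does not work): np.where() with a single argument
# returns a tuple of index arrays, which is never the object True, so the else branch
# always runs and every row gets rate * other + salary.
if np.where(df["name"] == df["last_name"]) is True:
    df['new_col'] = df['salary'] + df['other']
else:
    df['new_col'] = (df['rate'] * df['other']) + df['salary']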
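Why the if np.where(...) is True: code above misbehaves can be checked on a toy array (the mask values here are made up for illustration). With one argument, np.where returns a tuple of index arrays; with three arguments it selects element-wise, which is the form the vectorized answers rely on.

```python
import numpy as np

mask = np.array([True, False, True])

# One argument: a tuple of index arrays, not a boolean
idx = np.where(mask)
print(idx)          # (array([0, 2]),)
print(idx is True)  # False, so an `if ... is True:` test always takes the else branch

# Three arguments: take from the first array where mask is True, else from the second
picked = np.where(mask, [10, 20, 30], [-1, -2, -3])
print(picked)       # [10 -2 30]
```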

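Thanks, but is there no other solution using if statements? Just asking.

@mathew – The only "if"-statement solution I'd suggest is to write a function containing your logic and pass it to df.apply(my_func, axis=1) (where my_func contains your logic). There may be other if/else solutions, but pandas isn't really meant to be used that way. I'll add the df.apply() approach to my answer now so you can see it.

@Andy – I get the following error with your suggestion: AttributeError: 'Series' object has no attribute 'isna'

It looks like you have an old version of pandas. In that case, try isnull instead:

c2 = df["last_name"].isnull()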
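A small check on the question's last_name values, assuming pandas and NumPy are installed: isnull() exists in both old and current pandas, while isna() is the newer alias (added in pandas 0.21), so isnull() is the portable choice when the pandas version is unknown.

```python
import numpy as np
import pandas as pd

last_name = pd.Series(['bob', 'gan', 'ram', np.nan, 'flu'])

# isnull() works on old and new pandas alike
c2 = last_name.isnull()
print(c2.tolist())  # [False, False, False, True, False]

# Where isna() is available, it yields the identical mask
if hasattr(last_name, "isna"):
    assert c2.equals(last_name.isna())
```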
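import numpy as np

c1 = df["name"] == df["last_name"]  # condition 1
c2 = df["last_name"].isna()         # condition 2 (use .isnull() on pandas older than 0.21)

# where c1 or c2 holds, use salary + other; otherwise rate * other + salary
df['new_col'] = np.where(c1 | c2,
                         df['salary'] + df['other'],
                         df['rate'] * df['other'] + df['salary'])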

Out[159]:
   salary  rate  other name last_name  new_col
0    2000   2.0   4000  bob       bob   6000.0
1    5000   4.0   2500  sam       gan  15000.0
2    7000   6.5   4200  ram       ram  11200.0
3    3500   7.0   5000  jam       NaN   8500.0
4    8000   5.0   3000  flu       flu  11000.0
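A related option, not used in the answers above, is np.select, which takes the three conditions as a list and picks, per row, the choice paired with the first condition that is True. Because condition 2 is listed before condition 3, the NaN row is resolved by condition 2 even though "jam" != NaN is True. This is a sketch on the question's sample data; the np.nan default is an assumption and is never used here, since every row matches some condition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'salary': [2000, 5000, 7000, 3500, 8000],
                   'rate': [2, 4, 6.5, 7, 5],
                   'other': [4000, 2500, 4200, 5000, 3000],
                   'name': ['bob', 'sam', 'ram', 'jam', 'flu'],
                   'last_name': ['bob', 'gan', 'ram', np.nan, 'flu']})

# Checked in order; the first True condition wins for each row
conditions = [
    df["name"] == df["last_name"],   # condition 1
    df["last_name"].isnull(),        # condition 2
    df["name"] != df["last_name"],   # condition 3
]
choices = [
    df["salary"] + df["other"],
    df["salary"] + df["other"],
    df["rate"] * df["other"] + df["salary"],
]
df["new_col"] = np.select(conditions, choices, default=np.nan)
print(df["new_col"].tolist())  # expected values: 6000, 15000, 11200, 8500, 11000
```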