Python 3.x: Populate a column based on other column values using multiple if-else conditions


python-3.x pandas


I am trying to compare 4 columns in a pandas DataFrame and populate a 5th column based on the result. In plain SQL it would look like this:

CASE
    WHEN speciality_new IS NULL AND location_new IS NULL THEN 'No match found'
    WHEN speciality <> speciality_new AND location <> location_new THEN 'both are different'
    WHEN speciality_new IS NULL THEN 'specialty not found'
    WHEN location_new IS NULL THEN 'location not found'
    ELSE 'true'
END

The error message is

TypeError: unsupported operand type(s) for &: 'str' and 'str'

which makes no sense, because '&' is the syntax for 'and'.
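This error is most likely an operator-precedence issue rather than a problem with `&` itself. In Python, `&` is the bitwise / element-wise AND operator (not a synonym for the keyword `and`), and it binds more tightly than comparison operators such as `!=`. Assuming the original conditions were written without parentheses around each comparison (the question does not show that code), `a != b & c != d` is parsed as the chained comparison `a != (b & c) != d`, so Python tries to evaluate `'nurse' & 'dc'` first. A minimal reproduction using hypothetical scalar values taken from row ID 2 of the sample data:

```python
# Hypothetical scalar values from row ID 2 of the sample data
speciality, speciality_new = 'nurse', 'nurse'
location, location_new = 'dc', 'alaska'

error = None
try:
    # & binds tighter than !=, so this is evaluated as
    # speciality != (speciality_new & location) != location_new
    _ = speciality != speciality_new & location != location_new
except TypeError as exc:
    error = str(exc)

print(error)  # unsupported operand type(s) for &: 'str' and 'str'

# Parenthesizing each comparison makes & combine two booleans instead
both_differ = (speciality != speciality_new) & (location != location_new)
print(both_differ)  # False
```

For plain scalars the keyword `and` works just as well; `&` (with parentheses) is what is needed for whole pandas columns.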

dfsample is what I have, and dfFinal is what I want:

import pandas as pd

dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

dfFinal = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
       'match': ['TRUE', 'location didn’t match', 'specialty didn’t match', 'both specialty and location didn’t match', 'specialty didn’t match']})
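Note that dfsample marks missing values with empty strings, while the SQL above relies on NULL. pandas' isnull() / isna() do not treat '' as missing, so a 'No match found' or 'not found' branch can never fire on this sample as written. A minimal sketch, assuming '' is meant to mean "no value", converts the empty strings to NaN first:

```python
import numpy as np
import pandas as pd

dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new': ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

# An empty string is an ordinary value for pandas, not a missing one
print(pd.isnull(''), pd.isnull(np.nan))  # False True
print(dfsample['speciality_new'].isna().any())  # False

# Assumption: '' means "no value", so convert it to NaN before any null checks
cleaned = dfsample.replace('', np.nan)
print(cleaned['speciality_new'].isna().tolist())  # [False, False, False, False, True]
```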

Here is another way that doesn't use np.where(); it uses the apply function instead:

import pandas as pd
import numpy as np

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', np.nan],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

print(df)

def master_check(x):
    # x is a single row (a Series), so each x[...] is a scalar and plain `and` is enough
    if pd.isnull(x['speciality_new']) and pd.isnull(x['location_new']):
        return 'No match found'
    elif x['speciality'] != x['speciality_new'] and x['location'] != x['location_new']:
        return 'Both specialty and location didnt match'
    elif x['speciality'] != x['speciality_new']:
        return 'Specialty didnt match'
    elif x['location'] != x['location_new']:
        return 'Location didnt match'
    else:
        return True

# axis=1 calls master_check once per row
df['Match'] = df.apply(master_check, axis=1)
print(df)

The output will be:

   ID speciality location speciality_new location_new
0   1     doctor    texas         doctor        texas
1   2      nurse       dc          nurse       alaska
2   3    patient  georgia       director      georgia
3   4     driver                   nurse     maryland
4   5   director  florida            NaN      florida


   ID speciality  ... location_new                                    Match
0   1     doctor  ...        texas                                     True
1   2      nurse  ...       alaska                     Location didnt match
2   3    patient  ...      georgia                    Specialty didnt match
3   4     driver  ...     maryland  Both specialty and location didnt match
4   5   director  ...      florida                    Specialty didnt match
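Because apply(..., axis=1) passes one row at a time, x['speciality'] and friends are scalars, so the plain keyword `and` works inside master_check. On whole columns the situation is reversed: `and` asks for a single True/False for the entire Series and raises, while `&` combines element-wise. A small sketch with made-up boolean columns:

```python
import pandas as pd

# One row, as apply(axis=1) would pass it to master_check
row = pd.Series({'speciality': 'nurse', 'speciality_new': 'nurse',
                 'location': 'dc', 'location_new': 'alaska'})
# Scalars: the keyword `and` is fine here
row_result = (row['speciality'] != row['speciality_new']) and (row['location'] != row['location_new'])
print(row_result)  # False

# Whole columns (made-up values): & works element-wise
spec_diff = pd.Series([False, True, True])
loc_diff = pd.Series([True, False, True])
print((spec_diff & loc_diff).tolist())  # [False, False, True]

error_message = None
try:
    _ = spec_diff and loc_diff  # `and` needs one truth value for the whole Series
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # The truth value of a Series is ambiguous. ...
```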

If you really want to use numpy.where(), you have to treat each elif/else branch as a separate numpy.where() nested inside the previous one, like this:

import pandas as pd
import numpy as np

masterDf = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', np.nan],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

print(masterDf)

# Each inner np.where() acts as the elif/else branch of the one around it
masterDf['Match'] = np.where(
    (masterDf.speciality_new.isnull()) & (masterDf.location_new.isnull()), 'No match found',
    np.where((masterDf.speciality != masterDf.speciality_new) & (masterDf.location != masterDf.location_new), 'Both specialty and location didnt match',
    np.where(masterDf.speciality != masterDf.speciality_new, 'Specialty didnt match',
    np.where(masterDf.location != masterDf.location_new, 'Location didnt match',
    'True'))))  # string fall-through value, so all branches share one string dtype

print(masterDf)

The output will be:

   ID speciality location speciality_new location_new
0   1     doctor    texas         doctor        texas
1   2      nurse       dc          nurse       alaska
2   3    patient  georgia       director      georgia
3   4     driver                   nurse     maryland
4   5   director  florida            NaN      florida


   ID speciality  ... location_new                                    Match
0   1     doctor  ...        texas                                     True
1   2      nurse  ...       alaska                     Location didnt match
2   3    patient  ...      georgia                    Specialty didnt match
3   4     driver  ...     maryland  Both specialty and location didnt match
4   5   director  ...      florida                    Specialty didnt match
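In the outputs above, row ID 5 (NaN in speciality_new) is labelled 'Specialty didnt match' rather than the question's 'specialty not found'. That follows from how pandas compares missing values: `!=` (and `.ne()`) returns True whenever either side is NaN. A small sketch; the 'specialty not found' label is borrowed from the SQL in the question purely for illustration:

```python
import numpy as np
import pandas as pd

spec = pd.Series(['doctor', 'director'])
spec_new = pd.Series(['doctor', np.nan])

# A missing value never compares equal, so != (and .ne) is True for it
print((spec != spec_new).tolist())  # [False, True]
print(spec.ne(spec_new).tolist())   # [False, True]

# NaN is not even equal to itself
print(np.nan != np.nan)  # True

# To give missing values their own label, test isnull() before any != comparison
label = np.where(spec_new.isnull(), 'specialty not found',
                 np.where(spec != spec_new, 'Specialty didnt match', 'True'))
print(label.tolist())  # ['True', 'specialty not found']
```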

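To evaluate multiple conditions with numpy, it is better to use numpy.select(), where you specify the list of conditions, the expected output for each condition, and a default output, just like an if-elif-else statement: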

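import numpy as np

condList = [
    dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull(),
    dfsample['speciality'].ne(dfsample['speciality_new']) &
    dfsample['location'].ne(dfsample['location_new']),
    dfsample['speciality'].ne(dfsample['speciality_new']),
    dfsample['location'].ne(dfsample['location_new']),
]
choiceList = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match'
]
# A string default keeps the result a single string dtype; mixing string
# choices with a bool default may fail to promote in newer NumPy versions
dfsample['match'] = np.select(condList, choiceList, default='True')
print(dfsample)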

where .ne() means "not equal" (you could simply use != instead).

Output:

   ID speciality location speciality_new location_new                                    match
0   1     doctor    texas         doctor        texas                                     True
1   2      nurse       dc          nurse       alaska                     Location didnt match
2   3    patient  georgia       director      georgia                    Specialty didnt match
3   4     driver                   nurse     maryland  Both specialty and location didnt match
4   5   director  florida                     florida                    Specialty didnt match
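np.select() behaves like the if/elif chain: for each row it takes the choice of the first condition that is True and falls back to the default otherwise. A sketch comparing it with the nested np.where() form on dfsample, plus a reordered condition list showing that the order of conditions matters; the string 'True' default is an assumption made so every output shares one string dtype:

```python
import numpy as np
import pandas as pd

dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new': ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

no_match = dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull()
spec_diff = dfsample['speciality'].ne(dfsample['speciality_new'])
loc_diff = dfsample['location'].ne(dfsample['location_new'])

conds = [no_match, spec_diff & loc_diff, spec_diff, loc_diff]
choices = ['No match found', 'Both specialty and location didnt match',
           'Specialty didnt match', 'Location didnt match']
selected = np.select(conds, choices, default='True')

# The same chain written as nested np.where() calls
nested = np.where(conds[0], choices[0],
         np.where(conds[1], choices[1],
         np.where(conds[2], choices[2],
         np.where(conds[3], choices[3], 'True'))))
print((selected == nested).all())  # True

# Order matters: with the single-column test first, row ID 4 (both columns
# differ) is caught by it and never reaches the "both" branch
reordered = np.select([spec_diff, spec_diff & loc_diff],
                      ['Specialty didnt match', 'Both specialty and location didnt match'],
                      default='True')
print(reordered[3])  # Specialty didnt match
```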

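'&' is the syntax for 'and'. Even 'and' doesn't work; that's what I found on Stack Overflow.

Please share a sample dataframe and the expected output.

You need to use opening and closing parentheses to make sure the conditions are evaluated correctly.

@sammywemmy I added a sample dataframe along with the final output I want.

Thank you. I chose Caina's answer because it looks cleaner and shorter.

Hi @Caina, thanks for helping with this. I'm using the same code (slightly modified), but I get an error: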

ValueError: Wrong number of items passed 63311, placement implies 1

Could you give me some hints? 63311 is the number of rows in the dataframe. The code is:

condList = [
    dfsample['Address'].str.extract(r'(\d+)').isna(),
    dfsample['Address'].str.extract(r'(\d+)').replace(np.nan, -1, regex=True).astype(int)[0].eq(dfsample['street']),
]
choiceList = [False, True]
dfsample['match'] = np.select(condList, choiceList, default=False)

The first condition in condList returns a DataFrame, where a Series or 1-D numpy array is expected. If you put a [0] after it (as in dfsample['Address'].str.extract(r'(\d+)').isna()[0]), it might work.
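The error mentions 63311 items most likely because of broadcasting: str.extract() returns an n×1 DataFrame by default, np.select broadcasts all conditions against each other, and an (n, 1) condition combined with an (n,) Series becomes an (n, n) array, which cannot be stored in a single column. A sketch on a tiny hypothetical frame (the Address/street values are invented for illustration); it shows the shape problem and uses str.extract's expand=False parameter as an alternative to [0]:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the commenter's data (the real frame isn't shown)
df = pd.DataFrame({'Address': ['12 Main St', 'Elm St', '7 Oak Ave'],
                   'street': [12, 3, 8]})

# With the default expand=True, str.extract returns a DataFrame (n rows x 1 column)
extracted = df['Address'].str.extract(r'(\d+)')
print(type(extracted).__name__, extracted.shape)  # DataFrame (3, 1)

# np.select broadcasts its conditions together: (3, 1) with (3,) gives (3, 3),
# a 2-D result that cannot be assigned to one column
bad = np.select([extracted.isna(), df['street'].eq(12)], [False, True], default=False)
print(bad.shape)  # (3, 3)

# expand=False returns a Series for a single capture group, so the shapes line up
number = df['Address'].str.extract(r'(\d+)', expand=False)
has_no_number = number.isna()
matches_street = pd.to_numeric(number).fillna(-1).astype(int).eq(df['street'])
df['match'] = np.select([has_no_number, matches_street], [False, True], default=False)
print(df['match'].tolist())  # [True, False, False]
```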