Python 3.x: Populate a column based on other column values using multiple if-else


I am trying to compare 4 columns in a pandas dataframe and populate a 5th column based on the result. In plain SQL it would look like this:

if speciality_new is null and location_new is null then 'No match found'
elif specialty <> specialty_new and location <> location_new then 'both are different'
elif specialty_new is null then 'specialty not found'
elif location_new is null then 'location not found'
else 'true'
The error message is
TypeError: unsupported operand type(s) for &: 'str' and 'str'
which makes no sense to me, because '&' is the syntax for 'and'.
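The failing pandas expression is not shown in the question, so the following is only a guess at the cause. One common way to get this exact message is to combine comparisons with `&` without parentheses: `&` binds more tightly than `!=`, so Python tries to apply `&` to the string values before comparing anything. The small two-row frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame using the question's column names.
df = pd.DataFrame({
    "speciality": ["doctor", "nurse"],
    "speciality_new": ["doctor", "director"],
    "location": ["texas", "dc"],
    "location_new": ["texas", "alaska"],
})

# `&` has higher precedence than `!=`. Without parentheses, an expression like
#   df["speciality"] != df["speciality_new"] & df["location"] != df["location_new"]
# evaluates df["speciality_new"] & df["location"] first, i.e. str & str.
# The same failure on two plain strings:
try:
    "doctor" & "texas"
except TypeError as exc:
    str_and_error = str(exc)

# Wrapping each comparison in parentheses makes `&` combine two boolean Series.
both_differ = (df["speciality"] != df["speciality_new"]) & (
    df["location"] != df["location_new"]
)

print(str_and_error)
print(both_differ.tolist())
```

Plain `and` is not a substitute here: it asks for the truth value of a whole Series, which pandas refuses to give, so element-wise logic on columns needs `&` with parentheses.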

dfsample is what I have, and dfFinal is what I want:

dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

dfFinal = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
       'match': ['TRUE', 'location didn’t match', 'specialty didn’t match', 'both specialty and location didn’t match', 'specialty didn’t match']})

Here is another way to do it that does not rely on np; it uses the apply function instead.

import pandas as pd
import numpy as np

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', np.nan],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})

print (df)

def master_check(x):
    # x is a single row (a Series), so each x[...] is a scalar and plain `and` works
    if pd.isnull(x['speciality_new']) and pd.isnull(x['location_new']):
        return 'No match found'
    elif x['speciality'] != x['speciality_new'] and x['location'] != x['location_new']:
        return 'Both specialty and location didnt match'
    elif x['speciality'] != x['speciality_new']:
        return 'Specialty didnt match'
    elif x['location'] != x['location_new']:
        return 'Location didnt match'
    else:
        return True

df['Match'] = df.apply(master_check,axis=1)
The output will be:

   ID speciality location speciality_new location_new
0   1     doctor    texas         doctor        texas
1   2      nurse       dc          nurse       alaska
2   3    patient  georgia       director      georgia
3   4     driver                   nurse     maryland
4   5   director  florida            NaN      florida


   ID speciality  ... location_new                                    Match
0   1     doctor  ...        texas                                     True
1   2      nurse  ...       alaska                     Location didnt match
2   3    patient  ...      georgia                    Specialty didnt match
3   4     driver  ...     maryland  Both specialty and location didnt match
4   5   director  ...      florida                    Specialty didnt match

If you do want to use numpy.where(), you have to treat each false branch as a separate, nested numpy.where(). To implement it with numpy.where(), you would have to do this:

import pandas as pd
import numpy as np

masterDf = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
       'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
       'location': ['texas', 'dc', 'georgia', '', 'florida'],
       'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
       'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})


masterDf['Match'] = np.where(
    ((masterDf.speciality_new.isnull()) & (masterDf.location_new.isnull())), 'No match found',
    np.where(((masterDf.speciality != masterDf.speciality_new) & (masterDf.location != masterDf.location_new)), 'Both specialty and location didnt match',
    np.where((masterDf.speciality != masterDf.speciality_new), 'Specialty didnt match',
    np.where((masterDf.location != masterDf.location_new), 'Location didnt match',
    True))))

print (masterDf)
The output will be:

   ID speciality location speciality_new location_new
0   1     doctor    texas         doctor        texas
1   2      nurse       dc          nurse       alaska
2   3    patient  georgia       director      georgia
3   4     driver                   nurse     maryland
4   5   director  florida            NaN      florida


   ID speciality  ... location_new                                    Match
0   1     doctor  ...        texas                                     True
1   2      nurse  ...       alaska                     Location didnt match
2   3    patient  ...      georgia                    Specialty didnt match
3   4     driver  ...     maryland  Both specialty and location didnt match
4   5   director  ...      florida                    Specialty didnt match

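To evaluate multiple conditions with numpy, it is better to use numpy.select, where you specify the conditions, the expected output for each condition, and the default output, just like an if-elif-else statement: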

import numpy as np

condList = [
    dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull(),
    dfsample['speciality'].ne(dfsample['speciality_new']) &
    dfsample['location'].ne(dfsample['location_new']),
    dfsample['speciality'].ne(dfsample['speciality_new']),
    dfsample['location'].ne(dfsample['location_new']),
]
choiceList = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match'
]
dfsample['match'] = np.select(condList, choiceList, default=True)
print(dfsample)
where .ne means "not equal" (you could simply use != instead).


Output:

   ID speciality location speciality_new location_new                                    match
0   1     doctor    texas         doctor        texas                                     True
1   2      nurse       dc          nurse       alaska                     Location didnt match
2   3    patient  georgia       director      georgia                    Specialty didnt match
3   4     driver                   nurse     maryland  Both specialty and location didnt match
4   5   director  florida                     florida                    Specialty didnt match
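One detail worth checking in the sample data: the blanks are empty strings, not NaN, so the isnull() condition never fires on them. The sketch below assumes that an empty string should count as missing and converts blanks to NaN before building the conditions. The string default "TRUE" is an illustrative choice that mirrors dfFinal and keeps the column a single dtype; the helper names cleaned, spec_differs and loc_differs are not from the original answer.

```python
import numpy as np
import pandas as pd

dfsample = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "speciality": ["doctor", "nurse", "patient", "driver", "director"],
    "location": ["texas", "dc", "georgia", "", "florida"],
    "speciality_new": ["doctor", "nurse", "director", "nurse", ""],
    "location_new": ["texas", "alaska", "georgia", "maryland", "florida"],
})

# An empty string is an ordinary value, so isnull() does not flag it.
nulls_before = int(dfsample["speciality_new"].isnull().sum())

# Assumption: blanks should be treated as missing, so turn them into NaN first.
cleaned = dfsample.replace("", np.nan)
nulls_after = int(cleaned["speciality_new"].isnull().sum())

spec_differs = cleaned["speciality"].ne(cleaned["speciality_new"])
loc_differs = cleaned["location"].ne(cleaned["location_new"])

condList = [
    cleaned["speciality_new"].isnull() & cleaned["location_new"].isnull(),
    spec_differs & loc_differs,
    spec_differs,
    loc_differs,
]
choiceList = [
    "No match found",
    "Both specialty and location didnt match",
    "Specialty didnt match",
    "Location didnt match",
]
# Conditions are checked in order; the first one that is True picks the label.
dfsample["match"] = np.select(condList, choiceList, default="TRUE")

print(nulls_before, nulls_after)
print(dfsample["match"].tolist())
```

With this sample the labels come out the same as before, because a NaN never compares equal to anything; the conversion only matters for rows where both new columns are blank, which would then be labelled "No match found".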

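'&' is the syntax for 'and'. Even 'and' does not work. That is what it says on Stack Overflow.

Please share a sample dataframe and the expected output.

You need to use opening and closing parentheses to make sure the conditions are grouped correctly.

@sammywemmy I have added a sample dataframe along with the final output I am hoping for.

Thank you. I chose Caina's answer because it looks cleaner and shorter.

Hi @Caina, thank you for helping with this. I am using the same code (slightly modified), but I get an error:
ValueError: Wrong number of items passed 63311, placement implies 1
Could you give me some hints? 63311 is the number of rows in the dataframe. The code is:
condList = [dfsample['Address'].str.extract('(\d+)').isna(), dfsample['Address'].str.extract('(\d+)').replace(np.nan, -1, regex=True).astype(int)[0].eq(dfsample['street']),]
choiceList = [False, True]
dfsample['match'] = np.select(condList, choiceList, default=False)

The first condition in condList returns a DataFrame, where a Series or a one-dimensional numpy array is expected. If you put a [0] after it (as in dfsample['Address'].str.extract('(\d+)').isna()[0]), it will probably work.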
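As a rough illustration of the point above: Series.str.extract returns a DataFrame by default, even with a single capture group, while passing expand=False returns a Series that np.select can use as a one-dimensional condition. The three addresses and street numbers below are invented, and pd.to_numeric is swapped in for the replace/astype chain from the comment.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the comment above.
dfsample = pd.DataFrame({
    "Address": ["12 Main St", "Elm Road", "7 Oak Ave"],
    "street": [12, 5, 9],
})

# Default (expand=True): a DataFrame with one column per capture group.
as_frame = dfsample["Address"].str.extract(r"(\d+)")

# expand=False with a single group: a plain Series.
as_series = dfsample["Address"].str.extract(r"(\d+)", expand=False)

# Convert the extracted digits to numbers; rows without digits stay NaN.
number = pd.to_numeric(as_series)

condList = [
    number.isna(),                  # no number in the address
    number.eq(dfsample["street"]),  # number matches the street column
]
choiceList = [False, True]
dfsample["match"] = np.select(condList, choiceList, default=False)

print(type(as_frame).__name__, type(as_series).__name__)
print(dfsample["match"].tolist())
```

Indexing the DataFrame with [0], as suggested in the comment, reaches the same Series; expand=False just avoids building the DataFrame in the first place.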