Python 3.x: populating a column based on other column values with multiple if-else conditions
I am trying to compare 4 columns in a pandas DataFrame and populate a 5th column based on the result. In plain SQL it would look like this:
if speciality_new is null and location_new is null then 'No match found'
elif specialty <> specialty_new and location <> location_new then 'both are different'
elif specialty_new is null then 'specialty not found'
elif location_new is null then 'location not found'
else 'true'
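The error message is TypeError: unsupported operand type(s) for &: 'str' and 'str', which makes no sense, since '&' is the syntax for 'and'.
dfsample is what I have and dfFinal is what I want: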
import pandas as pd

dfsample = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
'location': ['texas', 'dc', 'georgia', '', 'florida'],
'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})
dfFinal = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
'location': ['texas', 'dc', 'georgia', '', 'florida'],
'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida'],
'match': ['TRUE', 'location didn’t match', 'specialty didn’t match', 'both specialty and location didn’t match', 'specialty didn’t match']})
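The TypeError quoted in the question is consistent with Python's operator precedence: & binds more tightly than != and ==, so an unparenthesised comparison chain ends up applying & to two strings. The failing code is not shown in the question, so the following is only a minimal sketch under that assumption, using hypothetical scalar values in place of one row:

```python
# Hypothetical scalar values standing in for one row of the DataFrame.
speciality, speciality_new = 'nurse', 'doctor'
location, location_new = 'dc', 'texas'

# Without parentheses, & binds tighter than !=, so Python first evaluates
# speciality_new & location, which is str & str and raises TypeError.
try:
    speciality != speciality_new & location != location_new
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# Parenthesising each comparison works, and so does `and` on scalars.
with_parens = (speciality != speciality_new) & (location != location_new)
with_and = speciality != speciality_new and location != location_new
print(with_parens, with_and)
```

On whole columns (Series) only the parenthesised & form applies, because `and` cannot combine two Series element by element.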
Here is another approach that does not use np for the conditions; it uses the apply function instead:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
'location': ['texas', 'dc', 'georgia', '', 'florida'],
'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', np.nan],
'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})
print (df)
def master_check(x):
    # x is one row of df; compare the original values with the new ones
    if pd.isnull(x['speciality_new']) and pd.isnull(x['location_new']):
        return 'No match found'
    elif x['speciality'] != x['speciality_new'] and x['location'] != x['location_new']:
        return 'Both specialty and location didnt match'
    elif x['speciality'] != x['speciality_new']:
        return 'Specialty didnt match'
    elif x['location'] != x['location_new']:
        return 'Location didnt match'
    else:
        return True

df['Match'] = df.apply(master_check, axis=1)
print(df)
The output will be:
   ID speciality location speciality_new location_new
0   1     doctor    texas         doctor        texas
1   2      nurse       dc          nurse       alaska
2   3    patient  georgia       director      georgia
3   4     driver                   nurse     maryland
4   5   director  florida            NaN      florida

   ID speciality  ... location_new                                    Match
0   1     doctor  ...        texas                                     True
1   2      nurse  ...       alaska                     Location didnt match
2   3    patient  ...      georgia                    Specialty didnt match
3   4     driver  ...     maryland  Both specialty and location didnt match
4   5   director  ...      florida                    Specialty didnt match
If you do want to use numpy.where(), each false (else) branch has to become its own nested numpy.where() call. Implemented that way, it looks like this:
import pandas as pd
import numpy as np
masterDf = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
'speciality': ['doctor', 'nurse', 'patient', 'driver', 'director'],
'location': ['texas', 'dc', 'georgia', '', 'florida'],
'speciality_new' : ['doctor', 'nurse', 'director', 'nurse', ''],
'location_new': ['texas', 'alaska', 'georgia', 'maryland', 'florida']})
masterDf['Match'] = np.where(
((masterDf.speciality_new.isnull()) & (masterDf.location_new.isnull())), 'No match found',
np.where(((masterDf.speciality != masterDf.speciality_new) & (masterDf.location != masterDf.location_new)), 'Both specialty and location didnt match',
np.where((masterDf.speciality != masterDf.speciality_new), 'Specialty didnt match',
np.where((masterDf.location != masterDf.location_new), 'Location didnt match',
True))))
print (masterDf)
The output will be:
   ID speciality  ... location_new                                    Match
0   1     doctor  ...        texas                                     True
1   2      nurse  ...       alaska                     Location didnt match
2   3    patient  ...      georgia                    Specialty didnt match
3   4     driver  ...     maryland  Both specialty and location didnt match
4   5   director  ...      florida                    Specialty didnt match
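To evaluate multiple conditions with numpy, it is better to use numpy.select, where you specify the conditions, the expected output for each condition, and a default output, just like an if-elif-else statement: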
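import numpy as np

condlist = [
    dfsample['speciality_new'].isnull() & dfsample['location_new'].isnull(),
    dfsample['speciality'].ne(dfsample['speciality_new']) &
    dfsample['location'].ne(dfsample['location_new']),
    dfsample['speciality'].ne(dfsample['speciality_new']),
    dfsample['location'].ne(dfsample['location_new']),
]
choicelist = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match'
]
# A string default keeps every outcome the same dtype; newer NumPy
# versions can refuse to mix string choices with a bool default.
dfsample['match'] = np.select(condlist, choicelist, default='True')
print(dfsample)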
Here .ne means "not equal" (you could simply use != instead).
Output:
   ID speciality location speciality_new location_new                                    match
0   1     doctor    texas         doctor        texas                                     True
1   2      nurse       dc          nurse       alaska                     Location didnt match
2   3    patient  georgia       director      georgia                    Specialty didnt match
3   4     driver                   nurse     maryland  Both specialty and location didnt match
4   5   director  florida                     florida                    Specialty didnt match
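In the question's sample data, missing values are written as empty strings ('') rather than NaN, and isnull() does not flag empty strings, so the 'No match found' branch never fires on that data. The sketch below assumes that an empty string in a *_new column should count as missing; the small DataFrame is hypothetical and the masking step is one possible choice, not part of the original answers:

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the spirit of dfsample, with '' marking missing values.
df = pd.DataFrame({
    'speciality': ['doctor', 'driver', 'director', 'nurse'],
    'location': ['texas', '', 'florida', 'dc'],
    'speciality_new': ['doctor', 'nurse', '', ''],
    'location_new': ['texas', 'maryland', 'florida', ''],
})

# isnull() does not treat '' as missing, so this is False.
has_null_before = bool(df['speciality_new'].isnull().any())

# Assumption: empty strings in the *_new columns mean "no value".
spec_new = df['speciality_new'].mask(df['speciality_new'].eq(''))
loc_new = df['location_new'].mask(df['location_new'].eq(''))

condlist = [
    spec_new.isnull() & loc_new.isnull(),
    df['speciality'].ne(spec_new) & df['location'].ne(loc_new),
    df['speciality'].ne(spec_new),
    df['location'].ne(loc_new),
]
choicelist = [
    'No match found',
    'Both specialty and location didnt match',
    'Specialty didnt match',
    'Location didnt match',
]
# All outcomes are strings, so the result has a single dtype.
df['match'] = np.select(condlist, choicelist, default='True')
print(df)
```

The order of condlist matters in the same way as in an if-elif-else chain: the first condition that holds for a row decides its value.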
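The syntax for 'and' is and.
Even 'and' does not work. This is what Stack Overflow says:
Please share a sample DataFrame and the expected output. You need to use opening and closing parentheses so that the conditions are grouped correctly.
@sammywemmy I have added a sample DataFrame and the final output I want.
Thank you. I picked Caina's answer because it looks cleaner and shorter.
Hi @Caina, thanks for helping with this. I am using the same code (slightly modified) but I get an error, ValueError: Wrong number of items passed 63311, placement implies 1. Could you give me some hints? 63311 is the number of rows in the DataFrame. The code is:
condList = [dfsample['Address'].str.extract('(\d+)').isna(), dfsample['Address'].str.extract('(\d+)').replace(np.nan, -1, regex=True).astype(int)[0].eq(dfsample['street']),]
choiceList = [False, True]
dfsample['match'] = np.select(condList, choiceList, default=False)
The first condition in condList returns a DataFrame where a Series or a 1-D NumPy array is expected. If you put [0] after it (as in dfsample['Address'].str.extract('(\d+)').isna()[0]), it may work.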
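The shape mismatch described in the last comment can be reproduced on a small scale: by default, Series.str.extract returns a DataFrame even for a single capture group, while expand=False returns a Series. The data below is hypothetical, and the comparison via pd.to_numeric is an alternative sketch, not the commenter's code:

```python
import pandas as pd

# Hypothetical data; the real frame in the comment has 63311 rows.
df = pd.DataFrame({'Address': ['12 Main St', 'Oak Ave', '7 Elm Rd'],
                   'street': [12, 3, 8]})

# Default expand=True: one capture group still yields a DataFrame.
as_frame = df['Address'].str.extract(r'(\d+)')
# expand=False with one group yields a Series, like selecting column 0.
as_series = df['Address'].str.extract(r'(\d+)', expand=False)
print(type(as_frame).__name__, type(as_series).__name__)

# Convert the extracted digits to numbers; rows without digits become NaN,
# and NaN never compares equal, so those rows come out as False.
number = pd.to_numeric(as_series)
df['match'] = number.eq(df['street'])
print(df)
```

With one-dimensional inputs like these, the comparison alone gives the True/False column, so np.select is optional for this two-outcome case.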