Python 3.x: How do I split a name into first, middle, and last name in one function and assign them to columns at the same time?
I would like to use a function to split the name column in df into first, middle, and last names. But this gives me an error:
ValueError: too many values to unpack (expected 3)
My code:
df['FIRST_NAME'], df['MIDDLE_NAME'], df['LAST_NAME'] = \
    df.apply(split_name, var='NAME_V2', axis=1)

def split_name(df, var):
    first_name = ''
    middle_name = ''
    last_name = ''
    full_name = df[var]
    name_entity = full_name.split()
    name_entity_length = len(name_entity)
    if name_entity_length == 1:
        last_name = full_name
    elif name_entity_length == 2:
        first_name = name_entity[0]
        last_name = name_entity[-1]
    elif name_entity_length >= 3:
        first_name = name_entity[0]
        middle_name = name_entity[1:-1]
        last_name = name_entity[-1]
    return (first_name, middle_name, last_name)
DataFrame:
NAME_V2      FIRST_NAME  MIDDLE_NAME  LAST_NAME
John Smith   John                     Smith
Smith                                 Smith
J O I Smith  J           O I          Smith
Here is my approach:
import pandas as pd

def split_name(df, var):
    # Split on whitespace; expand=True pads shorter rows with missing values
    sub_df = df[var].str.split(r'\s+', expand=True)
    result = []
    for _, row in sub_df.iterrows():
        info = {'FirstName': '', 'MiddleName': '', 'LastName': ''}
        # Drop the padding so that tokens[-1] is the real last token
        tokens = row.dropna().tolist()
        n = len(tokens)
        if n == 1:
            info['LastName'] = tokens[0]
        elif n == 2:
            info['FirstName'], info['LastName'] = tokens
        elif n >= 3:
            info['FirstName'] = tokens[0]
            info['LastName'] = tokens[-1]
            info['MiddleName'] = ' '.join(tokens[1:-1])
        result.append(info)
    return pd.DataFrame(result, index=df.index)

split_name(df, 'NAME_V2')
Result:
  FirstName MiddleName LastName
0      John               Smith
1                         Smith
2         J        O I    Smith
You can join it to the original DataFrame.
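As a minimal sketch of that join step: the `parts` frame below is a hand-typed stand-in for whatever `split_name(df, 'NAME_V2')` returns (an assumption made only to keep the example self-contained). Because both frames share the same index, `DataFrame.join` lines the rows up directly.

```python
import pandas as pd

df = pd.DataFrame({"NAME_V2": ["John Smith", "Smith", "J O I Smith"]})

# Stand-in for the frame returned by split_name(df, 'NAME_V2');
# typed by hand here purely for illustration.
parts = pd.DataFrame(
    {
        "FirstName": ["John", "", "J"],
        "MiddleName": ["", "", "O I"],
        "LastName": ["Smith", "Smith", "Smith"],
    },
    index=df.index,
)

# join aligns on the index, so each name row gets its own parts
combined = df.join(parts)
print(combined)
```

`pd.concat([df, parts], axis=1)` should give the same frame here, since the two indexes are identical.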
# The lazy (.*?) keeps trailing whitespace out of the middle name
df.NAME_V2.str.extractall(r"\b(\w*)\s*(.*?)\s*\b(\w+$)").fillna("").rename({0:"First_Name",1:"Middle_Name",2:"Last_Name"},axis=1)
Out[17]:
        First_Name Middle_Name Last_Name
  match
0 0           John                 Smith
1 0                                Smith
2 0              J         O I     Smith
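The regex approach can also be sketched with `Series.str.extract`, which returns one row per input row and so avoids the extra `match` index level that `extractall` adds. The pattern and the named groups below are my own illustration rather than part of the answer above, and they assume that name parts are separated by whitespace.

```python
import pandas as pd

df = pd.DataFrame({"NAME_V2": ["John Smith", "Smith", "J O I Smith"]})

# Optional first token, optional middle part, mandatory last token.
# The group names become the column names of the result.
pattern = (
    r"^\s*(?:(?P<First_Name>\S+)\s+)?"
    r"(?:(?P<Middle_Name>.*\S)\s+)?"
    r"(?P<Last_Name>\S+)\s*$"
)

# Groups that did not take part in the match come back as NaN
names = df["NAME_V2"].str.extract(pattern).fillna("")
print(names)
```

Because the result keeps the original index, it can be joined back with `df.join(names)`.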
Can you post a demo DataFrame and the expected output?
apply returns a sequence of 3-element tuples; the LHS needs one thing: drop the tuple. Just return it as a, b, c and see if that works.
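To illustrate where the ValueError likely comes from: with `axis=1`, `apply` yields one tuple per row, so unpacking the result into three targets iterates over the rows, not over the three name parts. One possible fix, sketched below, is `result_type="expand"`, which turns each returned tuple into columns. The helper `split_name_row` and the four-row frame are hypothetical, written only for this example.

```python
import pandas as pd

def split_name_row(row, var):
    """Hypothetical helper: return (first, middle, last) for one row."""
    tokens = str(row[var]).split()
    if not tokens:
        return ("", "", "")
    if len(tokens) == 1:
        return ("", "", tokens[0])
    # For two tokens the middle slice is empty, so the join gives ""
    return (tokens[0], " ".join(tokens[1:-1]), tokens[-1])

df = pd.DataFrame({"NAME_V2": ["John Smith", "Smith", "J O I Smith", "Ann Lee"]})

# One result per row: four items here, which cannot unpack into three targets
tuples = df.apply(split_name_row, var="NAME_V2", axis=1)
print(len(tuples))

# result_type="expand" spreads each tuple across columns 0, 1, 2
parts = df.apply(split_name_row, var="NAME_V2", axis=1, result_type="expand")
parts.columns = ["FIRST_NAME", "MIDDLE_NAME", "LAST_NAME"]
df = df.join(parts)
print(df)
```

An alternative that keeps the original three-target assignment is to transpose the tuples first with `zip(*tuples)`.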