Python: how do I check for a specific string inside a custom function applied to a DataFrame column?
Suppose I have the following DataFrame column:
import pandas as pd
import string
d = {'Name': ['Braund, Mr. Owen Harris','Cumings, Mrs. John Bradley (Florence Briggs Thayer)', 'Heikkinen, Miss.Laina']}
raw_df = pd.DataFrame(data=d)
I am trying to decode this column: if Mrs is found in a row's string, return married, otherwise return not_married:
def is_married_female(raw_df):
    if raw_df['Name'].str.contains('Mrs').any():
        return 'married'
    else:
        return 'not_married'
raw_df['is_married_female'] = raw_df.apply(lambda x: is_married_female(x["Name"]), axis=1)
However, I keep getting the following error:
TypeError: string indices must be integers
The expected output would look like this:
raw_df['is_married_female']
# not_married
# married
# not_married
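What am I missing in the function?
The problem:
x['Name'] is a Python str, not a Series or a DataFrame.
Inside the function is_married_female, the variable raw_df is therefore a plain string such as:
'Braund, Mr. Owen Harris'
Evaluating raw_df['Name'] on that string is equivalent to: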
print('Braund, Mr. Owen Harris'['Name']) # TypeError: string indices must be integers
Python tries to index into the string, which only works with integer positions, such as:
print('Braund, Mr. Owen Harris'[0]) # B
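To make the types concrete, here is a minimal sketch that reuses the question's data and records what `apply` actually hands to the lambda when `axis=1`. The variable names `row_types` and `value_types` are illustrative only and are not part of the original answer.

```python
import pandas as pd

d = {'Name': ['Braund, Mr. Owen Harris',
              'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
              'Heikkinen, Miss.Laina']}
raw_df = pd.DataFrame(data=d)

# With axis=1, apply passes each row to the function as a Series.
row_types = raw_df.apply(lambda x: type(x).__name__, axis=1)

# Selecting 'Name' from that row yields a plain Python str,
# which has no .str accessor and cannot be indexed with 'Name'.
value_types = raw_df.apply(lambda x: type(x['Name']).__name__, axis=1)

print(row_types.tolist())    # ['Series', 'Series', 'Series']
print(value_types.tolist())  # ['str', 'str', 'str']
```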
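Fix:
Treat the argument as what it is (a str), for example by testing it with the in operator, and rename the parameter raw_df to name to avoid future confusion.
However, a more efficient solution is to use np.where together with Series.str.contains; the full listing appears further down.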
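The code for the row-wise fix did not survive on this page, so the following is only a sketch of what the description suggests: the parameter is renamed to `name` and the substring test uses the `in` operator. Using `in` here is an assumption about the original answer; the `apply` call is kept as in the question.

```python
import pandas as pd

d = {'Name': ['Braund, Mr. Owen Harris',
              'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
              'Heikkinen, Miss.Laina']}
raw_df = pd.DataFrame(data=d)


def is_married_female(name):
    # name is a plain str, so `in` performs a substring check.
    if 'Mrs' in name:
        return 'married'
    return 'not_married'


# Same call as in the question: each row is a Series, x["Name"] is a str.
raw_df['is_married_female'] = raw_df.apply(
    lambda x: is_married_female(x["Name"]), axis=1)

print(raw_df['is_married_female'].tolist())
# ['not_married', 'married', 'not_married']
```

Since only one column is needed, `raw_df['Name'].apply(is_married_female)` should give the same result without building a Series for every row.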
Both approaches produce the following output:
                                                  Name is_married_female
0                              Braund, Mr. Owen Harris       not_married
1  Cumings, Mrs. John Bradley (Florence Briggs Thayer)           married
2                                Heikkinen, Miss.Laina       not_married
import numpy as np
import pandas as pd
d = {'Name': ['Braund, Mr. Owen Harris',
              'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
              'Heikkinen, Miss.Laina']}
raw_df = pd.DataFrame(data=d)
raw_df['is_married_female'] = np.where(raw_df['Name'].str.contains('Mrs'),
                                       'married', 'not_married')
print(raw_df.to_string())
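One caveat with the vectorized version: if the Name column can contain missing values, the result of `str.contains` may carry NaN for those rows (depending on the pandas version and dtype), and `np.where` treats NaN as truthy. The sketch below assumes a hypothetical Series `names` with one missing entry, which is not part of the original data, and passes `na=False` so missing names count as no match; `regex=False` treats 'Mrs' as a literal substring.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last name is missing (not from the original question).
names = pd.Series(['Braund, Mr. Owen Harris',
                   'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
                   None])

# na=False: rows with a missing name are reported as False (no match).
# regex=False: 'Mrs' is matched as a literal substring, not a pattern.
mask = names.str.contains('Mrs', regex=False, na=False)

labels = np.where(mask, 'married', 'not_married')
print(labels.tolist())  # ['not_married', 'married', 'not_married']
```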