Python: replace a column based on a string
I am trying to replace the column "Name" with a new variable "Gender", based on the first letters found in the column Name. Input:
df['Name'].value_counts()
Output:
Mr. Gordon Hemmings 1
Miss Jane Wilkins 1
Mrs. Audrey North 1
Mrs. Wanda Sharp 1
Mr. Victor Hemmings 1
..
Miss Heather Abraham 1
Mrs. Kylie Hart 1
Mr. Ian Langdon 1
Mr. Gordon Watson 1
Miss Irene Vance 1
Name: Name, Length: 4999, dtype: int64
Now, do you see the Mr., Mrs. and Miss prefixes? The first question that came to mind was: how many different titles are there?
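One way to answer "how many different titles are there?" is sketched below. It assumes the title is always the first whitespace-separated word of the name, and it uses a small hypothetical sample instead of the real 4999-row DataFrame.

```python
import pandas as pd

# Hypothetical sample standing in for the question's DataFrame
df = pd.DataFrame({'Name': ['Mr. Gordon Hemmings', 'Miss Jane Wilkins',
                            'Mrs. Audrey North', 'Mrs. Wanda Sharp',
                            'Mr. Victor Hemmings']})

# Assumption: the title is the first word, e.g. "Mr.", "Mrs." or "Miss"
titles = df['Name'].str.split().str[0]

# One row per distinct title, with how often it occurs
counts = titles.value_counts()
print(counts)
```

On the full data, `len(counts)` gives the number of distinct titles, which also shows whether anything other than Mr., Mrs. and Miss appears.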
Now I tried:
#Replace missing value
df['Name'].fillna('Mr.', inplace=True)
# Create Column Gender
df['Gender'] = df['Name']
for i in range(0, df[0]):
A = df['Name'].values[i][0:3]=="Mr."
df['Gender'].values[i] = A
df.loc[df['Gender']==True, 'Gender']="Male"
df.loc[df['Gender']==False, 'Gender']="Female"
del df['Name'] #Delete column 'Name'
df
But I am missing something, because I get the following error:
KeyError: 0
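The KeyError is raised because you don't have a column named 0: df[0] looks up a column label, not the number of rows. However, I would throw this code away and try something more efficient.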
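For reference, here is a minimal sketch of the original loop with only the KeyError fixed: it iterates over `range(len(df))` (the row count) and collects the labels in a list instead of writing into `.values`. The sample data, including the missing name, is hypothetical.

```python
import pandas as pd

# Hypothetical sample; None stands in for a missing name
df = pd.DataFrame({'Name': ['Mr. Gordon Hemmings', 'Miss Jane Wilkins',
                            'Mrs. Audrey North', None]})

df['Name'] = df['Name'].fillna('Mr.')

genders = []
# len(df) is the number of rows; df[0] would look up a column named 0
for i in range(len(df)):
    # "Mrs." starts with "Mrs", so only "Mr." matches the 3-character prefix
    is_mr = df['Name'].iloc[i][0:3] == 'Mr.'
    genders.append('Male' if is_mr else 'Female')

df['Gender'] = genders
df = df.drop(columns='Name')
print(df)
```

This works, but a Python-level loop is slow on large frames, which is why a vectorized approach is preferable.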
After calling fillna(), you can use np.where and str.contains to find the names that contain "Mr.". Then just drop the Name column:
import numpy as np

df['Name'] = df['Name'].fillna('Mr.')
# Raw string so that \. reaches the regex engine as an escaped (literal) dot
df['Gender'] = np.where(df['Name'].str.contains(r'Mr\.'), 'Male', 'Female')
df = df.drop('Name', axis=1)
df
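Since the question is about the first letters of the name, a prefix test is a possible alternative to `str.contains`. The sketch below swaps in `Series.str.startswith`, which compares a plain string prefix, so the dot needs no escaping; the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; None stands in for a missing name
df = pd.DataFrame({'Name': ['Mr. Gordon Hemmings', 'Miss Jane Wilkins',
                            'Mrs. Audrey North', 'Mrs. Wanda Sharp', None]})

df['Name'] = df['Name'].fillna('Mr.')
# startswith is not a regex: "Mr." only matches a literal "Mr." at the start
df['Gender'] = np.where(df['Name'].str.startswith('Mr.'), 'Male', 'Female')
df = df.drop('Name', axis=1)
print(df['Gender'].value_counts())
```

Unlike `str.contains`, this also ignores a "Mr." that appears later in the string.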
Full example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': {0: 'Mr. Gordon Hemmings',
1: 'Miss Jane Wilkins',
2: 'Mrs. Audrey North',
3: 'Mrs. Wanda Sharp',
4: 'Mr. Victor Hemmings'},
'Value': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}})
print(df)
df['Name'] = df['Name'].fillna('Mr.')
df['Gender'] = np.where(df['Name'].str.contains(r'Mr\.'), 'Male', 'Female')
df = df.drop('Name', axis=1)
print('\n')
print(df)
                  Name  Value
0  Mr. Gordon Hemmings      1
1    Miss Jane Wilkins      1
2    Mrs. Audrey North      1
3     Mrs. Wanda Sharp      1
4  Mr. Victor Hemmings      1

   Value  Gender
0      1    Male
1      1  Female
2      1  Female
3      1  Female
4      1    Male
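That didn't work... I get this:
df['Gender'].value_counts()
Male      4289
Female     711
Name: Gender, dtype: int64
But this is wrong... It seems it only separates out Miss. It should return True only for "Mr." and False otherwise.
@jps17183 I forgot that . is a regex character, so you need to escape it with \ (i.e. 'Mr\.').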
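The effect of the unescaped dot can be reproduced on three names. This sketch compares the unescaped pattern, the escaped raw-string pattern, and `regex=False`, which makes `str.contains` search for the literal substring; the names are taken from the question's output.

```python
import pandas as pd

names = pd.Series(['Mr. Gordon Hemmings', 'Miss Jane Wilkins', 'Mrs. Audrey North'])

# Unescaped: '.' matches any character, so "Mrs" also matches 'Mr.'
unescaped = names.str.contains('Mr.')
# Escaped: r'Mr\.' requires a literal dot right after "Mr"
escaped = names.str.contains(r'Mr\.')
# regex=False: plain substring search, no escaping needed
literal = names.str.contains('Mr.', regex=False)

print(unescaped.tolist())  # [True, False, True]
print(escaped.tolist())    # [True, False, False]
print(literal.tolist())    # [True, False, False]
```

The first result explains the 4289/711 split above: every "Mrs." was counted as Male, so only "Miss" ended up as Female.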