Python: replace a column based on a string


I am trying to replace the column "Name" with a new variable "Gender", based on the title found at the start of each value in the Name column.

Input:

df['Name'].value_counts()
Output:

Mr. Gordon Hemmings     1
Miss Jane Wilkins       1
Mrs. Audrey North       1
Mrs. Wanda Sharp        1
Mr. Victor Hemmings     1
                       ..
Miss Heather Abraham    1
Mrs. Kylie Hart         1
Mr. Ian Langdon         1
Mr. Gordon Watson       1
Miss Irene Vance        1

Name: Name, Length: 4999, dtype: int64
Notice the Mr., Mrs., and Miss titles? The first question that came to mind was: how many distinct titles are there?

Here is what I tried:

# Replace missing values
df['Name'].fillna('Mr.', inplace=True)

# Create column Gender
df['Gender'] = df['Name']

for i in range(0, df[0]):
    A = df['Name'].values[i][0:3] == "Mr."
    df['Gender'].values[i] = A

df.loc[df['Gender'] == True, 'Gender'] = "Male"
df.loc[df['Gender'] == False, 'Gender'] = "Female"

del df['Name']  # Delete column 'Name'

df
But I am missing something, because I get the following error:

KeyError: 0


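The KeyError is raised because you don't have a column named 0: df[0] looks up a column label, not the number of rows. However, I would scrap that code and try something more efficient.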
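To illustrate the cause, here is a minimal sketch on a small made-up frame (the two names are borrowed from the question's output; everything else is an assumption for illustration). It shows that df[0] is a column lookup, and that the loop bound the original code presumably intended is the row count, len(df):

```python
import pandas as pd

# Small made-up frame standing in for the asker's data.
df = pd.DataFrame({"Name": ["Mr. Gordon Hemmings", "Miss Jane Wilkins"]})

try:
    df[0]  # looks for a column labelled 0, which does not exist here
    raised = False
except KeyError:
    raised = True

# range() needs the number of rows, not a column.
n_rows = len(df)
flags = [df["Name"].iloc[i][0:3] == "Mr." for i in range(n_rows)]
print(raised, flags)
```

Even with the loop bound fixed, a row-by-row loop is slow on a few thousand rows, which is why a vectorized approach is preferable.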

After calling fillna(), you can combine np.where with str.contains to find the names that contain Mr. and then simply drop the Name column:

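# Assumes: import numpy as np
df['Name'] = df['Name'].fillna('Mr.')  # treat missing names as 'Mr.'
# Raw string so the backslash reaches the regex engine; \. matches a literal dot
df['Gender'] = np.where(df['Name'].str.contains(r'Mr\.'), 'Male', 'Female')
df = df.drop('Name', axis=1)
df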
Full example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': {0: 'Mr. Gordon Hemmings',
  1: 'Miss Jane Wilkins',
  2: 'Mrs. Audrey North',
  3: 'Mrs. Wanda Sharp',
  4: 'Mr. Victor Hemmings'},
 'Value': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}})
print(df)
df['Name'] = df['Name'].fillna('Mr.')
# Raw string: \. matches a literal dot, so 'Mrs.' is not counted as 'Mr.'
df['Gender'] = np.where(df['Name'].str.contains(r'Mr\.'), 'Male', 'Female')
df = df.drop('Name', axis=1)
print('\n')
print(df)
                  Name  Value
0  Mr. Gordon Hemmings      1
1    Miss Jane Wilkins      1
2    Mrs. Audrey North      1
3     Mrs. Wanda Sharp      1
4  Mr. Victor Hemmings      1


   Value  Gender
0      1    Male
1      1  Female
2      1  Female
3      1  Female
4      1    Male

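That didn't work... I get this: df['Gender'].value_counts() gives Male 4289, Female 711 (Name: Gender, dtype: int64), but that is wrong... It seems to single out only Miss; it should return True only for "Mr." and False otherwise.

@jps17183 I forgot that . is a regex character, so you need to escape it with a backslash (\).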
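The effect of the unescaped dot can be seen in a small sketch (the three names come from the question's output; the variable names are only illustrative). An unescaped . matches any character, so the pattern Mr. also matches the "Mrs" in "Mrs.", which would explain the inflated Male count. Passing regex=False or using str.startswith are two alternatives that avoid regex entirely:

```python
import pandas as pd

names = pd.Series(["Mr. Gordon Hemmings", "Mrs. Audrey North", "Miss Jane Wilkins"])

# "." is a regex wildcard here, so "Mrs" also matches the pattern "Mr."
unescaped = names.str.contains("Mr.")

# Escaped dot (raw string): only a literal "Mr." matches
escaped = names.str.contains(r"Mr\.")

# Alternatives without regex: literal substring match, or a prefix check
literal = names.str.contains("Mr.", regex=False)
prefix = names.str.startswith("Mr.")

print(pd.DataFrame({"name": names, "unescaped": unescaped,
                    "escaped": escaped, "literal": literal, "prefix": prefix}))
```

If the title is always the first token of the name, the prefix check is arguably the closest match to what the question asks for, since it cannot be fooled by "Mr." appearing later in a name.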