Python: how to replace the unique values in a DataFrame column with new values?

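I have a DataFrame like the one below and want to make it less sensitive by replacing the unique values of a column, i.e. I want to replace the last-name column with fake surnames generated by the Faker library.

The code snippet is shown below: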

import pandas as pd
from faker import Faker

fake = Faker()
print(fake.first_name())
print(fake.last_name())

# first names are used as the index of the DataFrame
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist',
       'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')

df = pd.DataFrame(list(zip(last, job, language)),
                  columns=['last', 'job', 'language'],
                  index=first)

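The output I want is the last-name column replaced with fake names, but each distinct surname must be replaced consistently: Meyer, for example, should always be replaced by the same fake surname.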

Collect all unique names, build a dictionary that maps each unique name to a fake name, and map it onto your column:

import pandas as pd
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist', 
      'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')

df = pd.DataFrame(list(zip(last, job, language)), 
                  columns =['last', 'job', 'language'],
                  index=first) 
print(df)

# get all unique names - this can easily handle tens of thousands of names
all_names = set(df["last"])

# create mapper: you would use fake.last_name() instead of 42 + i
# (needs `from faker import Faker` and `fake = Faker()` as in the question)
# mapper = {k: fake.last_name() for k in all_names}
mapper = {k: 42 + i for i, k in enumerate(all_names)}

# apply it
df["last"] = df["last"].map(mapper)
print(df)

Output (illustrative; the exact numbers depend on the iteration order of the set):

# before
          last                 job   language
Mike      Meyer        data analyst     Python
Dorothee  Maier          programmer       Perl
Tom       Meyer  computer scientist       Java
Bill      Mayer      data scientist       Java
Pete       Meyr          accountant      Cobol
Kate       Mair        psychiatrist  Brainfuck

# after
          last                 job   language
Mike        44        data analyst     Python
Dorothee    43          programmer       Perl
Tom         44  computer scientist       Java
Bill        45      data scientist       Java
Pete        46          accountant      Cobol
Kate        47        psychiatrist  Brainfuck
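
The placeholder integers above can be swapped for generated names. Here is a minimal sketch, not part of the original answer. The helper `pseudonymize`, its `max_tries` parameter, and the `stand_in` generator are hypothetical names made up for illustration. Any zero-argument callable, such as `fake.last_name` from the question, could be passed instead of `stand_in`. The retry loop is an added assumption: a random generator can return the same name twice, which would merge two different real surnames into one.

```python
import itertools

import pandas as pd


def pseudonymize(series, make_fake, max_tries=1000):
    """Replace each distinct value in `series` with one generated value.

    `make_fake` is any zero-argument callable that returns a replacement.
    Returns the mapped Series and the mapping that was used.
    """
    mapper = {}
    used = set()
    # unique() yields distinct values in order of first appearance
    for value in series.dropna().unique():
        for _ in range(max_tries):
            candidate = make_fake()
            if candidate not in used:
                break
        else:
            # The generator kept returning names that are already taken
            raise RuntimeError("could not generate enough distinct fake values")
        used.add(candidate)
        mapper[value] = candidate
    return series.map(mapper), mapper


# Hypothetical stand-in generator so the sketch runs without Faker installed
counter = itertools.count(1)


def stand_in():
    return f"Name{next(counter):03d}"


df = pd.DataFrame(
    {"last": ["Meyer", "Maier", "Meyer", "Mayer", "Meyr", "Mair"]},
    index=["Mike", "Dorothee", "Tom", "Bill", "Pete", "Kate"],
)
df["last"], mapper = pseudonymize(df["last"], stand_in)
print(df)
```

Both rows that held Meyer end up with the same replacement, and no two different surnames share one. If the generator can only produce fewer distinct values than there are real names, the loop gives up after `max_tries` attempts instead of running forever.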

Get all unique names, create a dictionary mapping unique name -> fake name, and apply it to your column with Series.map.
df.loc[df['last'].eq('Meyer'),'last']=fake.last_name()
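Thanks for your comment. But my real dataset has many unique surnames, more than 1,000, so creating a dict for all of them is not feasible. I think I know how to do it; I'll try it now.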
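
On the concern about more than 1,000 unique surnames: the dictionary does not have to be written by hand, because a comprehension builds it from the column itself. The sketch below uses made-up data (5,000 distinct surnames over 20,000 rows) and a hypothetical `Fake00000`-style placeholder where `fake.last_name()` would go. The extra column name `pseudonym` is also only for illustration.

```python
import pandas as pd

# Hypothetical data: 20,000 rows drawn from 5,000 distinct surnames
originals = [f"Surname{i}" for i in range(5000)]
df = pd.DataFrame({"last": [originals[i % 5000] for i in range(20000)]})

# One dictionary entry per distinct surname, built in a single pass.
# The placeholder string stands in for a call such as fake.last_name().
mapper = {name: f"Fake{i:05d}" for i, name in enumerate(df["last"].unique())}

# Series.map looks every row up in the dictionary
df["pseudonym"] = df["last"].map(mapper)

print(len(mapper))
print(df.head())
```

Keeping the original column next to the new one here is only to make the mapping easy to inspect. In practice the original column would be overwritten or dropped.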