Python: how do I replace the unique values in a DataFrame with new values?
I have a dataframe like the one below, and I want to reduce its sensitivity by replacing the unique values of a column, i.e. I want to replace the last-name column with fake last names generated by the "faker" library. The code snippet is shown below:
import pandas as pd
from faker import Faker

fake = Faker()
print(fake.first_name())
print(fake.last_name())

# first names, used as the index of the dataframe
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist',
       'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')
df = pd.DataFrame(list(zip(last, job, language)),
                  columns=['last', 'job', 'language'],
                  index=first)
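The output I want has the last-name column replaced with fake names, but consistently: for example, Meyer should always be replaced by the same fake last name.

Get all the unique names, create a dictionary mapping unique name -> fake name, and map it onto your column: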
import pandas as pd
first = ('Mike', 'Dorothee', 'Tom', 'Bill', 'Pete', 'Kate')
last = ('Meyer', 'Maier', 'Meyer', 'Mayer', 'Meyr', 'Mair')
job = ('data analyst', 'programmer', 'computer scientist',
'data scientist', 'accountant', 'psychiatrist')
language = ('Python', 'Perl', 'Java', 'Java', 'Cobol', 'Brainfuck')
df = pd.DataFrame(list(zip(last, job, language)),
columns =['last', 'job', 'language'],
index=first)
print(df)
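# get all unique names - this can easily handle a couple of ten thousand names
all_names = set(df["last"])
# create mapper: you would use fake.last_name() instead of 42 + i
# mapper = {k: fake.last_name() for k in all_names}
# start=1 so the placeholders run from 43 to 47, as in the output shown below;
# which name gets which number depends on the iteration order of the set
mapper = {k: 42 + i for i, k in enumerate(all_names, start=1)}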
# apply it
df["last"] = df["last"].map(mapper)
print(df)
Output:
# before
           last                 job   language
Mike      Meyer        data analyst     Python
Dorothee  Maier          programmer       Perl
Tom       Meyer  computer scientist       Java
Bill      Mayer      data scientist       Java
Pete       Meyr          accountant      Cobol
Kate       Mair        psychiatrist  Brainfuck
# after
          last                 job   language
Mike        44        data analyst     Python
Dorothee    43          programmer       Perl
Tom         44  computer scientist       Java
Bill        45      data scientist       Java
Pete        46          accountant      Cobol
Kate        47        psychiatrist  Brainfuck
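One caveat when swapping in fake.last_name(): Faker draws from a finite pool of names, so two different real names can occasionally receive the same fake name, which would silently merge them. Below is a minimal sketch of a collision-safe mapper. The names build_mapper and placeholder_name are hypothetical and introduced only for this illustration, and the stand-in generator exists just to keep the example free of extra dependencies; with Faker installed, passing fake.last_name as make_name should work the same way.

```python
import itertools

import pandas as pd


def build_mapper(values, make_name, max_tries=1000):
    """Map each distinct value to a distinct replacement drawn from make_name()."""
    mapper = {}
    used = set()
    for value in dict.fromkeys(values):  # de-duplicate, keep first-seen order
        for _ in range(max_tries):
            candidate = make_name()
            if candidate not in used:
                break
        else:
            # every draw collided with a name that is already taken
            raise RuntimeError("could not draw an unused replacement name")
        used.add(candidate)
        mapper[value] = candidate
    return mapper


# Stand-in generator (assumption): deliberately yields every name twice
# to mimic collisions; swap in fake.last_name when Faker is available.
_pool = itertools.chain.from_iterable(
    (f"Name{i}", f"Name{i}") for i in itertools.count()
)


def placeholder_name():
    return next(_pool)


df = pd.DataFrame({"last": ["Meyer", "Maier", "Meyer", "Mayer", "Meyr", "Mair"]})
mapper = build_mapper(df["last"], placeholder_name)
df["last"] = df["last"].map(mapper)
print(df)
```

Both 'Meyer' rows end up with the same replacement, and no two distinct names share one. Faker also has a unique proxy (fake.unique.last_name()) intended for this purpose, if your installed version supports it.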
Get all the unique names, create a dictionary mapping unique name -> fake name, and apply it to your column with Series.map.
df.loc[df['last'].eq('Meyer'),'last']=fake.last_name()
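The one-liner above handles a single name: fake.last_name() is evaluated once, and the resulting scalar is assigned to every row selected by the boolean mask, so all 'Meyer' rows receive the same value. A small sketch of that behaviour, using the fixed string 'Smith' as a stand-in for the Faker call (an assumption made for reproducibility):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "last": ["Meyer", "Maier", "Meyer", "Mayer"],
        "job": ["data analyst", "programmer", "computer scientist", "data scientist"],
    },
    index=["Mike", "Dorothee", "Tom", "Bill"],
)

replacement = "Smith"  # stand-in for fake.last_name(), which is called only once

# Boolean mask: True for the rows whose last name is 'Meyer'
mask = df["last"].eq("Meyer")

# The scalar is broadcast to every selected row; other rows are untouched
df.loc[mask, "last"] = replacement
print(df)
```

Covering every name this way would take one such assignment per unique value, which is why the dictionary-plus-map approach tends to scale better.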
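Thanks for the comment. But in my real dataset I have many unique last names, more than 1000, so creating a dict for all of them by hand isn't feasible. I think I know how to do it now; I'll give it a try.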
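On the concern about more than 1000 unique last names: the dictionary is built by a comprehension rather than typed by hand, so its size simply follows the data. Here is a rough sketch with synthetic data; the Real{i} and Fake{i} strings are made-up placeholders, not output from Faker. pd.factorize is shown as an alternative that, assuming the column has no missing values, avoids the dictionary altogether.

```python
import numpy as np
import pandas as pd

# 20,000 rows drawn from 5,000 distinct synthetic last names (made-up data)
rng = np.random.default_rng(0)
real_names = np.array([f"Real{i}" for i in range(5000)])
df = pd.DataFrame({"last": rng.choice(real_names, size=20000)})

# Variant 1: dict comprehension over the unique values, then Series.map
unique_names = df["last"].unique()
mapper = {name: f"Fake{i}" for i, name in enumerate(unique_names)}
mapped = df["last"].map(mapper)

# Variant 2: factorize gives each row an integer code in order of first
# appearance; the codes index straight into an array of replacements
codes, uniques = pd.factorize(df["last"])
replacements = np.array([f"Fake{i}" for i in range(len(uniques))])
factorized = pd.Series(replacements[codes], index=df.index)

# Both variants assign the same replacement to every row
print(mapped.tolist() == factorized.tolist())
```

In real use, the Fake{i} strings would be replaced by names drawn from Faker, one per unique value.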