Pandas: How to build a spelling corrector from a txt file

pandas dataframe

This is my txt file, named replacer.txt:

keyword_origin, keyword_destinantion
topu,topup
atmstrbca,atm bca

This is the data I want to correct:

id keyword
1  transfer atmstrbca
2  topu bank
3  topup bank

My expected output:

id keyword
1  transfer atm bca
2  topup bank
3  topup bank

What I did:

df['keyword'].str.replace("atmstrbca","atm bca")
df['keyword'].str.replace("topu","topup")

The output is:

id keyword
1  transfer atm bca
2  topup bank
3  topupp bank

My idea is to use the text file replacer.txt to do this, because the list contains more than 100 keywords.

Create a dictionary from the first file, split each value on whitespace, and use get to do the replacement:

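# df1 is replacer.txt loaded as a DataFrame; df2 holds the keywords to correct
d = dict(zip(df1.keyword_origin, df1.keyword_destinantion))
# alternative
# d = df1.set_index('keyword_origin')['keyword_destinantion'].to_dict()
# replace each whitespace-separated token found in d; leave other tokens unchanged
df2['keyword'] = df2['keyword'].apply(lambda x: ' '.join([d.get(y, y) for y in x.split()]))
print(df2)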
   id           keyword
0   1  transfer atm bca
1   2        topup bank
2   3        topup bank
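
The snippet above assumes df1 and df2 already exist. A minimal end-to-end sketch follows. It assumes replacer.txt is a plain comma-separated file with the header shown in the question. Here `io.StringIO` stands in for the real file, and the helper name `correct` is made up for illustration. Because the header has a space after the comma, `skipinitialspace=True` is passed so the second column name comes out without a leading space.

```python
import io

import pandas as pd

# Stand-in for replacer.txt; with a real file, pass its path to pd.read_csv instead.
replacer_txt = io.StringIO(
    "keyword_origin, keyword_destinantion\n"
    "topu,topup\n"
    "atmstrbca,atm bca\n"
)

# skipinitialspace=True skips the space that follows the comma in the header row.
df1 = pd.read_csv(replacer_txt, skipinitialspace=True)

df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "keyword": ["transfer atmstrbca", "topu bank", "topup bank"],
})

# Map each typo to its correction.
d = dict(zip(df1["keyword_origin"], df1["keyword_destinantion"]))


def correct(text, mapping):
    """Replace every whitespace-separated token that is a key of mapping."""
    return " ".join(mapping.get(token, token) for token in text.split())


df2["keyword"] = df2["keyword"].apply(lambda s: correct(s, d))
corrected = df2["keyword"].tolist()
print(df2)
```

Since whole tokens are looked up, "topup" is left alone even though it contains "topu". One limitation of this approach is that a typo spanning several words cannot be matched.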

You can use str.replace with a callable as the replacement:

In [11]: d = {"atmstrbca": "atm bca", "topu": "topup"}  # all the typos

In [12]: regex = r'\b(?:' + '|'.join(d.keys()) + r')\b'  # group the alternatives so \b applies to every key

In [13]: df['keyword'].str.replace(regex, lambda x: d[x.group()], regex=True)
Out[13]:
0    transfer atm bca
1          topup bank
2          topup bank
Name: keyword, dtype: object
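
A slightly more defensive variant of the same idea is sketched below, under some assumptions: keys may contain regex metacharacters (so they are escaped with `re.escape`), and longer keys are tried first in case one key is a prefix of another. Wrapping the alternatives in a non-capturing group keeps both `\b` anchors applied to every key. Without the group, alternation would attach the anchors only to the first and last alternative. The extra row "xtopu bank" is made-up test data, not from the question.

```python
import re

import pandas as pd

d = {"atmstrbca": "atm bca", "topu": "topup"}  # typo -> correction

# Escape each key and try longer keys first.
alternatives = "|".join(re.escape(k) for k in sorted(d, key=len, reverse=True))

# The non-capturing group makes \b apply on both sides of every alternative.
pattern = r"\b(?:" + alternatives + r")\b"

s = pd.Series(["transfer atmstrbca", "topu bank", "topup bank", "xtopu bank"])

# The callable receives a match object and returns the replacement text.
result = s.str.replace(pattern, lambda m: d[m.group()], regex=True).tolist()
print(result)
```

With whole-word matching, "topup" and "xtopu" are both left unchanged, while "topu" and "atmstrbca" are corrected.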

You can build the dict from another DataFrame, for example with:

dict(zip(df_replacer.keyword_origin, df_replacer.keyword_destinantion))