Python: matching names across two databases whose rows are similar but not exact matches
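I tried difflib and fuzzywuzzy to match the names in this problem, but because of the variation in the names the match rate was low. I am now trying to use the other data fields I have alongside the name, but I am not at all sure how to approach a problem like this. If anything is unclear, please let me know and I will do my best to clarify.

I have two DataFrames that hold information about people that is similar but does not match exactly. I want to match each reference number (Ref) in one DataFrame to the corresponding reference number in the other; within a DataFrame, each person's reference number is unique. For example, in the tables below I want to establish that Jimmy Random and James Random are the same person even though the names do not match: the reference number is 1234 in df1 and 89 in df2. Note that a person's rank may change over time, but it changes in both tables at the same time. Each person's reference number, style, ID and nationality always stay the same.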
import pandas as pd

columns = ["Ref", "Date", "Name", "Rank", "Nationality", "Style", "ID"]

# First source; every value is a string and "" marks a missing field
df1 = pd.DataFrame(columns=columns, data=[
    ["1234", "20200104", "Jimmy Random", "General", "France", "Aggressive", ""],
    ["1333", "20200104", "Ian Fleming", "Brigadier", "England", "Passive", "14"],
    ["1234", "20191204", "Jimmy Random", "Major", "France", "", "15"],
    ["1000", "20200404", "Peter Nisbett", "Corporal", "", "Passive", "12"],
])

# Second source; same columns, but Ref and ID use a different numbering
df2 = pd.DataFrame(columns=columns, data=[
    ["89", "20200104", "James Random", "", "France", "Aggressive", "104"],
    ["10", "20200104", "I. Fleming", "Brigadier", "England", "", "4"],
    ["156", "20200404", "P. Nisbett", "", "Spain", "Passive", "5"],
    ["89", "20191204", "James Random", "Major", "France", "Aggressive", "104"],
])
Many thanks in advance for any help you can offer.
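You basically need to compare the strings using some additional analysis, right? Look at cosine similarity; it is implemented in scikit-learn.
Please provide sample data in the text of the question, not as a picture or link, so that the code, data, and error messages are available as text. Python cannot read an image to run code.
Got it, I have added the code in text form.
I'll take a look at that now, thanks for the tip!!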
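To make the cosine-similarity suggestion concrete, here is a minimal sketch that needs only the standard library. scikit-learn ships its own implementation (sklearn.metrics.pairwise.cosine_similarity); this version just shows the idea on the names from the question. The helper names char_ngrams and cosine_similarity, the choice of character bigrams, and the two small name dictionaries are assumptions made for illustration, not something from the original post.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Count the overlapping character n-grams of the lower-cased text."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors (0.0 to 1.0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Unique Ref -> Name pairs taken from the sample data in the question
names_df1 = {"1234": "Jimmy Random", "1333": "Ian Fleming", "1000": "Peter Nisbett"}
names_df2 = {"89": "James Random", "10": "I. Fleming", "156": "P. Nisbett"}

# For each Ref in df1, keep the df2 Ref whose name scores highest
ref_map = {}
for ref1, name1 in names_df1.items():
    vec1 = char_ngrams(name1)
    score, ref2 = max(
        (cosine_similarity(vec1, char_ngrams(name2)), ref2)
        for ref2, name2 in names_df2.items()
    )
    ref_map[ref1] = ref2
    print(ref1, "->", ref2, round(score, 3))
```

On these sample rows the best-scoring pairs are 1234 with 89, 1333 with 10, and 1000 with 156. On real data the top score can still be a wrong match, so it is probably worth setting a minimum score and confirming each candidate pair against the fields that are said to stay constant (Nationality, Style) before accepting it.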