Python: create a MultiIndex with from_product and append a column?
I have the following data:

Input df -
fruit uniqueid
apple 1123
appless 321
banana 623
mango 739
mangos 889
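
For reproducibility, the input above can be built as a DataFrame. This is a minimal sketch that assumes `uniqueid` is an integer column and that the frame is named `df`, as in the code that follows:

```python
import pandas as pd

# Input frame from the question; integer uniqueid is an assumption.
df = pd.DataFrame({
    "fruit": ["apple", "appless", "banana", "mango", "mangos"],
    "uniqueid": [1123, 321, 623, 739, 889],
})
```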
Code -
import pandas as pd
from fuzzywuzzy import fuzz  # assumption: fuzz comes from fuzzywuzzy (published today as thefuzz)

df.loc[:, 'fruit_copy'] = df['fruit']
## comparing values from one column to each other
compare = pd.MultiIndex.from_product([df['fruit'], df['fruit_copy']]).to_series()

def metrics(tup):
    # tup is a (fruit, fruit_copy) pair taken from the MultiIndex
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup)],
                     ['ratio', 'token'])

compare = compare.apply(metrics)
## only keep higher matches
compare_80 = compare[(compare['ratio'] >= 80) & (compare['token'] >= 80)]
Current output -
                  ratio  token
apple   apple       100    100
        appless      83     83
appless apple        83     83
        appless     100    100
banana  banana      100    100
mango   mango       100    100
        mangos       91     91
mangos  mango        91     91
        mangos      100    100
Expected output, first goal -
index1          index2   ratio  token  uniqueid
apple    1123   apple      100    100      1123
                appless     83     83       321
appless  321    apple       83     83      1123
                appless    100    100       321
banana   623    banana     100    100       623
mango    739    mango      100    100       739
                mangos      91     91       889
mangos   889    mango       91     91       739
                mangos     100    100       889
Expected output, second goal -
index1          index2   ratio  token  uniqueid
apple    1123   appless     83     83       321
mango    739    mangos      91     91       889
Can I achieve this by appending uniqueid to the MultiIndex?

You can try this with a cross merge, applying the fuzzy ratio afterwards:
import numpy as np
import pandas as pd
from fuzzywuzzy import fuzz  # assumption: same fuzz module as in the question

s = df['fruit'].str[:2]  # if you know how many leading characters must at least match (assume 2)
u = (df.assign(k=1, s=s)
       .merge(df.drop(columns='uniqueid').assign(k=1, s=s),
              on=['k', 's'], suffixes=('', '_y'))
       .drop(columns=['k', 's']))
u = u[u['fruit'].ne(u['fruit_y'])].copy()  # remove pairs of a fruit with itself
u = (u.assign(Ratio=[fuzz.ratio(*i) for i in zip(u['fruit'], u['fruit_y'])])
      .sort_values('Ratio', ascending=False).drop_duplicates('fruit')).sort_index()
out = u[pd.DataFrame(np.sort(u[['fruit', 'fruit_y']], axis=1), index=u.index)
          .duplicated(keep='last')]  # keep one row per unordered pair
print(out)
   fruit  uniqueid  fruit_y  Ratio
1  apple      1123  appless     83
6  mango       739   mangos     91
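
If you would rather stay with the `MultiIndex.from_product` approach from the question, one possible sketch is to name the two index levels and look up `uniqueid` for each level through a fruit-to-id mapping. Everything below is illustrative rather than the original code: the `ratio` helper is a `difflib`-based stand-in for `fuzz.ratio` so that the example runs without extra dependencies, and the column names `uniqueid1` and `uniqueid2` are hypothetical. It also assumes the fruit names are unique, since they are used as lookup keys.

```python
from difflib import SequenceMatcher

import pandas as pd

df = pd.DataFrame({
    "fruit": ["apple", "appless", "banana", "mango", "mangos"],
    "uniqueid": [1123, 321, 623, 739, 889],
})

def ratio(a, b):
    # Stand-in for fuzz.ratio (assumption): difflib similarity scaled to 0-100.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

# All ordered pairs of fruit names, as in the question.
pairs = pd.MultiIndex.from_product([df["fruit"], df["fruit"]],
                                   names=["index1", "index2"])
compare = pd.DataFrame({"ratio": [ratio(a, b) for a, b in pairs]}, index=pairs)

# Map each index level to its uniqueid (requires unique fruit names).
ids = df.set_index("fruit")["uniqueid"]
compare["uniqueid1"] = compare.index.get_level_values("index1").map(ids).to_numpy()
compare["uniqueid2"] = compare.index.get_level_values("index2").map(ids).to_numpy()

# First goal: keep the high-scoring pairs, ids attached.
compare_80 = compare[compare["ratio"] >= 80]

# Second goal: drop self-pairs and keep each unordered pair once
# by requiring index1 to sort strictly before index2.
second = compare_80.reset_index()
second = second[second["index1"] < second["index2"]]
```

The strict `<` comparison removes both the self-matches and the mirrored duplicates in one step; swapping in `fuzz.ratio` and `fuzz.token_sort_ratio` for the stand-in should only change the scoring columns.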