Python: how to find the similarity between classes in pandas


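I want to find the similarity between each pair of groups in my dataset. My data looks like this: the first column is the data and the second column is the class label: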

import pandas as pd
import numpy as np
df = pd.DataFrame({'Data': ["a1", "a2", "a3", "a4", "a5", "a6", "a7"],
                   'ClassLable': ["c1", "c2", "c2", "c2", "c3", "c3", "c1"]})
df2 = pd.DataFrame({'Data': ["a1", "a2", "a4", "a6", "a7", "a8", "a9"],
                    'ClassLable': ["c11", "c21", "c21", "c12", "c13", "c13", "c11"]})
I want to compute the Jaccard index for each pair of class labels between df and df2. For example:

c1class = pd.DataFrame({'Data':["a1","a7"]})
c11class = pd.DataFrame({'Data':["a1","a9"]})
# Jaccard = |{a1}| / |{a1, a7, a9}| = 1/3
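A minimal sketch of the set-based Jaccard index for this single pair, assuming each class is treated as the set of its `Data` values (order and duplicates ignored). The helper `jaccard` is a hypothetical name introduced here, not a pandas or scikit-learn function:

```python
import pandas as pd

# Same toy data as above (the column name 'ClassLable' is kept as in the question)
df = pd.DataFrame({'Data': ["a1", "a2", "a3", "a4", "a5", "a6", "a7"],
                   'ClassLable': ["c1", "c2", "c2", "c2", "c3", "c3", "c1"]})
df2 = pd.DataFrame({'Data': ["a1", "a2", "a4", "a6", "a7", "a8", "a9"],
                    'ClassLable': ["c11", "c21", "c21", "c12", "c13", "c13", "c11"]})

def jaccard(a, b):
    """Hypothetical helper: |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # define J(∅, ∅) as 0

c1 = df.loc[df['ClassLable'] == 'c1', 'Data']      # a1, a7
c11 = df2.loc[df2['ClassLable'] == 'c11', 'Data']  # a1, a9
print(jaccard(c1, c11))  # intersection {a1}, union {a1, a7, a9}
```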

In other words, for df and df2 I want the intersection over union of the items for each pair of class labels.
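One way to get this for every pair, sketched under the assumption that duplicate items within a class can be ignored: build a `label -> set of items` mapping for each DataFrame and fill a matrix with |A ∩ B| / |A ∪ B|. The names `sets1`, `sets2` and `result` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'Data': ["a1", "a2", "a3", "a4", "a5", "a6", "a7"],
                   'ClassLable': ["c1", "c2", "c2", "c2", "c3", "c3", "c1"]})
df2 = pd.DataFrame({'Data': ["a1", "a2", "a4", "a6", "a7", "a8", "a9"],
                    'ClassLable': ["c11", "c21", "c21", "c12", "c13", "c13", "c11"]})

# label -> set of Data items in that class (groupby sorts the labels by default)
sets1 = {label: set(items) for label, items in df.groupby('ClassLable')['Data']}
sets2 = {label: set(items) for label, items in df2.groupby('ClassLable')['Data']}

# Rows are the classes of df, columns the classes of df2.
# Each cell is |A ∩ B| / |A ∪ B|; unions are never empty because every class has >= 1 item.
result = pd.DataFrame(
    [[len(s1 & s2) / len(s1 | s2) for s2 in sets2.values()] for s1 in sets1.values()],
    index=list(sets1), columns=list(sets2))
print(result)
```

For the sample data this gives 2/3 for (c2, c21), 1/2 for (c3, c12) and 1/3 for both (c1, c11) and (c1, c13); every other pair shares no items and gets 0.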

from sklearn.metrics import jaccard_similarity_score

jaccard_similarity_score(df['Data'],df2['Data'])
Out[92]: 0.2857142857142857

jaccard_similarity_score(c1class, c11class)
Out[93]: 0.5
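The two outputs above are consistent with a position-by-position comparison rather than a set comparison: row i of one column is only compared with row i of the other. The scikit-learn documentation for `jaccard_similarity_score` states that for binary and multiclass input it is equivalent to `accuracy_score`, i.e. this fraction of matching positions. A small sketch reproducing both numbers (the helper name `positionwise_agreement` is hypothetical):

```python
import pandas as pd

def positionwise_agreement(x, y):
    """Hypothetical helper: fraction of positions i where x[i] == y[i]."""
    x, y = pd.Series(list(x)), pd.Series(list(y))
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return (x == y).mean()

data1 = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]  # df['Data']
data2 = ["a1", "a2", "a4", "a6", "a7", "a8", "a9"]  # df2['Data']
print(positionwise_agreement(data1, data2))               # 2 matching positions out of 7
print(positionwise_agreement(["a1", "a7"], ["a1", "a9"]))  # 1 matching position out of 2
```

Because this ignores set membership (`a7` in df is never compared with `a7` in df2, since they sit in different rows), it cannot answer the question. Later scikit-learn releases deprecated this function in favor of `jaccard_score`, which likewise compares label arrays rather than sets of items.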

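That is not what I want: I want the similarity between different groups of data. I want to compute the Jaccard index for each pair of classes, so that I can find similar classes based on the objects they share.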
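To find, for each class in df, the most similar class in df2, a vectorized variant may be convenient. This sketch is one alternative, not the only way: it builds 0/1 membership matrices with `pd.crosstab`, gets all intersection sizes from one matrix product, and derives union sizes as |A| + |B| - |A ∩ B|. The names `m1`, `m2`, `jac` and `best` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Data': ["a1", "a2", "a3", "a4", "a5", "a6", "a7"],
                   'ClassLable': ["c1", "c2", "c2", "c2", "c3", "c3", "c1"]})
df2 = pd.DataFrame({'Data': ["a1", "a2", "a4", "a6", "a7", "a8", "a9"],
                    'ClassLable': ["c11", "c21", "c21", "c12", "c13", "c13", "c11"]})

# 0/1 membership matrices: rows = class labels, columns = data items
m1 = pd.crosstab(df['ClassLable'], df['Data']).clip(upper=1)
m2 = pd.crosstab(df2['ClassLable'], df2['Data']).clip(upper=1)

# Put both on the same item columns (items missing from one side get 0)
items = m1.columns.union(m2.columns)
m1 = m1.reindex(columns=items, fill_value=0)
m2 = m2.reindex(columns=items, fill_value=0)

# |A ∩ B| for every (df class, df2 class) pair in one matrix product
inter = m1.to_numpy() @ m2.to_numpy().T
# |A ∪ B| = |A| + |B| - |A ∩ B|, broadcasting row sizes against column sizes
union = m1.to_numpy().sum(axis=1)[:, None] + m2.to_numpy().sum(axis=1)[None, :] - inter

jac = pd.DataFrame(inter / union, index=m1.index, columns=m2.index)
best = jac.idxmax(axis=1)  # most similar df2 class per df class (first column on ties)
print(jac.round(3))
print(best)
```

`idxmax` returns the first column on ties, so c1, which is equally similar to c11 and c13 (1/3 each), is matched to c11; inspect `jac` directly if ties matter.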