Python: find the similarity between two lists, weighting each value by its position in the list
I am trying to compute a similarity value between one list and another, much like the Jaccard similarity of two sentences. The only difference is that a value found at the same index in both lists gets a fixed (static) weight; otherwise its weight is penalised according to how far it is from that index.
a=["are","you","are","you","why"]
b=['you',"are","you",'are',"why"]
li=[]
va=[]
fi=[]
weightOfStatic=1/len(a)
for i in range(len(a)):
    if a[i]==b[i]:
        print("true1", weightOfStatic,a[i],b[i])
        fi.append({"static":i, "dynamic":i,"Weight":weightOfStatic})
        li.append([weightOfStatic,a[i],b[i]])
        va.append(li)
    else:
        for j in range(len(b)):
            if a[i]==b[j]:
                weightOfDynamic = weightOfStatic*(1-(1/len(b))*abs(i-j))
                fi.append({"static":i, "dynamic":j,"Weight":weightOfDynamic})
                print("true2 and index difference between words =%d"% abs(i-j),weightOfDynamic, i,j)
                li.append([weightOfDynamic,a[i],b[j]])
                va.append(weightOfDynamic)
sim_value=sum(va)
print("The similarity value is = %f" %(sim_value))
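For reference, the weighting rule in the code above can be isolated into a small helper. This is only a sketch of the formula as it appears in the question; the name `position_weight` and its parameters are hypothetical and not part of the original code.

```python
def position_weight(i, j, len_a, len_b):
    """Weight of a word at index i of a matched with index j of b.

    Hypothetical helper: same index gives the static weight 1/len_a,
    otherwise the weight shrinks linearly with the index distance.
    """
    static_weight = 1 / len_a
    return static_weight * (1 - abs(i - j) / len_b)


# Index distances that occur in the 5-word example: 0, 1 and 3.
for distance in (0, 1, 3):
    print(distance, round(position_weight(0, distance, 5, 5), 3))
# 0 0.2
# 1 0.16
# 3 0.08
```

With lists of five words, a same-index match is worth 0.2, a match one position away 0.16, and a match three positions away 0.08.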
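My code works fine when there are no repeated words, for example a=["how","are","you"] b=["you","are","how"]. For that sentence it gives a similarity value of 0.5.
The expected result for the example above is a matching between list a and list b: if list a contains repeated words, each value in a should take its nearest index in b. This is how the given code matches the example above: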
{'static': 0, 'dynamic': 1, 'Weight': 0.160}
here 0 should not match with 3 again
{'static': 0, 'dynamic': 3, 'Weight': 0.079}
{'static': 1, 'dynamic': 0, 'Weight': 0.160}
same for 1 and 2
{'static': 1, 'dynamic': 2, 'Weight': 0.160}
dynamic 1 is already taken here
{'static': 2, 'dynamic': 1, 'Weight': 0.160}
{'static': 2, 'dynamic': 3, 'Weight': 0.160}
dynamic 0 is already taken
{'static': 3, 'dynamic': 0, 'Weight': 0.079}
{'static': 3, 'dynamic': 2, 'Weight': 0.160}
[0.2, 'why', 'why']
Here the total weight is 1.3200 (the weight should stay between 0 and 1).
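To see where 1.32 comes from, the sketch below counts every equal pair the way the listing above does. It assumes both lists have the same length, and the function name `all_pairs_total` is made up for illustration.

```python
a = ["are", "you", "are", "you", "why"]
b = ["you", "are", "you", "are", "why"]


def all_pairs_total(a, b):
    """Sum the weight of every (i, j) pair with a[i] == b[j].

    Hypothetical helper; assumes len(a) == len(b).
    Returns the total weight and the number of pairs counted.
    """
    static_weight = 1 / len(a)
    total = 0.0
    pairs = 0
    for i, word in enumerate(a):
        if word == b[i]:
            # Same index: one static match, nothing else is searched.
            total += static_weight
            pairs += 1
            continue
        for j, other in enumerate(b):
            if word == other:
                # Every occurrence in b is counted, not only the nearest.
                total += static_weight * (1 - abs(i - j) / len(b))
                pairs += 1
    return total, pairs


total, pairs = all_pairs_total(a, b)
print(f"{total:.2f} from {pairs} pairs")  # 1.32 from 9 pairs
```

Five words produce nine matches, because each repeated word is paired with both of its occurrences in b, and that is why the total exceeds 1.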
The result should be:
{'static': 0, 'dynamic': 1, 'Weight': 0.160}
{'static': 1, 'dynamic': 0, 'Weight': 0.160}
{'static': 2, 'dynamic': 3, 'Weight': 0.160}
{'static': 3, 'dynamic': 2, 'Weight': 0.160}
[0.2, 'why', 'why']
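The total weight is then 0.84.
First of all, I "prettified" your code to make it look more Pythonic. I think you overcomplicated things a bit. In fact, it did not even run for me, because you try to sum a list that contains both numbers and lists.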
a = ['are','you','are','you','why']
b = ['you','are','you','are','why']
total_weight = 0
weight_of_static = 1/len(a)
for i, a_word in enumerate(a):
    if a_word == b[i]:
        print('{0} <-> {1} => static\t\t// weight: {2:.2f}'.format(a_word, b[i], weight_of_static))
        total_weight += weight_of_static
    else:
        # distances from index i to every occurrence of a_word in b
        distances = []
        for j, b_word in enumerate(b):
            if a_word == b_word:
                distances.append(abs(i - j))
        if not distances:
            continue  # a_word does not occur in b, so it adds no weight
        # only the nearest occurrence counts
        dynamic_weight = weight_of_static*(1 - ( 1 / len(b)) * min(distances))
        total_weight += dynamic_weight
        print('{0} <-> {1} => not static\t// weight: {2:.2f}'.format(a_word, b[i], dynamic_weight))
print('The similarity value is = {0:.2f}'.format(total_weight))
Output:
$ python similarity.py
are <-> you => not static // weight: 0.16
you <-> are => not static // weight: 0.16
are <-> you => not static // weight: 0.16
you <-> are => not static // weight: 0.16
why <-> why => static // weight: 0.20
The similarity value is = 0.84
I hope this helps.
Hey, thanks a lot!
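The question also asks that an index of b is not matched twice. A minimal sketch of that rule follows, assuming both lists have the same length and that a greedy left-to-right choice is acceptable (it is not guaranteed to find the best overall assignment). The function name `positional_similarity` and the tuple layout are hypothetical.

```python
def positional_similarity(a, b):
    """Greedy one-to-one matching; each index of b is used at most once.

    Hypothetical sketch, assuming len(a) == len(b).
    Returns (total weight, sorted list of (index in a, index in b, weight)).
    """
    if not a or not b:
        return 0.0, []
    static_weight = 1 / len(a)
    used = set()    # indices of b that are already matched
    matches = []

    # First pass: same word at the same index gets the static weight.
    for i, word in enumerate(a):
        if i < len(b) and word == b[i]:
            used.add(i)
            matches.append((i, i, static_weight))
    matched_a = {i for i, _, _ in matches}

    # Second pass: each remaining word takes the nearest free index in b.
    for i, word in enumerate(a):
        if i in matched_a:
            continue
        free = [j for j, other in enumerate(b)
                if other == word and j not in used]
        if not free:
            continue  # no unmatched occurrence left in b
        j = min(free, key=lambda k: abs(i - k))
        used.add(j)
        matches.append((i, j, static_weight * (1 - abs(i - j) / len(b))))

    return sum(w for _, _, w in matches), sorted(matches)


a = ["are", "you", "are", "you", "why"]
b = ["you", "are", "you", "are", "why"]
total, matches = positional_similarity(a, b)
print([(i, j) for i, j, _ in matches])
# [(0, 1), (1, 0), (2, 3), (3, 2), (4, 4)]
print(f"{total:.2f}")  # 0.84
```

For this example the pairs are the ones listed as the expected result in the question. Ties between equally distant indices go to the lower index, which is a choice of this sketch rather than something the question specifies.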