ValueError: 0 is not in list in Python
I am trying to return a list of tuples ranking the candidates from most similar to least similar to the question, where each tuple holds the candidate's index in the original candidates list and the candidate itself. I implemented this function:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(question, candidates, embeddings, dim=300):
    """
    question: a string
    candidates: a list of strings (candidates) which we want to rank
    embeddings: some embeddings
    dim: dimension of the current embeddings
    result: a list of pairs (initial position in the list, candidate)
    """
    cosi_dic = {}
    most_candidates = []
    q_vec = question_to_vec(question, embeddings, dim)
    for i in candidates:
        can_vec = question_to_vec(i, embeddings, dim)
        # the similarity score is used as the dictionary key
        cosi_dic[cosine_similarity(can_vec.reshape(1, -1), q_vec.reshape(1, -1))[0][0]] = i
    for i in list(reversed(sorted(cosi_dic.keys()))):
        most_candidates.append((candidates.index(cosi_dic[i]), cosi_dic[i]))
    return most_candidates

def question_to_vec(question, embeddings, dim=300):
    """
    question: a string
    embeddings: dict where the key is a word and a value is its embedding
    dim: size of the representation
    result: vector representation for the question
    """
    v = np.zeros(dim)
    all_vectors = []
    question = question.split()
    for i in question:
        if i in embeddings:
            all_vectors.append(embeddings[i])
    if all_vectors:
        v = np.mean(all_vectors, axis=0)
    return v
The helper question_to_vec, defined above, averages the embedding vectors of all words in a sentence that appear in the embeddings; if none of the words are found, it returns a zero vector.
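The expected output should look like [(2, c), (0, b), (1, a)] if c, at index 2 of the input candidates list, is the most similar and a is the least similar. However, when I try to run this code: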
wv_ranking = []
for i in range(len(validation)):
    line = validation[i]
    q, *ex = line
    ranks = rank_candidates(q, ex, wv_embeddings)
    wv_ranking.append([r[0] for r in ranks].index(0) + 1)
where wv_embeddings holds the GoogleNews-vectors-negative300 embeddings, I get the error: ValueError: 0 is not in list
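I tried checking the cosine similarity for the lines where the exception occurs and found that every element has a value of zero?

After digging into the error, I found that using a dictionary inside the function overwrites entries that share the same cosine similarity value, since the similarity is the key. The function should therefore look like this: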
def rank_candidates(question, candidates, embeddings, dim=300):
    """
    question: a string
    candidates: a list of strings (candidates) which we want to rank
    embeddings: some embeddings
    dim: dimension of the current embeddings
    result: a list of pairs (initial position in the list, candidate)
    """
    most_candidates = []
    updated_most_candidates = []
    # use the function's own parameters rather than the global wv_embeddings
    q_vec = question_to_vec(question, embeddings, dim)
    for i in candidates:
        can_vec = question_to_vec(i, embeddings, dim)
        sim = cosine_similarity(can_vec.reshape(1, -1), q_vec.reshape(1, -1))[0][0]
        # a list of (similarity, candidate) pairs keeps candidates with equal scores
        most_candidates.append((sim, i))
    most_candidates.sort(key=lambda x: x[0], reverse=True)
    for i in most_candidates:
        # note: candidates.index returns the first match, so identical
        # candidate strings are all reported under the same index
        updated_most_candidates.append((candidates.index(i[1]), i[1]))
    return updated_most_candidates
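The lookup through candidates.index can still report the wrong position when the same candidate string appears more than once. As a minimal sketch of one way around that, the index can be carried alongside the score with enumerate. The names cosine and rank_by_similarity are hypothetical helpers introduced here for illustration, not part of the original code, and the pure-Python cosine stands in for sklearn's cosine_similarity, on the assumption that a zero vector should score 0.0.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_similarity(q_vec, cand_vecs, candidates):
    """Return (original index, candidate) pairs, most similar first."""
    scored = [
        (cosine(vec, q_vec), idx, cand)
        for idx, (vec, cand) in enumerate(zip(cand_vecs, candidates))
    ]
    # Sort on the score only; Python's sort is stable, so tied
    # candidates keep their original relative order.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(idx, cand) for _, idx, cand in scored]

# Toy data: two identical candidates whose vectors are all zeros.
q_vec = [1.0, 0.0]
cand_vecs = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
candidates = ["a", "b", "a", "c"]
ranks = rank_by_similarity(q_vec, cand_vecs, candidates)
print(ranks)
```

Because every candidate keeps its own index, index 0 is always present in the result, so a later [r[0] for r in ranks].index(0) lookup cannot fail for this reason.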
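It is impossible to solve this without seeing the implementations of all the functions used. Where are the functions? Please provide a minimal reproducible example; as it stands, question_to_vec, cosine_similarity, validation and wv_embeddings are undefined.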
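A reproduction along the lines requested does not need the real embeddings. The sketch below assumes, as reported in the question, that every candidate ends up with a cosine similarity of zero; the scores are hard-coded, hypothetical values rather than output from the actual model.

```python
candidates = ["b", "a", "c"]
scores = [0.0, 0.0, 0.0]  # hypothetical: all similarities are zero

# Same pattern as the original rank_candidates: score -> candidate.
cosi_dic = {}
for score, cand in zip(scores, candidates):
    cosi_dic[score] = cand  # equal keys overwrite each other

ranks = [
    (candidates.index(cosi_dic[s]), cosi_dic[s])
    for s in sorted(cosi_dic, reverse=True)
]
print(ranks)  # only the last candidate survives

try:
    [r[0] for r in ranks].index(0)
    error = None
except ValueError as exc:
    error = exc  # index 0 was overwritten, so the lookup fails
print(repr(error))
```

Only one entry remains in the dictionary, so the candidate at index 0 is missing from the ranking and the .index(0) call raises ValueError.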