ValueError: 0 is not in list in Python

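I am trying to return a list of tuples that ranks the candidates from most to least similar to the question, where each tuple also carries the index of that candidate in the original candidates list. I implemented this function: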

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def rank_candidates(question, candidates, embeddings, dim=300):
    """
        question: a string
        candidates: a list of strings (candidates) which we want to rank
        embeddings: some embeddings
        dim: dimension of the current embeddings

        result: a list of pairs (initial position in the list, candidate)
    """
    cosi_dic={}
    most_candidates=[]
    q_vec=question_to_vec(question,embeddings,dim)
    for i in candidates:
      can_vec=question_to_vec(i,embeddings,dim)

      cosi_dic[cosine_similarity(can_vec.reshape(1,-1),  q_vec.reshape(1,-1))[0][0]]=i
    for i in (list(reversed(sorted(cosi_dic.keys(),)))):
      most_candidates.append((candidates.index(cosi_dic[i]),cosi_dic[i]))
    return most_candidates

def question_to_vec(question, embeddings, dim=300):
    """
        question: a string
        embeddings: dict where the key is a word and the value is its embedding
        dim: size of the representation

        result: vector representation for the question
    """
    v=np.zeros(dim)
    all_vectors=[]
    question=question.split()
    for i in question:
      if i in embeddings:
        all_vectors.append(embeddings[i])
    if all_vectors:
      v=np.mean(all_vectors, axis=0)
    return v

The helper question_to_vec, shown above, averages the embedding vectors of all the words in a sentence that appear in the embeddings.

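The expected output should look like [(2, c), (0, b), (1, a)] if c, which sits at index 2 of the input candidates list, is the most similar and a is the least similar. However, when I run this code: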

wv_ranking = []
for i in range(len(validation)):
    line=validation[i]
    q, *ex = line
    ranks = rank_candidates(q, ex, wv_embeddings)
    wv_ranking.append([r[0] for r in ranks].index(0) + 1)

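where wv_embeddings holds the GoogleNews-vectors-negative300 embeddings, I get the error: ValueError: 0 is not in list. I checked the cosine similarities for the rows that raise the exception and found that every value is zero.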
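
One plausible way to end up with all-zero similarities, sketched here in plain Python: question_to_vec falls back to a zero vector when none of the words are in the embeddings, and a zero vector has similarity 0 with everything. The names mean_vector, cosine and toy_embeddings are made up for illustration, and cosine only assumes the behaviour of sklearn's cosine_similarity for zero-norm input (it returns 0 rather than raising).

```python
import math


def mean_vector(sentence, embeddings, dim):
    """Average the vectors of known words; all zeros if no word is known."""
    vectors = [embeddings[w] for w in sentence.split() if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity that returns 0.0 when either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


# Hypothetical two-dimensional embeddings, just for the demonstration.
toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

# None of the question's words are in the vocabulary, so it maps to zeros.
question = mean_vector("zzz qqq", toy_embeddings, dim=2)
scores = [cosine(mean_vector(c, toy_embeddings, 2), question)
          for c in ["cat", "dog", "cat dog"]]
print(scores)  # [0.0, 0.0, 0.0]
```

If this is what happens for the failing rows, every candidate gets exactly the same score, which matters for any code that uses the score as a unique key.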
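After digging into the error, I found that storing the results in a dictionary keyed by cosine similarity replaces earlier entries whenever two candidates have the same similarity value. So the function should look like this: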
def rank_candidates(question, candidates, embeddings, dim=300):
    """
        question: a string
        candidates: a list of strings (candidates) which we want to rank
        embeddings: some embeddings
        dim: dimension of the current embeddings

        result: a list of pairs (initial position in the list, candidate)
    """
    # Use the embeddings and dim arguments rather than the global wv_embeddings.
    q_vec = question_to_vec(question, embeddings, dim).reshape(1, -1)
    scored = []
    for idx, candidate in enumerate(candidates):
        can_vec = question_to_vec(candidate, embeddings, dim).reshape(1, -1)
        sim = cosine_similarity(can_vec, q_vec)[0][0]
        # One entry per candidate, so equal scores can no longer overwrite each other.
        # Storing idx avoids candidates.index(), which returns the first match
        # when the same candidate string appears twice.
        scored.append((sim, idx, candidate))
    # Sort by similarity only, highest first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(idx, candidate) for _, idx, candidate in scored]
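
The overwrite can be reproduced without any embeddings. The sketch below feeds hand-picked similarity scores (purely hypothetical values) into a dict-keyed ranking and a list-based ranking; rank_with_dict and rank_with_list are illustrative names, not functions from the question.

```python
def rank_with_dict(similarities, candidates):
    """Buggy variant: uses the similarity score as the dict key."""
    by_score = {}
    for sim, cand in zip(similarities, candidates):
        by_score[sim] = cand  # an equal score overwrites the earlier candidate
    return [(candidates.index(by_score[s]), by_score[s])
            for s in sorted(by_score, reverse=True)]


def rank_with_list(similarities, candidates):
    """Fixed variant: keeps one (score, index, candidate) entry per candidate."""
    scored = [(sim, idx, cand)
              for idx, (sim, cand) in enumerate(zip(similarities, candidates))]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(idx, cand) for _, idx, cand in scored]


# Hypothetical scores: the first and last candidates tie at 0.0.
candidates = ["a", "b", "c"]
similarities = [0.0, 0.7, 0.0]

buggy = rank_with_dict(similarities, candidates)
fixed = rank_with_list(similarities, candidates)

print(buggy)  # [(1, 'b'), (2, 'c')]
print(fixed)  # [(1, 'b'), (0, 'a'), (2, 'c')]

try:
    [r[0] for r in buggy].index(0)
except ValueError as err:
    print("ValueError:", err)  # index 0 was dropped by the dict
```

With the dict, candidate "a" at index 0 disappears as soon as "c" arrives with the same score, so looking up index 0 in the ranking fails exactly as in the question.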

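It is impossible to solve this without seeing the implementation of all the functions that are used. Please provide a minimal reproducible example; right now question_to_vec, cosine_similarity, validation, and wv_embeddings are undefined.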