Python ValueError: qk and pk must have same shape - scipy.spatial.distance.jensenshannon

I am calling the jensen_shannon(query, matrix) function below to find the documents in the document matrix that are most similar to the query document.

import numpy as np
from scipy.stats import entropy

def jensen_shannon(query, matrix):
    """
    This function implements a Jensen-Shannon similarity
    between the input query (an LDA topic distribution for a document)
    and the entire corpus of topic distributions.
    It returns an array of length M where M is the number of documents in the corpus.
    """
    # let's keep with the p, q notation above
    p = query[None, :].T  # take transpose
    q = matrix.T  # transpose matrix
    m = 0.5 * (p + q)
    return np.sqrt(0.5 * (entropy(p, m) + entropy(q, m)))

Shape of query: (100,)

Shape of matrix: (10804, 100)
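These two shapes are enough to explain the error. The following minimal sketch rebuilds only the arrays that jensen_shannon creates internally. It uses random Dirichlet draws as hypothetical stand-ins for new_doc_distribution and doc_topic_dist, keeping the question's 100 topics and 10804 documents.

```python
import numpy as np

# Sizes taken from the question: 100 topics, 10804 documents.
n_topics, n_docs = 100, 10804

# Hypothetical random stand-ins for the real LDA topic distributions.
rng = np.random.default_rng(0)
query = rng.dirichlet(np.ones(n_topics))                # shape (100,)
matrix = rng.dirichlet(np.ones(n_topics), size=n_docs)  # shape (10804, 100)

p = query[None, :].T   # (100, 1): a single column
q = matrix.T           # (100, 10804): one column per document
m = 0.5 * (p + q)      # broadcasting stretches p, so m is (100, 10804)

print(p.shape, q.shape, m.shape)  # (100, 1) (100, 10804) (100, 10804)
```

p remains a single column, while m has one column per document. The entropy call in the traceback below rejects a pk and a qk of different shapes.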

Error traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-103-86cb68dd862d> in <module>
      1 # this is surprisingly fast
----> 2 most_sim_ids = get_most_similar_documents(new_doc_distribution,doc_topic_dist)

<ipython-input-102-c0fb95224e87> in get_most_similar_documents(query, matrix, k)
      6     print(query.shape)
      7     print(matrix.shape)
----> 8     sims = jensen_shannon(query,matrix) # list of jensen shannon distances
      9     return sims.argsort()[:k] # the top k positional index of the smallest Jensen Shannon distances

<ipython-input-74-6ffb0ec54e9a> in jensen_shannon(query, matrix)
     10     q = matrix.T # transpose matrix
     11     m = 0.5*(p + q)
---> 12     return np.sqrt(0.5*(entropy(p,m) + entropy(q,m)))

~/venv/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py in entropy(pk, qk, base, axis)
   2668         qk = asarray(qk)
   2669         if qk.shape != pk.shape:
-> 2670             raise ValueError("qk and pk must have same shape.")
   2671         qk = 1.0*qk / np.sum(qk, axis=axis, keepdims=True)
   2672         vec = rel_entr(pk, qk)

ValueError: qk and pk must have same shape.

But it does not accept an axis parameter in the function.

Does anyone know what I am missing? Any pointers would be much appreciated. Thanks.

FYI: I am trying out this Kaggle code.
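The question says the function does not take an axis parameter. Even without one, scipy.spatial.distance.jensenshannon can be called once per document. The sketch below assumes that query has shape (K,) and matrix has shape (M, K), as in the question. jensen_shannon_rowwise is a hypothetical helper name, and the random Dirichlet data is only a stand-in for real LDA output. With its default natural-log base, jensenshannon returns the square root of half the summed KL divergences to the midpoint distribution, which is the same quantity the question's formula computes.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def jensen_shannon_rowwise(query, matrix):
    """Hypothetical helper: Jensen-Shannon distance between `query`
    (shape (K,)) and every row of `matrix` (shape (M, K))."""
    # On two 1-D inputs jensenshannon returns one distance, so loop over documents.
    return np.array([jensenshannon(query, row) for row in matrix])

# Random Dirichlet draws stand in for real LDA topic distributions.
rng = np.random.default_rng(0)
query = rng.dirichlet(np.ones(100))            # (100,)
matrix = rng.dirichlet(np.ones(100), size=50)  # (50, 100)

dists = jensen_shannon_rowwise(query, matrix)
print(dists.shape)  # (50,)
```

The Python-level loop is slower than a single vectorized call over all 10804 documents. In exchange, it never passes mismatched shapes to SciPy.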

I ran into this problem myself. SciPy version 1.3.0 still behaves as expected with the Jensen-Shannon formula you are using.

Try defining p as follows:

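p = query[None, :].T + np.zeros([100, 10804])

Here 100 is the number of topics and 10804 is the number of documents. Adding the zero array broadcasts p from shape (100, 1) to (100, 10804), the same shape as m, so entropy(p, m) receives arrays of matching shape.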
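The sketch below applies the same fix but reads the sizes from matrix instead of hard-coding 100 and 10804. Here np.repeat replaces the zeros trick; this substitution is not part of the original answer. The random Dirichlet arrays are hypothetical stand-ins for new_doc_distribution and doc_topic_dist. Once p and q have identical (topics x documents) shapes, entropy reduces along axis 0 and returns one distance per document.

```python
import numpy as np
from scipy.stats import entropy

def jensen_shannon(query, matrix):
    """Jensen-Shannon distance between `query` (shape (K,)) and each row of
    `matrix` (shape (M, K)); returns an array of length M."""
    q = matrix.T                                       # (K, M): one column per document
    # Repeat the query column M times so p has exactly q's shape; this plays
    # the role of "+ np.zeros([100, 10804])" without hard-coding the sizes.
    p = np.repeat(query[:, None], q.shape[1], axis=1)  # (K, M)
    m = 0.5 * (p + q)                                  # (K, M)
    # entropy reduces along axis 0 by default: one value per document column.
    return np.sqrt(0.5 * (entropy(p, m) + entropy(q, m)))

# Random Dirichlet draws stand in for real LDA topic distributions.
rng = np.random.default_rng(0)
query = rng.dirichlet(np.ones(100))            # (100,)
matrix = rng.dirichlet(np.ones(100), size=50)  # (50, 100)

sims = jensen_shannon(query, matrix)
print(sims.shape)          # (50,)
print(sims.argsort()[:5])  # positions of the 5 smallest distances
```

The returned array can be passed directly to the sims.argsort()[:k] step in get_most_similar_documents, which selects the k documents with the smallest distances.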
