Python 3.x: Scikit nearest neighbors for a list of TF-IDF points

Tags: python-3.x, scikit-learn, sparse-matrix, nearest-neighbor

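I can run kneighbors for a single TF-IDF vector, but not for a list of them.

Before going into detail, I should explain why I want this: running kneighbors separately for every data point takes a long time, and I assume that passing kneighbors a list of points would be optimized internally.

According to the NN documentation:

it says I can query multiple points: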

>>> X = [[0., 1., 0.], [1., 0., 1.]]
>>> neigh.kneighbors(X, return_distance=False)
array([[1],
       [2]]...)
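The quoted snippet relies on an already fitted `neigh` object that is not shown. A minimal self-contained sketch of the same multi-point query follows; the `samples` training points are an assumption chosen for illustration, not something stated in the question.

```python
from sklearn.neighbors import NearestNeighbors

# Assumed training points (not given in the question), used only to have
# something to fit on.
samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(samples)

# Two query points passed together as the rows of one 2-D array.
X = [[0., 1., 0.], [1., 0., 1.]]
indices = neigh.kneighbors(X, return_distance=False)

# One row per query point, one column per requested neighbor.
print(indices.shape)
print(indices)
```

The key point is that the queries form a single 2-D structure with one row per point, which is what the rest of this thread needs to reproduce with sparse TF-IDF rows.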
So I tried to do the same. I can run kneighbors for each point separately:

from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

samples = ["This is a test","a very good test","some more text"]
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(samples)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
neigh = NearestNeighbors(n_neighbors=1, n_jobs=-1) 
neigh.fit(X_train_tfidf)

ll=[]
test=["Test if this works","Zoom zoom"]
for k in test:
    predict = count_vect.transform([k])
    X_tfidf2 = tfidf_transformer.transform(predict)
    ll.append(X_tfidf2)
    res = neigh.kneighbors(X_tfidf2, return_distance=False)
#res = neigh.kneighbors(ll, return_distance=False)
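When I append all the TF-IDF sparse matrices to a list and pass that list instead, I get an error. Uncomment the last line to reproduce it.

Error: ValueError: setting an array element with a sequence (on the res = neigh.kneighbors... line)

Try: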

from scipy import sparse

from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

samples = ["This is a test","a very good test","some more text"]
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(samples)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
neigh = NearestNeighbors(n_neighbors=1, n_jobs=-1) 
neigh.fit(X_train_tfidf)

ll=[]
test=["Test if this works","Zoom zoom"]
for k in test:
    predict = count_vect.transform([k])
    X_tfidf2 = tfidf_transformer.transform(predict)
    ll.append(X_tfidf2)

# Stack the 1-row sparse matrices into one (n_queries, n_features) matrix
ll = sparse.vstack(ll)
res = neigh.kneighbors(ll, return_distance=False)
Without the loop:

from scipy import sparse

from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

samples = ["This is a test","a very good test","some more text"]
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(samples)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
neigh = NearestNeighbors(n_neighbors=1, n_jobs=-1) 
neigh.fit(X_train_tfidf)

test=["Test if this works","Zoom zoom"]
X_test_counts = count_vect.transform(test)

X_test_tfidf = tfidf_transformer.transform(X_test_counts)

res = neigh.kneighbors(X_test_tfidf, return_distance=False)
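To check that the batched query gives the same answer as the per-document loop, the two can be compared directly. The sketch below is one way to do that under the same demo data; the variable names `batch_dist`, `loop_dist`, and `same` are introduced here for illustration, and `n_jobs` is left at its default for simplicity.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neighbors import NearestNeighbors

samples = ["This is a test", "a very good test", "some more text"]
test = ["Test if this works", "Zoom zoom"]

count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(samples))
neigh = NearestNeighbors(n_neighbors=1).fit(X_train_tfidf)

# Batched query: one sparse matrix with one row per test document.
X_test_tfidf = tfidf_transformer.transform(count_vect.transform(test))
batch_dist, batch_idx = neigh.kneighbors(X_test_tfidf)

# Per-document queries, mirroring the loop from the question.
loop_dist = np.vstack([
    neigh.kneighbors(tfidf_transformer.transform(count_vect.transform([doc])))[0]
    for doc in test
])

# Compare distances rather than indices: "Zoom zoom" shares no words with the
# training vocabulary, so it is equally far from every training row and the
# index picked among the tied neighbors carries no meaning.
same = np.allclose(batch_dist, loop_dist)
print(batch_dist.shape, same)
```

Comparing distances instead of indices is a deliberate choice here, since ties between equally distant neighbors could otherwise make two correct results look different.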

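Can you post some sample code that actually runs?

I updated the post with demo data so that it runs.

When I try this, I get: ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 16 while Y.shape[1] == 8

Seems to work :) I'll test the results to make sure they are identical, and then accept the answer.

By the way, you don't need to use a loop. See the updated answer.

True! Thanks again. By the way, since all the neighbors are computed at once, it runs much faster!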
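The thread does not say what caused the "16 versus 8" dimension error. One plausible explanation, offered only as a guess, is that the per-document rows were joined side by side instead of stacked: the demo vocabulary has 8 features, and two 1-row matrices concatenated column-wise give 16. The sketch below only inspects shapes under that assumption; `rows`, `stacked_rows`, and `stacked_cols` are names introduced here.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

samples = ["This is a test", "a very good test", "some more text"]
test = ["Test if this works", "Zoom zoom"]

count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(samples))

# One 1-row sparse matrix per test document, as built in the loop version.
rows = [tfidf_transformer.transform(count_vect.transform([doc])) for doc in test]

# vstack keeps the feature count and adds one row per document.
stacked_rows = sparse.vstack(rows)
# hstack (assumed culprit) concatenates columns, doubling the feature count.
stacked_cols = sparse.hstack(rows)

print(X_train_tfidf.shape, stacked_rows.shape, stacked_cols.shape)
```

Whatever the actual cause was, checking that the query matrix has the same number of columns as the matrix passed to `fit` is a quick way to diagnose this kind of error.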