Python 3.x: scikit-learn nearest neighbors for a list of TF-IDF points
I can run kneighbors for a single TF-IDF vector, but not for a list of them.
Before going into detail, I should explain why I want this: running kneighbors separately for every data point takes a long time, and I assume that passing kneighbors a list of points is optimized internally.
According to the NearestNeighbors documentation, I can query multiple points:
>>> X = [[0., 1., 0.], [1., 0., 1.]]
>>> neigh.kneighbors(X, return_distance=False)
array([[1],
       [2]]...)
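The snippet quoted above leaves out how neigh was built. A minimal runnable sketch of the same multi-point query follows; the three training samples are an assumption added here so that the call has something to search, and only the two query points come from the quoted lines.

```python
from sklearn.neighbors import NearestNeighbors

# Training points: an assumption, added only so the query below can run.
samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(samples)

# Two query points in one call: one row per query point.
X = [[0., 1., 0.], [1., 0., 1.]]
indices = neigh.kneighbors(X, return_distance=False)

# One row of neighbor indices per query, one column per requested neighbor.
print(indices.shape)
```

The important detail is that X is a single 2-D structure with one row per query, not a list of separate matrices.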
I am trying to do the same thing.
I can run kneighbors for each point separately:
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
samples = ["This is a test","a very good test","some more text"]
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(samples)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
neigh = NearestNeighbors(n_neighbors=1, n_jobs=-1)
neigh.fit(X_train_tfidf)
ll = []
test = ["Test if this works", "Zoom zoom"]
for k in test:
    predict = count_vect.transform([k])
    X_tfidf2 = tfidf_transformer.transform(predict)
    ll.append(X_tfidf2)
    res = neigh.kneighbors(X_tfidf2, return_distance=False)
# res = neigh.kneighbors(ll, return_distance=False)
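When I append all the TF-IDF sparse matrices to a list and pass that list instead, I get an error. Uncomment the last line to reproduce it.
Error:
ValueError: setting an array element with a sequence (on the res = neigh.kneighbors… line)
Try: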
from scipy import sparse
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
samples = ["This is a test","a very good test","some more text"]
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(samples)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
neigh = NearestNeighbors(n_neighbors=1, n_jobs=-1)
neigh.fit(X_train_tfidf)
ll = []
test = ["Test if this works", "Zoom zoom"]
for k in test:
    predict = count_vect.transform([k])
    X_tfidf2 = tfidf_transformer.transform(predict)
    ll.append(X_tfidf2)
ll = sparse.vstack(ll)
res = neigh.kneighbors(ll, return_distance=False)
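To see why the stacked version is accepted, it may help to compare shapes. The sketch below reuses the demo data from the question; the variable names rows and stacked are introduced here for illustration only. A Python list of 1-row sparse matrices is not a 2-D matrix, whereas sparse.vstack turns it into a single sparse matrix with one row per query.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neighbors import NearestNeighbors

samples = ["This is a test", "a very good test", "some more text"]
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(samples))
neigh = NearestNeighbors(n_neighbors=1).fit(X_train_tfidf)

test = ["Test if this works", "Zoom zoom"]
# A plain Python list holding one (1, n_features) sparse matrix per query.
rows = [tfidf_transformer.transform(count_vect.transform([t])) for t in test]
print(type(rows).__name__, rows[0].shape)

# Stacking vertically gives one sparse matrix with a row per query.
stacked = sparse.vstack(rows)
print(stacked.shape)

res = neigh.kneighbors(stacked, return_distance=False)
print(res.shape)
```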
Without a loop:
from scipy import sparse
from sklearn.neighbors import NearestNeighbors
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
samples = ["This is a test","a very good test","some more text"]
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(samples)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
neigh = NearestNeighbors(n_neighbors=1, n_jobs=-1)
neigh.fit(X_train_tfidf)
test=["Test if this works","Zoom zoom"]
X_test_counts = count_vect.transform(test)
X_test_tfidf = tfidf_transformer.transform(X_test_counts)
res = neigh.kneighbors(X_test_tfidf, return_distance=False)
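If you want to convince yourself that the batch call returns the same neighbors as querying row by row, a small check along these lines should do. The second query sentence is an assumption chosen here to avoid ties (an all-zero query such as "Zoom zoom" is equally far from every training document), and batch and one_by_one are illustrative names.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.neighbors import NearestNeighbors

samples = ["This is a test", "a very good test", "some more text"]
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(samples))
neigh = NearestNeighbors(n_neighbors=1).fit(X_train_tfidf)

# Hypothetical queries that each share words with exactly one closest document.
test = ["Test if this works", "some more words"]
X_test_tfidf = tfidf_transformer.transform(count_vect.transform(test))

# All queries in a single call.
batch = neigh.kneighbors(X_test_tfidf, return_distance=False)

# The same queries, one sparse row at a time.
one_by_one = np.vstack([
    neigh.kneighbors(X_test_tfidf[i], return_distance=False)
    for i in range(X_test_tfidf.shape[0])
])

print(np.array_equal(batch, one_by_one))
```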
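Can you post some sample code that actually runs?
I updated the post with demo data so that it runs.
When I try this, I get: ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 16 while Y.shape[1] == 8
It seems to work :) I will test the results to make sure they are the same and then accept the answer.
@兄弟姐妹 You don't need the loop, by the way. See the updated answer.
Really! Thanks again. By the way, since all the neighbors are computed in one call, it runs much faster!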
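For anyone who runs into the same dimension error: the numbers 16 and 8 would be consistent with the two 1 x 8 query rows having been joined side by side rather than on top of each other. This is only a guess at what happened in the thread, not something the comments confirm. A minimal sketch of the difference, using the demo data from the question:

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

samples = ["This is a test", "a very good test", "some more text"]
count_vect = CountVectorizer()
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(count_vect.fit_transform(samples))

test = ["Test if this works", "Zoom zoom"]
rows = [tfidf_transformer.transform(count_vect.transform([t])) for t in test]

# Horizontal stacking concatenates the feature columns: one row, twice the width.
wide = sparse.hstack(rows)
# Vertical stacking keeps the feature count and adds one row per query.
tall = sparse.vstack(rows)

print(X_train_tfidf.shape[1], wide.shape, tall.shape)
```

Only the vertically stacked matrix has the same number of columns as the training matrix, which is what kneighbors needs.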