Python 3.x: Matching documents from different sets using CountVectorizer and cosine distance


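I am trying to compute the cosine distance between documents that come from different datasets. I am not interested in the similarity between documents within the same set. That is, I only want to match each document in documents with the similar documents in documents2.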

I tried the following, but it produces this error:
ValueError: Found array with dim 3. check_pairwise_arrays expected <= 2.
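import os
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

#Create list of documents to work with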
path = "path1"
text_files = os.listdir(path)
documents = [open(f, encoding="utf-8").read() for f in text_files if f.endswith('.txt')]

vectorizer = CountVectorizer(max_df=29)
X = vectorizer.fit_transform(documents) #type sparse matrix
X = X.toarray()

#Create list of documents to work with
path2 = "path2"
text_files = os.listdir(path2)
documents2 = [open(f, encoding="utf-8").read() for f in text_files if f.endswith('.txt')]

vectorizer2 = CountVectorizer(max_df=29)
X2 = vectorizer.fit_transform(documents2) #type sparse matrix
X2 = X2.toarray()
#print(type(X))

cosine_similarity([X], [X2])
print(cosine_similarity)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 1.]]
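
A possible explanation, offered as a sketch rather than a definitive answer: X and X2 are already 2-D arrays (documents by terms), so wrapping them as [X] and [X2] hands cosine_similarity 3-D input, which would account for the "dim 3" error. In addition, two independently fitted vocabularies generally produce matrices with different column counts, so the two sets cannot be compared column by column. The example below fits one CountVectorizer on both sets and passes the matrices directly. The sample strings are hypothetical stand-ins for the files in path1 and path2, and max_df is left out because the toy corpus is so small.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for the texts read from the two directories.
documents = [
    "the cat sat on the mat",
    "stock prices fell today",
    "the dog chased the cat",
]
documents2 = [
    "a cat on a mat",
    "prices of stock rose",
]

# Fit a single vocabulary on both sets so X and X2 share the same columns.
vectorizer = CountVectorizer()
vectorizer.fit(documents + documents2)
X = vectorizer.transform(documents)    # shape: (len(documents), n_terms)
X2 = vectorizer.transform(documents2)  # shape: (len(documents2), n_terms)

# Pass the 2-D matrices directly; do not wrap them in lists.
similarity = cosine_similarity(X, X2)  # shape: (len(documents), len(documents2))

# For each document in the first set, pick the most similar one in the second.
best_match = similarity.argmax(axis=1)
for i, j in enumerate(best_match):
    print(f"documents[{i}] -> documents2[{j}] (cosine similarity {similarity[i, j]:.3f})")
```

Row i of similarity holds the scores of documents[i] against every entry of documents2, so no same-set comparisons are computed. When reading real files, note that os.listdir returns bare file names, so each name likely needs to be joined with its directory (for example with os.path.join) before it is opened.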