Python Doc2Vec error: need at least one array to concatenate


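I am getting an error when trying to apply a doc2vec model to some text. I am following a tutorial, but I cannot seem to "replicate" its results on some new text data.

I have read other SO posts about this error, which attribute it to an empty list, but I cannot work out why I would have an empty list.

Code:

The error I get relates to the last two lines of the code:

ValueError: need at least one array to concatenate

Data:

Edit: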


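Please edit the question tags to fit your problem.

My problem is that when I try to follow the tutorial and apply it to my own data, I get an error.

You should show the full error stack, which will highlight the exact line of code that throws the error and how execution reached it, to help answerers understand your problem. Separately (not the cause of your error): that tutorial shows a bad practice, namely tampering with the default alpha value and calling train() multiple times. It ends up running max_epochs=500 loops, each with (the default model.epochs) 5 passes over the data: 2,500 passes in total. (10-20 passes are typical in published work.) Decrementing alpha from its initial 0.025 by 0.0002 a full 500 times takes it to 0.025 - 500 × 0.0002 = -0.075, a negative value. The usual defaults smoothly slide the effective alpha from its starting value down to a negligible value.

I have added the full error. Thanks for your point in the second part; I will keep it in mind when I apply this to my own research paper.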
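The arithmetic behind the comment on the training loop can be checked directly. The sketch below uses only the numbers that appear in the question's code; the value of 5 passes per train() call is the default model.epochs quoted in the comment, and the helper name alpha_at_loop is made up for illustration. Exact fractions are used so that rounding does not blur the point where alpha reaches zero.

```python
from fractions import Fraction

# Values taken from the training loop in the question.
INITIAL_ALPHA = Fraction("0.025")
ALPHA_STEP = Fraction("0.0002")
MAX_EPOCHS = 500
# Assumption: the default model.epochs value quoted in the comment.
PASSES_PER_TRAIN_CALL = 5


def alpha_at_loop(loop_index):
    """Alpha at the start of a given outer loop (hypothetical helper)."""
    return INITIAL_ALPHA - loop_index * ALPHA_STEP


# Total passes over the data across all train() calls.
total_passes = MAX_EPOCHS * PASSES_PER_TRAIN_CALL

# Alpha after the last decrement.
final_alpha = alpha_at_loop(MAX_EPOCHS)

# Outer loops that start with an alpha of zero or less.
loops_without_positive_alpha = sum(
    1 for i in range(MAX_EPOCHS) if alpha_at_loop(i) <= 0
)

print(total_passes)                  # 2500
print(float(final_alpha))            # -0.075
print(loops_without_positive_alpha)  # 375
```

Under these numbers, three quarters of the outer loops would start with a learning rate that is no longer positive.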
import pandas as pd

from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from nltk.tokenize import word_tokenize
import csv
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

list_id = list(df["id"])
list_def = list(df["text"])

tagged_data = [TaggedDocument(words=word_tokenize(term_def.lower()), tags=[list_id[i]]) for i, term_def in enumerate(list_def)]

max_epochs = 500
vec_size = 100
alpha = 0.025

model = Doc2Vec(vector_size=vec_size,
                alpha=alpha, 
                min_alpha=0.00025,
                min_count=1,
                dm=1)

model.build_vocab(tagged_data)

for epoch in range(max_epochs):
    if epoch % 100 == 0:
        print('iteration {0}'.format(epoch))

    model.train(tagged_data,
                total_examples=model.corpus_count,
                epochs=model.epochs)

    model.alpha -= 0.0002
    model.min_alpha = model.alpha

doc_tags = list(model.docvecs.doctags.keys())
X = model[doc_tags]
d = {
    'text': [
        "Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal. Unqualified, the word football is understood to refer to whichever form of football is the most popular in the regional context in which the word appears. Sports commonly called football in certain places include association football (known as soccer in some countries); gridiron football (specifically American football or Canadian football); Australian rules football; rugby football (either rugby league or rugby union); and Gaelic football.[1][2] These different variations of football are known as football codes.",
        "Rugby union, commonly known in most of the world simply as rugby,[3] is a contact team sport which originated in England in the first half of the 19th century.[4] One of the two codes of rugby football, it is based on running with the ball in hand. In its most common form, a game is between two teams of 15 players using an oval-shaped ball on a rectangular field with H-shaped goalposts at each end.",
        "Tennis is a racket sport that can be played individually against a single opponent (singles) or between two teams of two players each (doubles). Each player uses a tennis racket that is strung with cord to strike a hollow rubber ball covered with felt over or around a net and into the opponent's court. The object of the game is to maneuver the ball in such a way that the opponent is not able to play a valid return. The player who is unable to return the ball will not gain a point, while the opposite player will.",
        "Formula One (also Formula 1 or F1) is the highest class of single-seater auto racing sanctioned by the Fédération Internationale de l'Automobile (FIA) and owned by the Formula One Group. The FIA Formula One World Championship has been one of the premier forms of racing around the world since its inaugural season in 1950. The word formula in the name refers to the set of rules to which all participants' cars must conform.[1] A Formula One season consists of a series of races, known as Grands Prix (French for 'grand prizes' or 'great prizes'), which take place worldwide on purpose-built circuits and on public roads.",
    ],
    'id': [123, 1234, 12345, 123456],
}
df = pd.DataFrame(data=d)
iteration 0
iteration 100
iteration 200
iteration 300
iteration 400
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-45c9e3dc04ad> in <module>()
     36 
     37 doc_tags = list(model.docvecs.doctags.keys())
---> 38 X = model[doc_tags]

~\Anaconda3\lib\site-packages\gensim\models\doc2vec.py in __getitem__(self, tag)
    961                 return self.docvecs[tag]
    962             return self.wv[tag]
--> 963         return vstack([self[i] for i in tag])
    964 
    965     def __str__(self):

~\Anaconda3\lib\site-packages\numpy\core\shape_base.py in vstack(tup)
    232 
    233     """
--> 234     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    235 
    236 def hstack(tup):

ValueError: need at least one array to concatenate
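
One possible explanation, offered as an assumption about the gensim 3.x release shown in the traceback and not confirmed anywhere in this thread: Doc2Vec appears to treat integer tags as direct indexes into its document-vector array and to record only string tags in docvecs.doctags. Because the id column holds integers, doctags would stay empty, doc_tags would be an empty list, and stacking an empty list would raise the ValueError. The toy function below only mimics that bookkeeping in plain Python, with plain ints standing in for the pandas integer values. collect_string_tags is a hypothetical name, not a gensim function.

```python
def collect_string_tags(documents_tags):
    """Toy stand-in for the assumed tag bookkeeping (not gensim code).

    Returns a dict of non-integer tags (playing the role of
    model.docvecs.doctags) and the largest integer tag seen.
    """
    string_tags = {}
    max_int_tag = -1
    for tags in documents_tags:
        for tag in tags:
            if isinstance(tag, int):
                # Assumed behaviour: integer tags are used as raw indexes
                # and are never added to the lookup dict.
                max_int_tag = max(max_int_tag, tag)
            else:
                # Assumed behaviour: other tags get a slot in the lookup dict.
                string_tags.setdefault(tag, len(string_tags))
    return string_tags, max_int_tag


ids = [123, 1234, 12345, 123456]  # the ids from the question's data

# Integer tags, as in the question: the lookup dict stays empty.
int_lookup, max_int = collect_string_tags([[i] for i in ids])

# The same ids converted to strings: one key per document.
str_lookup, _ = collect_string_tags([[str(i)] for i in ids])

print(list(int_lookup))  # []
print(max_int)           # 123456
print(list(str_lookup))  # ['123', '1234', '12345', '123456']
```

If that assumption holds, building each TaggedDocument with tags=[str(list_id[i])] would give doctags one key per document, so the list passed to model[...] would no longer be empty.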