Python Word2vec model is very small and does not recognize words

python python-3.x

I trained a word2vec model with the gensim package on 6.4 GB of text data, which was preprocessed with the following code snippet:
import logging
import re
import string

import gensim

def read_input(input_file):
  # Read the whole file into a single string
  with open(input_file, "r") as inp:
    inp_str = inp.read()

  # Join lines with spaces, lowercase, then strip punctuation and digits
  inp_str = inp_str.strip('\n')
  inp_str = re.sub(r'\n', ' ', inp_str)
  lowercase = inp_str.lower()
  punc = lowercase.translate(str.maketrans('', '', string.punctuation))

  return punc.translate(str.maketrans('', '', '1234567890'))

def read_(input_file):
  # Tokenize the text, dropping accents and tokens shorter than 3 characters
  return gensim.utils.simple_preprocess(input_file, deacc=True, min_len=3)

doc = read_input('../train1.txt')
documents = read_(doc)
logging.info("Done reading data file")

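But every time I train the model, the saved file is only 147 KB, which does not seem right, and when I try to get vectors from the trained model, it says: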
KeyError: "word 'type' not in vocabulary"

Here is the code I use to train the word2vec model:
from gensim.models import Word2Vec

old_model = Word2Vec.load('../word2vec_model')
old_model.train(documents, total_examples=old_model.corpus_count, epochs=7)

old_model.save('../word2vec_model1')

logging.info("Saved the new word2vec model")

Please help me resolve this issue.
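The small size may be because a large value was used for the minimum count parameter when the model's parameters were set.
Try reducing the value of min_count and retraining the model.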
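Separately from min_count, it may also be worth checking the shape of `documents`. gensim's Word2Vec expects its training input to be an iterable of sentences, each of which is a list of tokens. Calling `simple_preprocess` on the whole file returns one flat list of tokens instead, so each token may end up being treated as a sentence and its characters as the words. The pure-Python sketch below does not call gensim: `tokenize` and `sentences_from_lines` are hypothetical helpers, and the regex tokenizer is only a rough stand-in for `simple_preprocess`. It simply mimics how a consumer that iterates over sentences, then over words, would see each shape.

```python
import re

def tokenize(line, min_len=3):
    """Lowercase a line and keep alphabetic tokens of at least min_len characters.

    Hypothetical stand-in for gensim.utils.simple_preprocess.
    """
    return [tok for tok in re.findall(r"[a-z]+", line.lower()) if len(tok) >= min_len]

def sentences_from_lines(lines):
    """Yield one token list per non-empty line (the nested shape)."""
    for line in lines:
        tokens = tokenize(line)
        if tokens:
            yield tokens

# Tiny made-up sample, used only for illustration
sample = ["The type of the model matters.", "", "Word vectors need many sentences!"]

nested = list(sentences_from_lines(sample))       # list of token lists
flat = [tok for sent in nested for tok in sent]   # one flat token list, as in the question

# Mimic a consumer that treats each item as a sentence and each element of it as a word
vocab_from_nested = {unit for sentence in nested for unit in sentence}
vocab_from_flat = {unit for sentence in flat for unit in sentence}

print("type" in vocab_from_nested)
print("type" in vocab_from_flat)
print(sorted(vocab_from_flat))
```

With the flat list, the "vocabulary" collapses to single characters, which would be consistent with both a very small model file and a KeyError for a word such as 'type'. If this turns out to be the cause, building one token list per line or per document and training a model whose vocabulary was built from that nested input is a reasonable next step; note that calling `train()` on a previously loaded model does not by itself add new words to its vocabulary.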