Python word2vec: encoding error
I get the following error when running my code:
Traceback (most recent call last):
File "test.py", line 21, in <module>
print model.most_similar(positive=['男人'])
File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 660, in most_similar
raise KeyError("word '%s' not in vocabulary" % word)
KeyError: "word '\xe7\x94\xb7\xe4\xba\xba' not in vocabulary"
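The escaped bytes in the KeyError message are the UTF-8 encoding of 男人. Under Python 2, a `'男人'` literal in a UTF-8 source file is a 6-byte `str`, not a `unicode` object. The minimal Python 3 check below (a sketch, not part of the original code) decodes the bytes copied from the traceback to confirm this:

```python
# Bytes copied verbatim from the KeyError message in the traceback
raw = b'\xe7\x94\xb7\xe4\xba\xba'

# Decoding UTF-8 bytes yields the two-character text string 男人
word = raw.decode('utf-8')
print(word)
print(len(raw), len(word))  # 6 bytes vs. 2 characters

# Encoding the text back to UTF-8 gives the original bytes
print(word.encode('utf-8') == raw)
```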
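"It works with the following change: model.most_similar([u'男人'])" This means you are probably passing a UTF-8 encoded byte string rather than a unicode string. A good practice is to decode your input to unicode and then encode your output.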
Try calling .decode('utf-8') on your string. Have you paid attention to your decoding and encoding?
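The "decode input, encode output" advice can be sketched as follows. The `vocab` dict and `lookup` helper are hypothetical stand-ins, assuming only that the vocabulary is keyed by text (unicode) strings. The sketch runs under Python 3, where `bytes` and `str` never compare equal, which mirrors how a non-ASCII Python 2 byte string fails to match a unicode key:

```python
# Hypothetical stand-in for a word2vec vocabulary: keys are text (unicode) strings
vocab = {u'男人': 0, u'女人': 1}


def lookup(word):
    """Decode UTF-8 bytes to text at the input boundary, then look the word up."""
    if isinstance(word, bytes):
        word = word.decode('utf-8')
    if word not in vocab:
        raise KeyError("word '%s' not in vocabulary" % word)
    return vocab[word]


# What a Python 2 '男人' literal holds in a UTF-8 source file
raw = u'男人'.encode('utf-8')

print(raw in vocab)   # the raw bytes do not match the text key
print(lookup(raw))    # decoding first makes the lookup succeed

# Encode only on output, e.g. when writing the word to a UTF-8 file or socket
out = u'男人'.encode('utf-8')
```

Keeping text as unicode internally and converting only at the boundaries means every comparison inside the program is text-to-text.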
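# -*- coding: utf-8 -*-
from gensim.models import word2vec
import logging

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

# Train on the text8 corpus (older gensim API, matching the Python 2.7 setup in the traceback)
sentences = word2vec.Text8Corpus('/tmp/text8')
model = word2vec.Word2Vec(sentences, size=200)

# Pass a unicode string; a UTF-8 byte string literal never matches the vocabulary's unicode keys
print(model.most_similar(positive=[u'男人']))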
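Even with the encoding fixed, a word can still be missing: text8 contains only lowercase English letters and spaces, so a Chinese word will not be in a vocabulary trained on it. The helper below is a hedged sketch that guards the query. It assumes a gensim 4.x-style `KeyedVectors` object (`model.wv`) that supports `word in vectors` and `most_similar(positive=..., topn=...)`. `safe_most_similar` and `FakeVectors` are hypothetical names; `FakeVectors` returns dummy scores and exists only so the sketch runs without gensim:

```python
def safe_most_similar(vectors, word, topn=10):
    """Return neighbours of `word`, or [] if it is not in the vocabulary.

    `vectors` is assumed to behave like gensim 4's KeyedVectors (model.wv).
    """
    if isinstance(word, bytes):
        word = word.decode('utf-8')  # byte strings never match text vocabulary keys
    if word not in vectors:
        return []
    return vectors.most_similar(positive=[word], topn=topn)


class FakeVectors:
    """Hypothetical stand-in for KeyedVectors; similarity scores are dummy 0.0 values."""

    def __init__(self, words):
        self.words = set(words)

    def __contains__(self, word):
        return word in self.words

    def most_similar(self, positive, topn=10):
        return [(w, 0.0) for w in sorted(self.words) if w not in positive][:topn]


vecs = FakeVectors(['king', 'queen', 'man'])
print(safe_most_similar(vecs, u'男人'.encode('utf-8')))  # not in this English vocabulary
print(safe_most_similar(vecs, b'man'))
```

With a real model the call would presumably be `safe_most_similar(model.wv, u'男人')` on a model trained on a Chinese corpus.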