Python: 'utf-8' codec can't decode byte 0xe3 in position 87 (Word2Vec, gensim)
I have this code:
import time
import multiprocessing
from datetime import timedelta
from gensim.models import word2vec
start_time = time.time()
print('Training Word2Vec Model...')
sentences = word2vec.LineSentence('data/data_text.txt')
id_w2v = word2vec.Word2Vec(sentences, size=300, workers=multiprocessing.cpu_count()-1)
id_w2v.save('model_terbaru/word2vec_300.model')
When I build the model, I get this error:
Traceback (most recent call last):
File "<ipython-input-10-fc7016864a34>", line 1, in <module>
runfile('F:/pa reza/model.py', wdir='F:/pa reza')
File "C:\ProgramData\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 704, in runfile
execfile(filename, namespace)
File "C:\ProgramData\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "F:/pa reza/model.py", line 13, in <module>
iter=10)
File "C:\ProgramData\Anaconda\lib\site-packages\gensim\models\base_any2vec.py", line 335, in __init__
self.build_vocab(sentences, trim_rule=trim_rule)
File "C:\ProgramData\Anaconda\lib\site-packages\gensim\models\base_any2vec.py", line 480, in build_vocab
sentences, progress_per=progress_per, trim_rule=trim_rule)
File "C:\ProgramData\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 1151, in scan_vocab
for sentence_no, sentence in enumerate(sentences):
File "C:\ProgramData\Anaconda\lib\site-packages\gensim\models\word2vec.py", line 1073, in __iter__
line = utils.to_unicode(line).split()
File "C:\ProgramData\Anaconda\lib\site-packages\gensim\utils.py", line 359, in any2unicode
return unicode(text, encoding, errors=errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 87: invalid continuation byte
Please help me...
You cut the traceback a bit short. You would probably get a faster answer if you told us which of the following lines caused the problem, not just the problem itself.
I guess sentences is ISO-8859-1 or ISO-8859-7 encoded, or something similar. That means id_w2v.save() will not be able to do unicode(text, encoding, errors=errors).
How was the file data_text.txt obtained or created? (It may contain data in some unexpected character encoding, and the best fix is probably to improve the process that creates it.)
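Going by the traceback, the failure happens while Word2Vec scans the vocabulary (build_vocab, scan_vocab), before save() is ever reached: LineSentence decodes each line of data/data_text.txt as UTF-8 through utils.to_unicode(line), and one line contains a byte 0xe3 that is not followed by a valid UTF-8 continuation byte. In ISO-8859-1 the byte 0xe3 is the single character ã, which fits the guess in the comments. Below is a minimal sketch for finding the offending lines and rewriting the file as UTF-8. The function names find_undecodable_lines and reencode_to_utf8 are made up for this example, and the default source encoding of ISO-8859-1 is only an assumption; check how the file was actually produced before relying on it.

```python
from pathlib import Path


def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, offset_in_line, raw_bytes) for every line
    that cannot be decoded with the given encoding."""
    bad = []
    # Open in binary mode so Python does not decode anything implicitly.
    with open(path, "rb") as fh:
        for line_no, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first bad byte in this line.
                bad.append((line_no, exc.start, raw))
    return bad


def reencode_to_utf8(src, dst, source_encoding="iso-8859-1"):
    """Rewrite src as UTF-8 in dst.

    source_encoding is an ASSUMPTION (a guess, not detected); a wrong
    guess yields readable-looking but incorrect characters.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

To use it, call find_undecodable_lines('data/data_text.txt') and look at the reported lines to judge which encoding they are really in. If the guess holds, call reencode_to_utf8('data/data_text.txt', 'data/data_text_utf8.txt') and pass the new file (a hypothetical name) to word2vec.LineSentence instead of the original. If the file mixes encodings, fixing the process that creates it is likely the better route, as the last comment suggests.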