What does this (utf-8) error mean when using SentenceTransformer in Python?


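For several weeks I have been writing code with the BERT sentence transformers. Out of nowhere, it started producing an error. I reduced the code to a single line, which still causes the error: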

from sentence_transformers import SentenceTransformer
The full error message is:

runfile('C:/Users/ga2943/Gregor Schweitzer - Masterthesis Adrian/Code/FSzuBR_bert_distil_Multicluster.py', wdir='C:/Users/ga2943/Gregor Schweitzer - Masterthesis Adrian/Code')
Traceback (most recent call last):

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3343, in run_code
    self.showtraceback(running_compiled_code=True)

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2026, in showtraceback
    self.showsyntaxerror(filename, running_compiled_code)

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2088, in showsyntaxerror
    stb = self.SyntaxTB.structured_traceback(etype, value, elist)

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\site-packages\IPython\core\ultratb.py", line 1420, in structured_traceback
    newtext = linecache.getline(value.filename, value.lineno)

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\linecache.py", line 16, in getline
    lines = getlines(filename, module_globals)

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\linecache.py", line 47, in getlines
    return updatecache(filename, module_globals)

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\linecache.py", line 137, in updatecache
    lines = fp.readlines()

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 1904: invalid continuation byte
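The last line means that some file was being read as UTF-8, and at offset 1904 Python found byte 0xe4 followed by a byte that is not a valid UTF-8 continuation byte. The frames above it show IPython's `showsyntaxerror` calling `linecache.getline(value.filename, ...)`, so the failing read seems to be IPython trying to display the source line of a SyntaxError, presumably from the script passed to `runfile` (an assumption). In Latin-1/cp1252, 0xe4 is `ä`, so a plausible but unconfirmed cause is that the file was saved in a Windows ANSI encoding instead of UTF-8. The sketch below reproduces the error and locates the first undecodable byte in a file; `find_first_non_utf8` is a hypothetical helper, not a library function.

```python
import os
import tempfile


def find_first_non_utf8(path, context=20):
    """Return (offset, byte_value, surrounding_bytes) for the first byte that
    is not valid UTF-8, or None if the whole file decodes as UTF-8.

    Hypothetical diagnostic helper, not part of any library.
    """
    with open(path, "rb") as f:  # read raw bytes, no decoding yet
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        start = exc.start  # same number as "position 1904" in the message
        return start, data[start], data[max(0, start - context):start + context]
    return None


# "ä" encoded as Latin-1/cp1252 is the single byte 0xe4. UTF-8 treats 0xe4 as
# the start of a 3-byte sequence, so the following ASCII "d" is rejected.
try:
    "Mädchen".encode("latin-1").decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xe4 in position 1: invalid continuation byte

# A temporary file saved as cp1252 stands in for the real script.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write("# Satzähnlichkeit\nx = 1\n".encode("cp1252"))
offset, byte, snippet = find_first_non_utf8(tmp.name)
os.remove(tmp.name)
print(offset, hex(byte), snippet)  # 6 0xe4 b'# Satz\xe4hnlichkeit\nx = 1\n'
```

Running the helper on the actual script (or on whatever file a debugger shows in `value.filename`) would show whether offset 1904 holds an umlaut.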
Sometimes it produces a different error message instead:

Traceback (most recent call last):

  File "<ipython-input-23-dbcd88385343>", line 1, in <module>
    from sentence_transformers import SentenceTransformer

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\site-packages\sentence_transformers\__init__.py", line 3, in <module>
    from .datasets import SentencesDataset, SentenceLabelDataset

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\site-packages\sentence_transformers\datasets.py", line 5, in <module>
    from torch.utils.data import Dataset

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\utils\data\__init__.py", line 1, in <module>
    from .sampler import Sampler, SequentialSampler, RandomSampler, SubsetRandomSampler, WeightedRandomSampler, BatchSampler

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\utils\data\sampler.py", line 1, in <module>
    import torch

  File "C:\Users\ga2943\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\__init__.py", line 83, in <module>
    __all__ += [name for name in dir(_C)

NameError: name '_C' is not defined
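This second error is raised inside `torch/__init__.py` while `torch` itself is being imported. A common cause (an assumption here; the traceback does not prove it) is that an earlier failed import left a half-initialised `torch` module in the same IPython session; `ipython-input-23` indicates many earlier inputs in that console. Restarting the console kernel (the `runfile(...)` call suggests Spyder) or testing the import in a fresh interpreter helps separate leftover session state from a broken installation. The sketch below runs an import in a brand-new Python process; `import_works_in_fresh_process` is a hypothetical helper name.

```python
import subprocess
import sys


def import_works_in_fresh_process(module_name, timeout=300):
    """Try `import <module_name>` in a new interpreter.

    Returns (ok, stderr_text). A fresh process has no partially imported
    modules left over from earlier failed attempts in the current session.
    Hypothetical helper, not part of any library.
    """
    completed = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],  # same Python as this one
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode == 0, completed.stderr


if __name__ == "__main__":
    for name in ("torch", "sentence_transformers"):
        ok, err = import_works_in_fresh_process(name)
        print(name, "OK" if ok else "FAILED")
        if not ok:
            lines = err.strip().splitlines()
            print(lines[-1] if lines else "(no stderr output)")  # final error line
```

If both imports succeed in a fresh process but fail in the console, the console session state is the more likely culprit than the installed packages.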
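I haven't deleted any libraries or anything like that. Does anyone know how to fix this error, or what is happening automatically in the background?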

Some additional facts:

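  • The same code works on Colab, so the library itself is not broken
  • It also stopped working on a colleague's computer; we share the same folder that contains the code (i.e., the dir in the error message)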

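  • Can you try another encoding standard, such as latin-1?
  • @Justice_Lords How do I do that? And why would utf-8 suddenly stop working?
  • Maybe a module was updated? You could create a virtualenv, install everything in it, and run the program there.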
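As an illustration of the latin-1 suggestion in the comments: if the file that fails to decode was saved in a Windows ANSI code page, it can be re-saved as UTF-8. Using cp1252 as the source encoding is an assumption (it is the usual default on German and Western European Windows); cp1252 and Latin-1 agree on bytes 0xA0 to 0xFF, so 0xe4 is `ä` in both. Another option, defined by PEP 263, is to declare the encoding on the first or second line of the .py file, e.g. `# -*- coding: cp1252 -*-`. `convert_to_utf8` below is a hypothetical helper; if the guessed encoding is wrong, the result may contain wrong characters without any error, so keep the backup until the file looks right.

```python
import os
import shutil
import tempfile


def convert_to_utf8(path, source_encoding="cp1252", keep_backup=True):
    """Re-encode a text file from `source_encoding` to UTF-8 in place.

    Returns True if the file was rewritten, False if it was already valid UTF-8.
    Hypothetical helper; cp1252 as the default source encoding is an assumption.
    """
    with open(path, "rb") as f:
        raw = f.read()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, leave the file untouched
    except UnicodeDecodeError:
        pass
    text = raw.decode(source_encoding)  # raises if a byte is undefined in this code page
    if keep_backup:
        shutil.copy2(path, path + ".bak")  # original bytes stay in <name>.bak
    # Write bytes directly so line endings (e.g. \r\n) are preserved exactly.
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
    return True


tmp_dir = tempfile.mkdtemp()
script = os.path.join(tmp_dir, "example.py")
with open(script, "wb") as f:
    f.write("print('Satzähnlichkeit')\n".encode("cp1252"))
print(convert_to_utf8(script))  # True: the file was re-encoded
print(convert_to_utf8(script))  # False: it is now valid UTF-8
shutil.rmtree(tmp_dir)
```

Since the code folder is shared, converting the file once fixes it for everyone who opens it, but an editor that saves in the ANSI code page could reintroduce the problem.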