Bert tokenizer fails with "0"; ValueError: Failed to convert a NumPy array to a Tensor (tags: python-3.x, numpy, tensorflow2.0, tokenize, bert-language-model)


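I am trying to create text encodings for an NER task using the Bert tokenizer. I am using tensorflow==2.2.0 and transformers==4.5.1. The texts are already split into words, so this is a list of sentences, each split into words.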

train_texts, val_texts, train_tags, val_tags = train_test_split(texts, tags, test_size=.2)
train_texts[0][134:150]
returns

array(['TERMS', 'CONDITIONS', 'Delayed', 'payments', 'shall', 'be',
       'charged', 'interest', 'at', '24', 'p.a', 'from', 'duc', 'Goods',
       'Once', 'sold'], dtype=object)
However, running

tokenizer = TFBertForTokenClassification.from_pretrained('bert-base-uncased')
train_encodings = tokenizer(train_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
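fails with the error

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float)

I am following the steps described for custom datasets. I tried upgrading and downgrading the TensorFlow version, as suggested in other solutions, but without success.

The full error log is given below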

All model checkpoint layers were used when initializing TFBertForTokenClassification.

Some layers of TFBertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-3a20ac537a69> in <module>()
      1 tokenizer = TFBertForTokenClassification.from_pretrained('bert-base-uncased')
----> 2 train_encodings = tokenizer(list(train_texts), is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
      3 val_encodings = tokenizer(val_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)

9 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    817           return ops.convert_to_tensor_v2(x)
    818         return x
--> 819       inputs = nest.map_structure(_convert_non_tensor, inputs)
    820       input_list = nest.flatten(inputs)
    821 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py in map_structure(func, *structure, **kwargs)
    615 
    616   return pack_sequence_as(
--> 617       structure[0], [func(*x) for x in entries],
    618       expand_composites=expand_composites)
    619 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py in <listcomp>(.0)
    615 
    616   return pack_sequence_as(
--> 617       structure[0], [func(*x) for x in entries],
    618       expand_composites=expand_composites)
    619 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py in _convert_non_tensor(x)
    815         # `SparseTensors` can't be converted to `Tensor`.
    816         if isinstance(x, (np.ndarray, float, int)):
--> 817           return ops.convert_to_tensor_v2(x)
    818         return x
    819       inputs = nest.map_structure(_convert_non_tensor, inputs)

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor_v2(value, dtype, dtype_hint, name)
   1281       name=name,
   1282       preferred_dtype=dtype_hint,
-> 1283       as_ref=False)
   1284 
   1285 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
   1339 
   1340     if ret is None:
-> 1341       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1342 
   1343     if ret is NotImplemented:

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/tensor_conversion_registry.py in _default_conversion_function(***failed resolving arguments***)
     50 def _default_conversion_function(value, dtype, name, as_ref):
     51   del as_ref  # Unused.
---> 52   return constant_op.constant(value, dtype, name=name)
     53 
     54 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
    261   return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 262                         allow_broadcast=True)
    263 
    264 

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    268   ctx = context.context()
    269   if ctx.executing_eagerly():
--> 270     t = convert_to_eager_tensor(value, ctx, dtype)
    271     if shape is None:
    272       return t

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
     94       dtype = dtypes.as_dtype(dtype).as_datatype_enum
     95   ctx.ensure_initialized()
---> 96   return ops.EagerTensor(value, ctx.device_name, dtype)
     97 
     98 

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

I think the tokenizer expects a plain Python list of strings. Otherwise, you can try casting the dtype of the NumPy array:
train_texts = train_texts.astype('U')
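
The traceback in the question passes through `__call__` in keras/engine/base_layer.py, which suggests that the object named `tokenizer` is really the `TFBertForTokenClassification` model and not a tokenizer. Below is a minimal sketch of one way to prepare the input under that reading. `to_word_lists` is a hypothetical helper (not part of transformers) that turns every sentence into a plain list of `str` and skips missing values, on the assumption that the "Unsupported object type float" message comes from NaN entries inside the object arrays. `encode_words` loads `BertTokenizerFast`, which is my assumption since the question never loads a tokenizer class; it needs transformers installed and fetches the vocabulary on first use.

```python
import math


def to_word_lists(sentences):
    """Return each sentence as a plain list of str, skipping missing values.

    Works on any iterable of iterables, including NumPy object arrays.
    """
    cleaned = []
    for sentence in sentences:
        words = []
        for word in sentence:
            # Assumption: missing words show up as None or float NaN.
            if word is None or (isinstance(word, float) and math.isnan(word)):
                continue
            words.append(str(word))
        cleaned.append(words)
    return cleaned


def encode_words(sentences):
    """Encode pre-split sentences with a fast BERT tokenizer."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    return tokenizer(
        to_word_lists(sentences),
        is_split_into_words=True,
        return_offsets_mapping=True,
        padding=True,
        truncation=True,
    )
```

With the variables from the question this would be called as `train_encodings = encode_words(train_texts)`. Note that if any words are skipped, the matching entries in `train_tags` would have to be dropped as well to keep words and tags aligned, which this sketch does not do.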