Python 3.x BERT tokenizer fails with "ValueError: Failed to convert a NumPy array to a Tensor"
I'm trying to create text encodings for an NER task using the BERT tokenizer. I'm using tensorflow==2.2.0 and transformers==4.5.1. The texts are already split into words, so this is a list of sentences, each split into words:
from sklearn.model_selection import train_test_split

train_texts, val_texts, train_tags, val_tags = train_test_split(texts, tags, test_size=.2)
train_texts[0][134:150]
returns
array(['TERMS', 'CONDITIONS', 'Delayed', 'payments', 'shall', 'be',
'charged', 'interest', 'at', '24', 'p.a', 'from', 'duc', 'Goods',
'Once', 'sold'], dtype=object)
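With `is_split_into_words=True`, the tokenizer expects each example to be a list of word strings. Since the error below complains about an object of type `float`, one quick sanity check is to look for non-`str` entries among the words (for example a `NaN` that slipped in from a pandas column). The helper below is a hypothetical sketch, not part of transformers; it only assumes `texts` is an iterable of iterables of words, such as the `train_texts` above.

```python
from typing import Any, Iterable, List, Tuple


def find_non_string_tokens(texts: Iterable[Iterable[Any]]) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: return (sentence index, word index, type name)
    for every token that is not a Python str (e.g. NaN floats)."""
    problems = []
    for i, sentence in enumerate(texts):
        for j, word in enumerate(sentence):
            # numpy.str_ subclasses str, so NumPy string elements pass this check
            if not isinstance(word, str):
                problems.append((i, j, type(word).__name__))
    return problems


# Toy data with a NaN (a float) hidden among the words.
sample = [
    ["TERMS", "CONDITIONS", "Delayed", "payments"],
    ["Goods", float("nan"), "sold"],
]
print(find_non_string_tokens(sample))  # [(1, 1, 'float')]
```

An empty result means every token is already a string, so the problem lies elsewhere.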
However, running
tokenizer = TFBertForTokenClassification.from_pretrained('bert-base-uncased')
train_encodings = tokenizer(train_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
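fails with the error
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).
For the custom dataset, I'm following the steps mentioned in … . I tried upgrading and downgrading the TensorFlow version, as suggested in other solutions, but it didn't help.
The full error log is below: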
All model checkpoint layers were used when initializing TFBertForTokenClassification.
Some layers of TFBertForTokenClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-26-3a20ac537a69> in <module>()
1 tokenizer = TFBertForTokenClassification.from_pretrained('bert-base-uncased')
----> 2 train_encodings = tokenizer(list(train_texts), is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
3 val_encodings = tokenizer(val_texts, is_split_into_words=True, return_offsets_mapping=True, padding=True, truncation=True)
9 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
817 return ops.convert_to_tensor_v2(x)
818 return x
--> 819 inputs = nest.map_structure(_convert_non_tensor, inputs)
820 input_list = nest.flatten(inputs)
821
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py in map_structure(func, *structure, **kwargs)
615
616 return pack_sequence_as(
--> 617 structure[0], [func(*x) for x in entries],
618 expand_composites=expand_composites)
619
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py in <listcomp>(.0)
615
616 return pack_sequence_as(
--> 617 structure[0], [func(*x) for x in entries],
618 expand_composites=expand_composites)
619
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer.py in _convert_non_tensor(x)
815 # `SparseTensors` can't be converted to `Tensor`.
816 if isinstance(x, (np.ndarray, float, int)):
--> 817 return ops.convert_to_tensor_v2(x)
818 return x
819 inputs = nest.map_structure(_convert_non_tensor, inputs)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor_v2(value, dtype, dtype_hint, name)
1281 name=name,
1282 preferred_dtype=dtype_hint,
-> 1283 as_ref=False)
1284
1285
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
1339
1340 if ret is None:
-> 1341 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1342
1343 if ret is NotImplemented:
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/tensor_conversion_registry.py in _default_conversion_function(***failed resolving arguments***)
50 def _default_conversion_function(value, dtype, name, as_ref):
51 del as_ref # Unused.
---> 52 return constant_op.constant(value, dtype, name=name)
53
54
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in constant(value, dtype, shape, name)
261 return _constant_impl(value, dtype, shape, name, verify_shape=False,
--> 262 allow_broadcast=True)
263
264
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
268 ctx = context.context()
269 if ctx.executing_eagerly():
--> 270 t = convert_to_eager_tensor(value, ctx, dtype)
271 if shape is None:
272 return t
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
94 dtype = dtypes.as_dtype(dtype).as_datatype_enum
95 ctx.ensure_initialized()
---> 96 return ops.EagerTensor(value, ctx.device_name, dtype)
97
98
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).
I think the tokenizer expects a plain Python list of strings. Otherwise, you can try casting the dtype of the NumPy array: train_texts = train_texts.astype('U')
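Following that suggestion, one way to get a plain Python list of lists of strings is to convert each sentence explicitly. This is a minimal sketch: `to_word_lists` is a hypothetical helper, and replacing missing values (`None` or `NaN`) with a placeholder token (here `[UNK]`, BERT's unknown token) is an assumption. Dropping such words instead would also require dropping the matching NER tags.

```python
import math
from typing import Any, Iterable, List


def to_word_lists(texts: Iterable[Iterable[Any]], missing: str = "[UNK]") -> List[List[str]]:
    """Hypothetical helper: convert nested sequences of words (lists,
    NumPy object arrays, ...) into a plain list of lists of Python str.
    None/NaN entries become `missing`, so word positions (and therefore
    the per-word NER tags) stay aligned."""
    result = []
    for sentence in texts:
        words = []
        for word in sentence:
            # numpy.float64 subclasses float, so NumPy NaNs are caught here too
            if word is None or (isinstance(word, float) and math.isnan(word)):
                words.append(missing)
            else:
                words.append(str(word))
        result.append(words)
    return result


texts = [["Delayed", "payments", 24], ["Goods", float("nan"), "sold"]]
print(to_word_lists(texts))
# [['Delayed', 'payments', '24'], ['Goods', '[UNK]', 'sold']]
```

The result can then be passed in place of `train_texts` to the tokenizer call from the question. Note also that in the question, `tokenizer` is created with `TFBertForTokenClassification.from_pretrained(...)`, which loads a TensorFlow model rather than a tokenizer; this is why the traceback goes through Keras' `base_layer.py` `__call__`. A tokenizer is normally loaded with a tokenizer class such as `BertTokenizerFast.from_pretrained('bert-base-uncased')`, and a fast tokenizer is also required for `return_offsets_mapping=True`.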