Python: "Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers" during Hugging Face tokenization
Tags: python, tensorflow, huggingface-tokenizers

I am getting the error in the title, similar to an earlier question. In this case I understand what the error message says, but I would like to know which row in the data is causing the problem. Can anyone here tell me how to track that down?

Data being read:

Traceback:
RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/multiprocess/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 174, in wrapper
out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/datasets/fingerprint.py", line 340, in wrapper
out = func(self, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1823, in _map_single
offset=offset,
File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1715, in apply_function_on_filtered_inputs
function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
File "<ipython-input-16-aa03da28a7e7>", line 2, in tokenize_function
return tokenizer(data["text"])
File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py", line 2271, in __call__
**kwargs,
File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py", line 2456, in batch_encode_plus
**kwargs,
File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils.py", line 545, in _batch_encode_plus
first_ids = get_input_ids(ids)
File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils.py", line 526, in get_input_ids
"Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.
"""
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-42-61ad0d3cfb1a> in <module>()
----> 1 tokenized_train= train.map(tokenize_function,batched=True, num_proc=2)
2 tokenized_test= test.map(tokenize_function,batched=True, num_proc=2)
12 frames
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils.py in get_input_ids()
524 else:
525 raise ValueError(
--> 526 "Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
527 )
528
ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.
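One way to locate the offending rows is to scan the `text` column for entries that are not plain strings before calling `map`. The sketch below assumes the cause is a non-string value such as `None`, `NaN`, or a number (a common result of empty cells in the source file), which the traceback does not confirm. The helper name `find_invalid_rows` and the sample list are hypothetical and stand in for `train["text"]`.

```python
from typing import Any, Iterable, List, Tuple


def find_invalid_rows(texts: Iterable[Any]) -> List[Tuple[int, Any]]:
    """Return (row_index, value) pairs for entries that are not plain strings."""
    return [
        (index, value)
        for index, value in enumerate(texts)
        if not isinstance(value, str)
    ]


# Hypothetical sample standing in for train["text"]
sample = ["a good sentence", None, "another one", float("nan"), 42]

bad_rows = find_invalid_rows(sample)
for index, value in bad_rows:
    # Show the row position, the raw value, and its Python type
    print(f"row {index}: {value!r} ({type(value).__name__})")
```

On the real data this would be called as `find_invalid_rows(train["text"])`. If the reported rows turn out to be missing values, they could be dropped before tokenizing with something like `train = train.filter(lambda row: isinstance(row["text"], str))`, or replaced with an empty string, depending on what the task needs.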