Python "Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers" with Hugging Face tokenization

Tags: python, tensorflow, huggingface-tokenizers

I am getting the error in the title, similar to . In this case I know what the error means, but I would like to know which row in my data is causing the problem. Can anyone tell me how to fix this?

Data to be read:

Traceback:

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/multiprocess/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 174, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/fingerprint.py", line 340, in wrapper
    out = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1823, in _map_single
    offset=offset,
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1715, in apply_function_on_filtered_inputs
    function(*fn_args, effective_indices, **fn_kwargs) if with_indices else function(*fn_args, **fn_kwargs)
  File "<ipython-input-16-aa03da28a7e7>", line 2, in tokenize_function
    return tokenizer(data["text"])
  File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py", line 2271, in __call__
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_base.py", line 2456, in batch_encode_plus
    **kwargs,
  File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils.py", line 545, in _batch_encode_plus
    first_ids = get_input_ids(ids)
  File "/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils.py", line 526, in get_input_ids
    "Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.
"""

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-42-61ad0d3cfb1a> in <module>()
----> 1 tokenized_train= train.map(tokenize_function,batched=True, num_proc=2)
      2 tokenized_test= test.map(tokenize_function,batched=True, num_proc=2)

12 frames
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils.py in get_input_ids()
    524             else:
    525                 raise ValueError(
--> 526                     "Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers."
    527                 )
    528 

ValueError: Input is not valid. Should be a string, a list/tuple of strings or a list/tuple of integers.
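The traceback shows the tokenizer rejecting something in `data["text"]` that is not a string, which typically means the column contains `None` or a float `NaN`. A quick way to find the offending rows is to scan the column for non-string values before calling `map`. The sketch below uses a small hypothetical sample in place of the real dataset (the real `train["text"]` would be scanned the same way):

```python
# Hypothetical sample standing in for the real dataset: row 2 holds None
# and row 4 a float, both of which make the tokenizer raise the
# ValueError shown in the traceback above.
train = {"text": ["hello world", "foo bar", None, "baz", 3.14]}

# Collect the indices of rows whose "text" value is not a string.
bad_rows = [i for i, t in enumerate(train["text"]) if not isinstance(t, str)]
print(bad_rows)  # -> [2, 4]
```

Printing `bad_rows` on the real dataset should identify exactly which lines of the data are causing the problem.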
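Once the bad rows are located, one fix is to guarantee the tokenizer only ever receives strings. A minimal sketch, with the `tokenizer(...)` call stubbed out as a plain whitespace split so the example is self-contained (the real `tokenize_function` would call `tokenizer(texts)` instead):

```python
def tokenize_function(data):
    # Coerce non-string entries (None, NaN, numbers) to the empty string
    # so the batch is always a list of strings.
    texts = [t if isinstance(t, str) else "" for t in data["text"]]
    # Stand-in for tokenizer(texts): split on whitespace.
    return {"tokens": [t.split() for t in texts]}

batch = {"text": ["hello world", None, "foo"]}
print(tokenize_function(batch))
```

An alternative, if the invalid rows carry no usable text, is to drop them entirely before mapping, e.g. `train = train.filter(lambda row: isinstance(row["text"], str))` with a Hugging Face `datasets` Dataset.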