Python "not enough values to unpack" when loading dataset - AllenNLP _read
I am trying to use the AllenNLP library to perform NER. The library works perfectly well on conll2003 and other datasets that contain only entities and tokens (I had to update the same _read function). However, if I try to use my own dataset, the function raises "ValueError: not enough values to unpack (expected 2, got 1)". I compared the format, special characters, spacing, and even the file names, but could not find any problem.
Here is a sample from the working dataset:
O show
O me
O films
O with
B-ACTOR drew
I-ACTOR barrymore
O from
O the
B-YEAR 1980s
O what
O movies
O starred
O both
B-ACTOR al
I-ACTOR pacino
Here is a sample from my dataset, which does not work:
O dated
O as
O of
B-STARTDATE February
I-STARTDATE 9
I-STARTDATE ,
L-STARTDATE 2017
O by
O and
O between
O Allenware
O Ltd
I cannot identify the problem. Please help.
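One way to narrow this down is to scan the file for non-blank lines that do not split into exactly two whitespace-separated fields, since those are the lines the reader cannot unpack. This is a diagnostic sketch, not part of AllenNLP; the file path you pass in is whatever your dataset path is:

```python
def find_bad_lines(path):
    """Return (line_number, line) pairs for non-blank lines that do not
    split into exactly two whitespace-separated fields (tag + token)."""
    bad = []
    with open(path, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            parts = line.split()
            # Blank lines are sentence dividers, so an empty split is fine.
            if parts and len(parts) != 2:
                bad.append((lineno, line.rstrip("\n")))
    return bad
```

Running `find_bad_lines("my_dataset.txt")` (placeholder path) and printing the result should point directly at the offending input, including lines that only look blank because they contain stray whitespace-like characters.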
Update
Adding stderr.log as requested:
0it [00:00, ?it/s]
1it [00:00, 556.72it/s]
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/allennlp/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/allennlp/lib/python3.6/site-packages/allennlp/run.py", line 18, in run
    main(prog="allennlp")
  File "/allennlp/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 102, in main
    args.func(args)
  File "/allennlp/lib/python3.6/site-packages/allennlp/commands/train.py", line 124, in train_model_from_args
    args.cache_prefix)
  File "/allennlp/lib/python3.6/site-packages/allennlp/commands/train.py", line 168, in train_model_from_file
    cache_directory, cache_prefix)
  File "/allennlp/lib/python3.6/site-packages/allennlp/commands/train.py", line 226, in train_model
    cache_prefix)
  File "/allennlp/lib/python3.6/site-packages/allennlp/training/trainer_pieces.py", line 42, in from_params
    all_datasets = training_util.datasets_from_params(params, cache_directory, cache_prefix)
  File "/allennlp/lib/python3.6/site-packages/allennlp/training/util.py", line 185, in datasets_from_params
    validation_data = validation_and_test_dataset_reader.read(validation_data_path)
  File "/allennlp/lib/python3.6/site-packages/allennlp/data/dataset_readers/dataset_reader.py", line 134, in read
    instances = [instance for instance in Tqdm.tqdm(instances)]
  File "/allennlp/lib/python3.6/site-packages/allennlp/data/dataset_readers/dataset_reader.py", line 134, in <listcomp>
    instances = [instance for instance in Tqdm.tqdm(instances)]
  File "/allennlp/lib/python3.6/site-packages/tqdm/std.py", line 1081, in __iter__
    for obj in iterable:
  File "/allennlp/lib/python3.6/site-packages/allennlp/data/dataset_readers/conll2003.py", line 119, in _read
    ner_tags,tokens_ = fields
ValueError: not enough values to unpack (expected 2, got 1)
0it [00:00, ?it/s]
Comments:
- Please edit the question to provide the full traceback. If you can, also point out which input triggers it. (Perhaps you can trim the sample data down to a single line of input that still reproduces it?) Does the data contain tabs, or runs of literal spaces?
- I tried with only one record, but it gives the same error. I also tried both tabs and spaces, and neither works.
- If you changed the _read function, please show us how you changed it. I also noticed it provides extra logging, so you should be able to get more information by raising the log level.
- I have updated the question and added the two functions.
@overrides
def _read(self, file_path: str) -> Iterable[Instance]:
    # if `file_path` is a URL, redirect to the cache
    file_path = cached_path(file_path)

    with open(file_path, "r") as data_file:
        logger.info("Reading instances from lines in file at: %s", file_path)
        # Group into alternative divider / sentence chunks.
        for is_divider, lines in itertools.groupby(data_file, _is_divider):
            # Ignore the divider chunks, so that `lines` corresponds to the words
            # of a single sentence.
            if not is_divider:
                fields = [line.strip().split() for line in lines]
                # unzipping trick returns tuples, but our Fields need lists
                fields = [list(field) for field in zip(*fields)]
                ner_tags, tokens_ = fields
                # TextField requires ``Token`` objects
                tokens = [Token(token) for token in tokens_]
                yield self.text_to_instance(tokens, ner_tags)
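Note that the unzipping trick in _read silently truncates uneven rows: `zip(*fields)` stops at the shortest row, so if even one line in a sentence splits into a single field (a tag with no token, or a stray non-blank separator line), the whole sentence collapses to one column, which is exactly the "expected 2, got 1" the traceback reports. A minimal standalone sketch of that mechanism (the sample rows are illustrative):

```python
# Rows as produced by `line.strip().split()` for one sentence chunk;
# the last line had a tag but no token.
rows = [["O", "dated"], ["O", "as"], ["B-STARTDATE"]]

# zip(*rows) truncates to the shortest row, leaving a single column
# instead of the expected (ner_tags, tokens) pair.
columns = [list(col) for col in zip(*rows)]
print(columns)       # [['O', 'O', 'B-STARTDATE']]
print(len(columns))  # 1 -> `ner_tags, tokens_ = columns` would raise
                     # ValueError: not enough values to unpack (expected 2, got 1)
```

So the error does not necessarily point at an obviously malformed line; any single short row in a chunk is enough.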
def text_to_instance(  # type: ignore
    self,
    tokens: List[Token],
    ner_tags: List[str] = None,
) -> Instance:
    """
    We take `pre-tokenized` input here, because we don't have a tokenizer in this class.
    """
    sequence = TextField(tokens, self._token_indexers)
    instance_fields: Dict[str, Field] = {"tokens": sequence}
    instance_fields["metadata"] = MetadataField({"words": [x.text for x in tokens]})
    coded_ner = ner_tags
    if 'ner' in self.feature_labels:
        if coded_ner is None:
            raise ConfigurationError("Dataset reader was specified to use NER tags as "
                                     " features. Pass them to text_to_instance.")
        instance_fields['ner_tags'] = SequenceLabelField(coded_ner, sequence, "ner_tags")
    if self.tag_label == 'ner' and coded_ner is not None:
        instance_fields['tags'] = SequenceLabelField(coded_ner, sequence, self.label_namespace)
    return Instance(instance_fields)