Python 3.x: Reading multiple labels as a list or tuple into a dict keyed by id, i.e. {id: (cat1, cat2, …)}

Tags: python-3.x, dictionary, machine-learning, text-analysis, multilabel-classification

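I am modeling a multi-label text classification algorithm. Below is a snippet of my labels.txt file. I want to convert these records into a dictionary keyed by id, with the corresponding categories in a tuple or list, i.e. {id: (cat1, cat2)}. The records are not one per line: each id is followed by one or more indented category lines. I have been stuck on how to convert this kind of data into a dictionary.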

B0027DQHA0
  Movies & TV, TV
  Music, Classical
0756400120
  Books, Literature & Fiction, Anthologies & Literary Collections, General
  Books, Literature & Fiction, United States
  Books, Science Fiction & Fantasy, Science Fiction, Anthologies
  Books, Science Fiction & Fantasy, Science Fiction, Short Stories
B0000012D5
  Music, Blues
  Music, Pop
  Music, R&B

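If the category names are always indented with spaces and the IDs are not, you can use that to tell them apart. Loop over the lines and append each category name to a list in a dict indexed by the most recent ID: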

r = '''B0027DQHA0
  Movies & TV, TV
  Music, Classical
0756400120
  Books, Literature & Fiction, Anthologies & Literary Collections, General
  Books, Literature & Fiction, United States
  Books, Science Fiction & Fantasy, Science Fiction, Anthologies
  Books, Science Fiction & Fantasy, Science Fiction, Short Stories
B0000012D5
  Music, Blues
  Music, Pop
  Music, R&B'''
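d = {}
for l in r.splitlines():
    if l.startswith(' '):
        # Indented line: a category for the most recently seen ID.
        d.setdefault(i, []).append(l.lstrip())
    else:
        # Unindented line: a new ID; remember it for the lines that follow.
        i = l
print(d)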
This outputs:

{'B0027DQHA0': ['Movies & TV, TV', 'Music, Classical'], '0756400120': ['Books, Literature & Fiction, Anthologies & Literary Collections, General', 'Books, Literature & Fiction, United States', 'Books, Science Fiction & Fantasy, Science Fiction, Anthologies', 'Books, Science Fiction & Fantasy, Science Fiction, Short Stories'], 'B0000012D5': ['Music, Blues', 'Music, Pop', 'Music, R&B']}
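
If tuples are wanted, as in the {id: (cat1, cat2)} form from the question, the lists can be converted afterwards. The following is only a minimal sketch: the function name `to_tuples` and the `split_levels` flag are hypothetical, and splitting a category path on ', ' assumes that no level name itself contains a comma followed by a space, which holds for the sample but may not hold for the full file.

```python
def to_tuples(d, split_levels=False):
    """Return a new dict whose values are tuples instead of lists.

    With split_levels=True, each category path is additionally split
    on ', ' into a tuple of levels (assumption: level names never
    contain ', ' themselves).
    """
    result = {}
    for key, paths in d.items():
        if split_levels:
            result[key] = tuple(tuple(p.split(', ')) for p in paths)
        else:
            result[key] = tuple(paths)
    return result


# Small stand-in for the dict built by the answer's loop.
d = {'B0000012D5': ['Music, Blues', 'Music, Pop', 'Music, R&B']}
print(to_tuples(d))
print(to_tuples(d, split_levels=True))
```

Keeping each path as one string is usually enough when every full path is treated as a single label; splitting into levels may help if the classifier should also see parent categories such as "Music".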

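This gives me an error. Please see the updated original question!

In my example above, r is a string, so I used r.splitlines() to get a list of lines, but if your r is a file object, you can iterate over it directly with for l in r:.

Yes, I upvoted, but it won't show publicly because my reputation is below 15.

Upvoting and accepting an answer are different things. You can accept an answer by clicking the grey check mark next to it. A reputation of 1 is enough for that. :-)

Oh, okay, done!! (Y)

Hey, "solved". Nice :) That said, accept the answer rather than copying it into your question.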
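
To make the file-object variant from the comments concrete, here is a minimal sketch. It assumes the same layout as the sample (IDs unindented, categories indented with spaces). The function name `read_labels` is hypothetical, and `io.StringIO` stands in for an opened labels.txt so that the example is self-contained. Lines read from a file keep their trailing newline, so it is stripped here; blank lines are skipped, and an ID with no categories gets an empty list, which differs slightly from the loop in the answer.

```python
import io


def read_labels(lines):
    """Build {id: [category, ...]} from any iterable of text lines."""
    labels = {}
    current_id = None
    for line in lines:
        line = line.rstrip('\n')
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(' '):
            # Indented line: a category for the most recent ID.
            if current_id is None:
                raise ValueError('category line before any id: %r' % line)
            labels[current_id].append(line.strip())
        else:
            # Unindented line: start a new record.
            current_id = line.strip()
            labels.setdefault(current_id, [])
    return labels


sample = io.StringIO('B0000012D5\n  Music, Blues\n  Music, Pop\n')
print(read_labels(sample))

# With a real file (the path is an assumption):
# with open('labels.txt', encoding='utf-8') as f:
#     labels = read_labels(f)
```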