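I am building a multi-label text classification model. Below is a snippet of my labels.txt file. I want to convert these records into a dictionary that maps each ID to a tuple or list of its categories, i.e. {id: (cat1, cat2)}. The records are not separated by new lines, and I'm stuck on how to convert this data into a dictionary.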
B0027DQHA0
  Movies & TV, TV
  Music, Classical
0756400120
  Books, Literature & Fiction, Anthologies & Literary Collections, General
  Books, Literature & Fiction, United States
  Books, Science Fiction & Fantasy, Science Fiction, Anthologies
  Books, Science Fiction & Fantasy, Science Fiction, Short Stories
B0000012D5
  Music, Blues
  Music, Pop
  Music, R&B

Posted on 2018-08-02 01:24:02
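If the category names are always indented with spaces and the IDs are not, you can use that to tell them apart: loop over the lines and append each category name to a list in a dict keyed by the most recent ID: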
r = '''B0027DQHA0
  Movies & TV, TV
  Music, Classical
0756400120
  Books, Literature & Fiction, Anthologies & Literary Collections, General
  Books, Literature & Fiction, United States
  Books, Science Fiction & Fantasy, Science Fiction, Anthologies
  Books, Science Fiction & Fantasy, Science Fiction, Short Stories
B0000012D5
  Music, Blues
  Music, Pop
  Music, R&B'''
d = {}
current_id = None
for line in r.splitlines():
    if line.startswith(' '):
        # Indented line: a category belonging to the most recent ID
        d.setdefault(current_id, []).append(line.lstrip())
    else:
        # Unindented line: a new ID
        current_id = line
print(d)

This outputs:
{'B0027DQHA0': ['Movies & TV, TV', 'Music, Classical'], '0756400120': ['Books, Literature & Fiction, Anthologies & Literary Collections, General', 'Books, Literature & Fiction, United States', 'Books, Science Fiction & Fantasy, Science Fiction, Anthologies', 'Books, Science Fiction & Fantasy, Science Fiction, Short Stories'], 'B0000012D5': ['Music, Blues', 'Music, Pop', 'Music, R&B']}

https://stackoverflow.com/questions/51644014
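
Since the question asks for the {id: (cat1, cat2)} shape, here is a minimal sketch of the same indentation-based idea wrapped in a function that returns tuples. The function name parse_labels and the short sample list are hypothetical and only for illustration; the sketch also assumes that every category line starts with whitespace and every ID line does not.

```python
from collections import defaultdict


def parse_labels(lines):
    """Group indented category lines under the most recent unindented ID line.

    Assumption: category lines start with whitespace, ID lines do not.
    """
    labels = defaultdict(list)
    current_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            if current_id is None:
                raise ValueError("category line appears before any ID line")
            labels[current_id].append(line.strip())
        else:
            current_id = line.strip()
            labels[current_id]  # register the ID even if it has no categories
    # Convert each list of categories to a tuple
    return {key: tuple(values) for key, values in labels.items()}


# Hypothetical sample data in the same layout as labels.txt
sample = [
    "B0027DQHA0\n",
    "  Movies & TV, TV\n",
    "  Music, Classical\n",
    "B0000012D5\n",
    "  Music, Blues\n",
]
result = parse_labels(sample)
print(result)
```

Because the function accepts any iterable of lines, an open file object should work as well, for example `with open("labels.txt", encoding="utf-8") as f: labels = parse_labels(f)` (the encoding is an assumption about the file).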