I have a NumPy array of tuples, trainY. Each tuple is a set of labels:
array([('php', 'image-processing', 'file-upload', 'upload', 'mime-types'),
('firefox',),
('r', 'matlab', 'machine-learning'),
('c#', 'url', 'encoding'),
('php', 'api', 'file-get-contents'),
('proxy', 'active-directory', 'jmeter'),
('core-plot',),
('c#', 'asp.net', 'windows-phone-7'),
('.net', 'javascript', 'code-generation'),
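('sql', 'variables', 'parameters', 'procedure', 'calls')], dtype=object)
I want to build a dict that serves as an index. The keys will be the labels, and each value will be a list of the row numbers in which that label appears: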
For example:
key: value
'php': [0, 4]
The code I have now is:
label_index = {}
for i, labels in enumerate(trainY):
    for label in labels:
        if label in label_index.keys():
            label_index[label].append(i)
        else:
            label_index[label] = [i]
Is there a faster (perhaps vectorized) way to write this?
Thanks!
Answered 2013-12-06 18:35:03
Use collections.defaultdict:
>>> import numpy as np
>>> a = np.array([('php', 'image-processing', 'file-upload', 'upload', 'mime-types'),
...               ('firefox',),
...               ('r', 'matlab', 'machine-learning'),
...               ('c#', 'url', 'encoding'),
...               ('php', 'api', 'file-get-contents'),
...               ('proxy', 'active-directory', 'jmeter'),
...               ('core-plot',),
...               ('c#', 'asp.net', 'windows-phone-7'),
...               ('.net', 'javascript', 'code-generation'),
...               ('sql', 'variables', 'parameters', 'procedure', 'calls')], dtype=object)
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for i, x in enumerate(a):
...     for k in x:
...         d[k].append(i)
...
>>> d['php']
[0, 4]
Answered 2013-12-06 18:32:57
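In Python 2, dict.keys() returns a list, so in addition to building an unnecessary list on every check, label in label_index.keys() turns an O(1) lookup into a linear scan. Test membership against the dict directly: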
label_index = {}
for i, labels in enumerate(trainY):
    for label in labels:
        if label in label_index:
            label_index[label].append(i)
        else:
            label_index[label] = [i]
https://stackoverflow.com/questions/20431052
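Neither answer above addresses the "vectorized" part of the question. What follows is only a rough sketch of one NumPy-based way to do the grouping, not something taken from the thread. The function names build_index_loop and build_index_numpy are made up for illustration, and the sketch assumes every label is a plain string and that there is at least one label in total. Flattening the ragged tuples still takes a Python-level pass, so whether this beats the defaultdict loop depends on the data and should be timed rather than assumed.

```python
from collections import defaultdict

import numpy as np


def build_index_loop(rows):
    """Reference version: the defaultdict loop from the first answer."""
    index = defaultdict(list)
    for i, labels in enumerate(rows):
        for label in labels:
            index[label].append(i)
    return dict(index)


def build_index_numpy(rows):
    """Hypothetical NumPy variant: flatten, then group row ids by label.

    Assumes all labels are strings and that rows holds at least one label.
    """
    # One row id per label occurrence, e.g. [0, 0, 1, 2, 2] for tuple
    # lengths (2, 1, 2).
    lengths = np.array([len(labels) for labels in rows])
    row_ids = np.repeat(np.arange(len(rows)), lengths)

    # Flatten the ragged tuples into a single 1-D array of labels.
    flat = np.array([label for labels in rows for label in labels])

    # inverse[j] is the position of flat[j] within the sorted unique labels.
    uniq, inverse = np.unique(flat, return_inverse=True)

    # A stable sort keeps row ids in ascending order within each label group.
    order = np.argsort(inverse, kind="stable")
    counts = np.bincount(inverse)
    groups = np.split(row_ids[order], np.cumsum(counts)[:-1])

    return {str(label): group.tolist() for label, group in zip(uniq, groups)}


if __name__ == "__main__":
    sample = [("php", "upload"), ("firefox",), ("php", "api")]
    print(build_index_numpy(sample)["php"])
    print(build_index_numpy(sample) == build_index_loop(sample))
```

Both functions return a plain dict mapping each label to a list of row numbers, so they can be compared directly on real data before choosing between them.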