I have a list of objects, where each object contains a name and a vendor. I need to split the list by vendor, so that objects with the same vendor end up in the same list.
How can I do this efficiently in Python?
I have this list:
[
{
name:"Ram",
vendor:1,
},
{
name:"Shaam",
vendor:2
},
{
name:"Mohan",
vendor:1
},
{
name:"Sohan",
vendor:3
},
{
name:"Aman",
vendor:2
}
]
The lists I want:
[
{
name:"Ram",
vendor:1,
},
{
name:"Mohan",
vendor:1
}
][
{
name:"Shaam",
vendor:2
},
{
name:"Aman",
vendor:2
}
][
{
name:"Sohan",
vendor:3
}
]
Posted on 2022-02-01 13:03:54
I'm not sure how efficient this code is, but I believe it outputs what you want. Note that the output list is 0-indexed, so subtract 1 from the vendor number to get the index.
Code
data = [
    {'name': "Ram", 'vendor': 1},
    {'name': "Shaam", 'vendor': 2},
    {'name': "Mohan", 'vendor': 1},
    {'name': "Sohan", 'vendor': 3},
    {'name': "Aman", 'vendor': 2},
]

# Group the names by vendor number
storage = {}
for datum in data:
    name = datum['name']
    vendor = datum['vendor']
    if vendor in storage:
        storage[vendor].append(name)
    else:
        storage[vendor] = [name]

# Rebuild one sublist of dicts per vendor, in ascending vendor order
output = []
for key in sorted(storage.keys()):
    output.append([])
    for value in storage[key]:
        new_dict = {'name': value, 'vendor': key}
        output[key - 1].append(new_dict)

for val in output:
    print(val)

Output
[{'name': 'Ram', 'vendor': 1}, {'name': 'Mohan', 'vendor': 1}]
[{'name': 'Shaam', 'vendor': 2}, {'name': 'Aman', 'vendor': 2}]
[{'name': 'Sohan', 'vendor': 3}]
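As a side note, the same grouping can be done with the standard library's itertools.groupby. This is only a sketch, but it avoids the assumption that vendor numbers are contiguous and start at 1 (groupby only groups consecutive items, so the data must be sorted by vendor first):

```python
from itertools import groupby
from operator import itemgetter

data = [
    {'name': "Ram", 'vendor': 1},
    {'name': "Shaam", 'vendor': 2},
    {'name': "Mohan", 'vendor': 1},
    {'name': "Sohan", 'vendor': 3},
    {'name': "Aman", 'vendor': 2},
]

# groupby only merges adjacent items, so sort by vendor first;
# Python's sort is stable, so the original order within a vendor is kept
by_vendor = itemgetter('vendor')
output = [list(group)
          for _, group in groupby(sorted(data, key=by_vendor), key=by_vendor)]

for sub in output:
    print(sub)
```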
Posted on 2022-02-01 12:57:39
You could do something like this (think about how to store a different list on each iteration):

# vendor numbers start at 1, so iterate 1..numOfVendors inclusive
for i in range(1, numOfVendors + 1):
    yourList = [x for x in initialList if x["vendor"] == i]
Posted on 2022-02-01 14:23:29
TL;DR: the sequential classifier is good enough, since Python's map() is sequential as well.
Moreover, multiprocessing.Pool.map introduces inter-process communication overhead, so it can be overkill. Use it only if you are sure data is very long.
Here is my profiling code:
sequential: a mapping-based, single-process sequential classifier; multi: a multiprocessing version.
from funcy import print_durations
from collections import defaultdict
from multiprocessing import Pool
DATA = [
    {'name': "Ram", 'vendor': 1},
    {'name': "Shaam", 'vendor': 2},
    {'name': "Mohan", 'vendor': 1},
    {'name': "Sohan", 'vendor': 3},
    {'name': "Aman", 'vendor': 2},
]

def sequential(data):
    # Single-process grouping: one dict lookup and one append per item
    ans = defaultdict(list)
    for d in data:
        ans[d["vendor"]].append(d)
    return dict(ans)

def multi(data, worker=10):
    def k_fold(myList, N):
        # Split myList into N parts of approximately equal length
        # Ref: https://stackoverflow.com/questions/2130016/splitting-a-list-into-n-parts-of-approximately-equal-length
        return [myList[(i*len(myList))//N:((i+1)*len(myList))//N] for i in range(N)]

    def merge_all(dicts):
        # Merge the per-worker partial groupings back into one dict
        ans = defaultdict(list)
        for d in dicts:
            for k, v in d.items():
                ans[k].extend(v)
        return ans

    data = k_fold(data, worker)
    with Pool(worker) as P:
        ans = P.map(sequential, data)
    return dict(merge_all(ans))

def timing(data_size=1000):
    print("timing for data size {:.0E}".format(data_size))
    data = DATA * data_size

    @print_durations()
    def timing_seq():
        return sequential(data)

    @print_durations()
    def timing_multi():
        return multi(data)

    s = timing_seq()
    m = timing_multi()
    assert s == m

for i in range(3, 8):
    timing(10**i)

Output on my laptop, with Python 3.10.2:
$ python test.py
timing for data size 1E+03
475.65 mks in timing_seq()
36.63 ms in timing_multi()
timing for data size 1E+04
4.52 ms in timing_seq()
22.28 ms in timing_multi()
timing for data size 1E+05
46.35 ms in timing_seq()
84.26 ms in timing_multi()
timing for data size 1E+06
459.60 ms in timing_seq()
510.74 ms in timing_multi()
timing for data size 1E+07
4.34 s in timing_seq()
3.00 s in timing_multi()
Unfortunately, multi only starts to win when the data size reaches the 10^7 level.
But if you give it more workers, the speedup grows:
# ...
# Same script

data_size = 10**7
print("timing for data size {:.0E}".format(data_size))
for i in range(1, 6):
    worker = 2 ** i
    print("worker = {}".format(worker))
    data = DATA * data_size

    @print_durations()
    def timing_multi():
        multi(data, worker)

    timing_multi()

The output shows how performance scales, but too many workers introduce overhead that cancels out the speedup ;)
timing for data size 1E+07
worker = 2
3.89 s in timing_multi()
worker = 4
2.91 s in timing_multi()
worker = 8
2.47 s in timing_multi()
worker = 16
2.52 s in timing_multi()
worker = 32
3.43 s in timing_multi()
References: Is there a simple process-based parallel map for python?
https://stackoverflow.com/questions/70940912
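Note that sequential returns a dict keyed by vendor rather than the list of lists the question asked for. If that exact shape is needed, converting takes one more step (a minimal sketch reusing the same grouping):

```python
from collections import defaultdict

def sequential(data):
    # Same grouping as in the benchmark above
    ans = defaultdict(list)
    for d in data:
        ans[d["vendor"]].append(d)
    return dict(ans)

data = [
    {'name': "Ram", 'vendor': 1},
    {'name': "Shaam", 'vendor': 2},
    {'name': "Mohan", 'vendor': 1},
    {'name': "Sohan", 'vendor': 3},
    {'name': "Aman", 'vendor': 2},
]

groups = sequential(data)
# One sublist per vendor, in ascending vendor order; this works even if
# vendor numbers are not contiguous or do not start at 1
output = [groups[k] for k in sorted(groups)]
print(output)
```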