This code:
from itertools import groupby, count
L = [38, 98, 110, 111, 112, 120, 121, 898]
groups = groupby(L, key=lambda item, c=count():item-next(c))
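tmp = [list(g) for k, g in groups]
takes [38, 98, 110, 111, 112, 120, 121, 898], groups it by consecutive numbers, and merges the groups into the final output:
['38', '98', '110,112', '120,121', '898']
How can the same be done for a list with multiple columns, like the one below, so that rows are grouped by name and by consecutive values in the second column, and then merged?
In other words, this data: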
L = [
    ['Italy','1','3'],
    ['Italy','2','1'],
    ['Spain','4','2'],
    ['Spain','5','8'],
    ['Italy','3','10'],
    ['Spain','6','4'],
    ['France','5','3'],
    ['Spain','20','2']]
should produce the following output:
[['Italy','1-2-3','3-1-10'],
 ['France','5','3'],
 ['Spain','4-5-6','2-8-4'],
 ['Spain','20','2']]
Would more-itertools be better suited to this task?
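Grouping and combining items of a multi-column list with itertools/more-itertools in Python
Posted on 2018-02-07 13:45:27
This is basically the same grouping technique, but instead of itertools.count it uses enumerate to generate the indices.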
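To see why the index trick works, here is a minimal sketch on the flat list from the question. The helper name `consecutive_runs` is made up for illustration. Within a run of consecutive integers, each value and its position both grow by 1, so their difference stays constant and can serve as the groupby key.

```python
from itertools import groupby


def consecutive_runs(values):
    """Split a sequence of integers into runs of consecutive values."""
    runs = []
    # enumerate supplies the running index that itertools.count would supply
    for _, group in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        runs.append([value for _, value in group])
    return runs


print(consecutive_runs([38, 98, 110, 111, 112, 120, 121, 898]))
# [[38], [98], [110, 111, 112], [120, 121], [898]]
```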
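First we sort the data, so that all items for a given country end up together and each country's rows are in order. Then we use groupby to form one group per country. In an inner loop we use groupby again to group the consecutive rows of each country. Finally, we use zip and .join to rearrange the data into the desired output format.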
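The steps above can be sketched as explicit loops. One assumption here differs from a plain sorted() on the rows: the second column is compared as an integer, because strings sort lexicographically ('20' before '4', '10' before '9'), which would split a run such as 9, 10. The function name `group_rows` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def group_rows(rows):
    """Merge rows per country where the 2nd column forms a consecutive run."""
    # Step 1: sort by country, then by the 2nd column as a number (assumption:
    # numeric order is wanted; string order would put '10' before '9').
    ordered = sorted(rows, key=lambda r: (r[0], int(r[1])))
    result = []
    # Step 2: one outer group per country.
    for country, country_rows in groupby(ordered, key=itemgetter(0)):
        # Step 3: inner groups of consecutive values (value - index is constant).
        indexed = enumerate(country_rows)
        for _, run in groupby(indexed, key=lambda t: int(t[1][1]) - t[0]):
            # Step 4: transpose the remaining columns and join each with '-'.
            columns = zip(*(row[1:] for _, row in run))
            result.append([country] + ['-'.join(col) for col in columns])
    return result


rows = [
    ['Italy', '1', '3'], ['Italy', '2', '1'], ['Spain', '4', '2'],
    ['Spain', '5', '8'], ['Italy', '3', '10'], ['Spain', '6', '4'],
    ['France', '5', '3'], ['Spain', '20', '2'],
]
for row in group_rows(rows):
    print(row)
# ['France', '5', '3']
# ['Italy', '1-2-3', '3-1-10']
# ['Spain', '4-5-6', '2-8-4']
# ['Spain', '20', '2']
```

With the numeric key, the Spain 4-5-6 run comes before Spain 20, whereas a string sort places '20' first.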
from itertools import groupby
from operator import itemgetter
lst = [
    ['Italy','1','3'],
    ['Italy','2','1'],
    ['Spain','4','2'],
    ['Spain','5','8'],
    ['Italy','3','10'],
    ['Spain','6','4'],
    ['France','5','3'],
    ['Spain','20','2'],
]
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
    for country, u in groupby(sorted(lst), itemgetter(0))
        for _, g in groupby(enumerate(u), lambda t: int(t[1][1]) - t[0])]
for row in newlst:
    print(row)
Output
['France', '5', '3']
['Italy', '1-2-3', '3-1-10']
['Spain', '20', '2']
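['Spain', '4-5-6', '2-8-4']
I admit the lambda is a bit cryptic; it would probably be better to use a proper def function. I'll add one here in a few minutes.
Here is the same thing using a more readable key function.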
def keyfunc(t):
    # Unpack the index and data
    i, data = t
    # Get the 2nd column from the data, as an integer
    val = int(data[1])
    # The difference between val & i is constant in a consecutive group
    return val - i
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
    for country, u in groupby(sorted(lst), itemgetter(0))
        for _, g in groupby(enumerate(u), keyfunc)]
Posted on 2018-02-07 13:39:01
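You can build on the same recipe and modify the lambda function to also include the first entry (the country) of each row. Before that, you need to sort the list by the last occurrence of each country in the list.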
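As a small sketch of the sorting step: a dict comprehension over enumerate keeps only the last index seen for each country, since later rows overwrite earlier ones. Sorting by that index pulls each country's rows together, and because sorted() is stable, rows within one country keep their original relative order. The variable names here are illustrative.

```python
rows = [
    ['Italy', '1', '3'], ['Italy', '2', '1'], ['Spain', '4', '2'],
    ['Spain', '5', '8'], ['Italy', '3', '10'], ['Spain', '6', '4'],
    ['France', '5', '3'], ['Spain', '20', '2'],
]

# Later rows overwrite earlier ones, so each country maps to its last index.
last_index = {row[0]: i for i, row in enumerate(rows)}
print(last_index)  # {'Italy': 4, 'Spain': 7, 'France': 6}

# sorted() is stable: rows of one country stay in their original order.
ordered = sorted(rows, key=lambda row: last_index[row[0]])
print([row[0] for row in ordered])
# ['Italy', 'Italy', 'Italy', 'France', 'Spain', 'Spain', 'Spain', 'Spain']
```

Note that this ordering does not sort the second column, so it assumes each country's rows already appear in ascending order in the input.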
from itertools import groupby, count
L = [
    ['Italy', '1', '3'],
    ['Italy', '2', '1'],
    ['Spain', '4', '2'],
    ['Spain', '5', '8'],
    ['Italy', '3', '10'],
    ['Spain', '6', '4'],
    ['France', '5', '3'],
    ['Spain', '20', '2']]
indices = {row[0]: i for i, row in enumerate(L)}
sorted_l = sorted(L, key=lambda row: indices[row[0]])
groups = groupby(
    sorted_l,
    lambda item, c=count(): [item[0], int(item[1]) - next(c)]
)
for k, g in groups:
    print([k[0]] + ['-'.join(x) for x in zip(*(x[1:] for x in g))])
Output:
['Italy', '1-2-3', '3-1-10']
['France', '5', '3']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
Posted on 2018-02-07 12:59:55
Rather than using itertools.groupby, which requires multiple sorts, checks, and so on, here is an algorithmically optimized approach using a dictionary:
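The idea can first be sketched in a more compact form. This is a simplification under stated assumptions, not the answer's exact implementation: the helper `merge_runs` is hypothetical, it stores the runs of each country in a list rather than a numbered dict, and it treats a value equal to or within 1 of the last value of a run as belonging to that run.

```python
from collections import defaultdict


def merge_runs(rows):
    """Group rows per country into runs without sorting the whole input."""
    runs_by_country = defaultdict(list)  # country -> list of runs
    for country, i, j in rows:
        value = int(i)
        for run in runs_by_country[country]:
            last = int(run[-1][0])
            if abs(value - last) <= 1:
                run.append([i, j])
                # keep the run ordered so run[-1] holds its largest value
                run.sort(key=lambda rec: int(rec[0]))
                break
        else:
            # no existing run accepted the row: start a new one
            runs_by_country[country].append([[i, j]])
    return [
        [country] + ['-'.join(col) for col in zip(*run)]
        for country, runs in runs_by_country.items()
        for run in runs
    ]
```

On Python 3.7 and later the result follows the order in which each country first appears in the input, since dicts preserve insertion order.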
d = {}
flag = False
for country, i, j in L:
    temp = 1
    try:
        item = int(i)
        for counter, recs in d[country].items():
            temp += 1
            last = int(recs[-1][0])
            if item in {last - 1, last, last + 1}:
                recs.append([i, j])
                recs.sort(key=lambda x: int(x[0]))
                flag = True
                break
        if flag:
            flag = False
            continue
        else:
            d[country][temp] = [[i, j]]
    except KeyError:
        d[country] = {}
        d[country][1] = [[i, j]]
Demo on a more complex example:
L = [['Italy', '1', '3'],
     ['Italy', '2', '1'],
     ['Spain', '4', '2'],
     ['Spain', '5', '8'],
     ['Italy', '3', '10'],
     ['Spain', '6', '4'],
     ['France', '5', '3'],
     ['Spain', '20', '2'],
     ['France', '5', '44'],
     ['France', '9', '3'],
     ['Italy', '3', '10'],
     ['Italy', '5', '17'],
     ['Italy', '4', '13'],]
# Resulting d after running the loop above on this list:
{'France': {1: [['5', '3'], ['5', '44']], 2: [['9', '3']]},
 'Spain': {1: [['4', '2'], ['5', '8'], ['6', '4']], 2: [['20', '2']]},
 'Italy': {1: [['1', '3'], ['2', '1'], ['3', '10'], ['3', '10'], ['4', '13']], 2: [['5', '17']]}}
# You can then produce the results in your intended format as below:
for country, recs in d.items():
    for rec in recs.values():
        i, j = zip(*rec)
        print([country, '-'.join(i), '-'.join(j)])
['France', '5-5', '3-44']
['France', '9', '3']
['Italy', '1-2-3-3-4', '3-1-10-10-13']
['Italy', '5', '17']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
https://stackoverflow.com/questions/48664043