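This simply creates a table from the items passed in. I would like to make this function shorter and/or more efficient. What I have seems redundant.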

def apriori_creator(items, k):
    """Function that creates apriori table using supported candidates"""
    table = []
    the_length = len(items)
    new_k = k-2 # first k-2 items as described in algorithm in book for 'apriori gen'
    for x in range(the_length):
        next = x + 1
        for y in range(next, the_length):
            combined = items[x].union(items[y]) # merge
            l_sub_one = list(items[x])
            l_sub_one.sort()
            l_sub_two = list(items[y])
            l_sub_two.sort()
            first = l_sub_one[:new_k] # getting the first k-2 items of the set
            second = l_sub_two[:new_k]
            if first == second:
                table.append(combined)
    return table

Posted on 2014-12-15 12:42:05
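To improve efficiency, you can pull a few things out of the inner loop:
- Computations that depend only on x belong in the outer loop.
- combined should be computed under the if that uses it.
- Using enumerate and direct iteration instead of range is simpler, and also faster.
- l_sub_one = sorted(items[x]) does the same thing as the two lines l_sub_one = list(items[x]) and l_sub_one.sort().
Revised code: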

def apriori_creator(items, k):
    """Function that creates apriori table using supported candidates"""
    table = []
    new_k = k-2 # first k-2 items as described in algorithm in book for 'apriori gen'
    for x, item_x in enumerate(items):
        l_sub_one = sorted(item_x)
        first = l_sub_one[:new_k] # getting the first k-2 items of the set
        for item_y in items[x+1:]:
            l_sub_two = sorted(item_y)
            second = l_sub_two[:new_k]
            if first == second:
                combined = item_x.union(item_y) # merge
                table.append(combined)
    return table

Further improvement may be possible. One way to describe what you are doing is generating all distinct pairs of items that share the same value of sorted(item)[:new_k]. Your code does this by considering every possible pair. A more efficient approach is to first group the items by that value. With the help of a dictionary, this takes only a single pass over the items. Then you only need to generate the pairs within each group.
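To make the grouping step concrete, here is a small sketch of the bucketing on its own. The helper name group_by_prefix and the sample itemsets are made up for illustration; only the key sorted(item)[:k-2] comes from the discussion above.

```python
from collections import defaultdict


def group_by_prefix(items, k):
    """Hypothetical helper: bucket itemsets by their first k-2 sorted items."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(sorted(item)[:k - 2])  # a tuple is hashable, a list is not
        groups[key].append(item)
    return dict(groups)


# Made-up sample data: four 2-itemsets, to be joined into candidates of size k = 3.
sample = [{1, 2}, {1, 3}, {2, 3}, {1, 4}]
buckets = group_by_prefix(sample, 3)

# Number of pairs each bucket can produce: n * (n - 1) / 2 for a bucket of size n.
pair_counts = {key: len(g) * (len(g) - 1) // 2 for key, g in buckets.items()}
```

With this sample, the all-pairs approach compares 6 pairs, while grouping only ever forms the 3 pairs inside the (1,) bucket; the (2,) bucket holds a single itemset and produces none.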
Here is the code, revised again. I used collections.defaultdict for the grouping and itertools.combinations for the pairs.

from collections import defaultdict
from itertools import combinations

def apriori_creator(items, k):
    """Function that creates apriori table using supported candidates"""
    new_k = k-2 # first k-2 items as described in algorithm in book for 'apriori gen'
    groups = defaultdict(list) # group by the first k-2 items of the set
    for item in items:
        l_sub_one = sorted(item)
        key = tuple(l_sub_one[:new_k])
        groups[key].append(item)
    table = []
    for group in groups.values():
        if len(group) > 1:
            for item_x, item_y in combinations(group, 2): # all pairs
                combined = item_x.union(item_y) # merge
                table.append(combined)
    return table

Posted on 2014-12-15 21:25:01
It looks like you expect items to be a list of sets. I think you could take advantage of a more appropriate data structure, such as SortedSet.

from sortedcontainers import SortedSet

def apriori_creator(items, k):
    table = []
    items = [SortedSet(item) for item in items]
    for x, item in enumerate(items):
        for later_item in items[x+1:]:
            if item[:k-2] == later_item[:k-2]:
                # Since the first k-2 elements are the same, we should only
                # need to merge the remaining two elements, at most.
                table.append(item.union(later_item[k-2:]))
    return table

https://codereview.stackexchange.com/questions/73629