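A common pattern in my code is: "search through a list until I find a particular item, then look at the items that come before and after it."
For example, I might want to look through a log file in which important events are marked with asterisks, and then pull out the context of the important event.
In the example below, I want to know why the hyperdrive exploded: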
    Spinning up the hyperdrive
    Hyperdrive speed 100 rpm
    Hyperdrive speed 200 rpm
    Hyperdrive lubricant levels low (100 gal.)
    * CRITICAL EXISTENCE FAILURE
    Hyperdrive exploded

I want a function, get_item_with_context(), that lets me find the first line marked with an asterisk and then gives me the n lines before it and the m lines after it.
My attempt is below:
    import collections, itertools

    def get_item_with_context(predicate, iterable, items_before = 0, items_after = 0):
        # Searches through the list of `items` until an item matching `predicate` is found.
        # Then return that item.
        # If no item matching predicate is found, return None.
        # Optionally, also return up to `items_before` items preceding the target, and
        # `items_after` items after the target.
        #
        # Note:
        d = collections.deque (maxlen = items_before + 1 + items_after)
        iter1 = iterable.__iter__()
        iter2 = itertools.takewhile(lambda x: not(predicate(x)), iter1)
        d.extend(iter2)

        # zero-length input, or no matching item
        if len(d) == 0 or not(predicate(d[-1])):
            return None

        # get context after match:
        try:
            for i in xrange(items_after):
                d.append(iter1.next())
        except StopIteration:
            pass

        if ( items_before == 0 and items_after == 0):
            return d[0]
        else:
            return list(d)

Usage should look like:
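    >>> get_item_with_context(lambda x: x == 3, [1,2,3,4,5,6],
                              items_before = 1, items_after = 1)
    [2, 3, 4]

Problems with this:
The check not(predicate(d[-1])) does not work, for some reason. It always returns false.
If fewer than items_after items follow the match, the result is garbage.
Please give me some advice on how to make this work or make it more robust. Or, if I am reinventing the wheel, feel free to tell me that too.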
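A likely cause of the first problem, offered as a sketch and not a confirmed diagnosis: itertools.takewhile has to pull the first non-matching-predicate item from the underlying iterator in order to test it, and then discards it. In the attempt above that discarded item is the match itself, so it never reaches the deque. The snippet below uses Python 3 and made-up sample data to show the effect in isolation.

```python
import itertools

numbers = iter([1, 2, 3, 4, 5])

# takewhile yields items until its predicate first fails...
head = list(itertools.takewhile(lambda x: x != 3, numbers))

# ...but by then it has already pulled the failing item (3) from the
# underlying iterator, so 3 is neither in `head` nor left in `numbers`.
rest = list(numbers)

print(head)  # [1, 2]
print(rest)  # [4, 5]
```

With the match gone, d[-1] is always the last item that did not match, which is consistent with the check never succeeding.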
Answered 2012-05-03 13:11:23
This seems to handle the edge cases correctly:
    from collections import deque

    def item_with_context(predicate, seq, before=0, after=0):
        q = deque(maxlen=before)
        it = iter(seq)
        for s in it:
            if predicate(s):
                return list(q) + [s] + [x for _,x in zip(range(after), it)]
            q.append(s)

Answered 2012-05-03 09:53:39
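You can use a collections.deque object as a ring buffer for the context. To get +/- 2 lines of context, initialize it like this:

    context = collections.deque(maxlen=5)

Then iterate over whatever you like, calling this for each line:

    context.append(line)

Match on context[2], and output the entire deque contents for each match.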
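The ring-buffer idea can be sketched as a small generator. This is one possible reading of the approach, not the answerer's code: the name matches_with_context, the radius parameter, and the _PAD sentinel are all assumptions introduced here. The padding is added so that matches within radius lines of either end still reach the centre slot. Unlike the question, this yields context for every match, not just the first.

```python
from collections import deque
from itertools import chain

_PAD = object()  # hypothetical sentinel marking positions outside the input


def matches_with_context(lines, predicate, radius=2):
    """Yield each matching line with up to `radius` neighbours on each side."""
    size = 2 * radius + 1
    # Pre-fill with padding so the first real line can reach the centre slot.
    window = deque([_PAD] * radius, maxlen=size)
    # Append padding at the end so the last real line can reach it too.
    for line in chain(lines, [_PAD] * radius):
        window.append(line)
        if len(window) < size:
            continue  # window not full yet, no centre element to test
        centre = window[radius]
        if centre is not _PAD and predicate(centre):
            yield [x for x in window if x is not _PAD]


log = [
    "Spinning up the hyperdrive",
    "Hyperdrive speed 100 rpm",
    "Hyperdrive speed 200 rpm",
    "Hyperdrive lubricant levels low (100 gal.)",
    "* CRITICAL EXISTENCE FAILURE",
    "Hyperdrive exploded",
]

for context in matches_with_context(log, lambda s: s.startswith("*")):
    print("\n".join(context))
```

On the sample log this prints the two lines before the starred line, the starred line itself, and the single line that follows it.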
Answered 2012-05-03 09:39:17
This may be a thoroughly "unpythonic" solution:
    import itertools

    def get_item_with_context(predicate, iterable, items_before = 0, items_after = 0):
        found_index = -1
        found_element = None
        before = [None] * items_before # Circular buffer
        after = []
        after_index = 0
        for element, index in zip(iterable, itertools.count()):
            if found_index >= 0:
                after += [element]
                if len(after) >= items_after:
                    break
            elif predicate(element):
                found_index = index
                found_element = element
                if not items_after:
                    break
            else:
                if items_before > 0:
                    before[after_index] = element
                    after_index = (after_index + 1) % items_before
        if found_index >= 0:
            if after_index:
                # rotate the circular before-buffer into place
                before = before[after_index:] + before[0:after_index]
            if found_index - items_before < 0:
                # slice off elements that "fell off" the start
                before = before[items_before - found_index:]
            return before, found_element, after
        return None

    for index in range(0, 8):
        x = get_item_with_context(lambda x: x == index, [1,2,3,4,5,6], items_before = 1, items_after = 2)
        print(index, x)

Output:
    0 None
    1 ([], 1, [2, 3])
    2 ([1], 2, [3, 4])
    3 ([2], 3, [4, 5])
    4 ([3], 4, [5, 6])
    5 ([4], 5, [6])
    6 ([5], 6, [])
    7 None

I took the liberty of changing the output to make it clearer what matched the predicate and what came before and after it:
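    ([2], 3, [4, 5])
      ^   ^    ^
      |   |    +-- after the element
      |   +------- the element that matched the predicate
      +----------- before the element

The function handles the case where no item matches, returning None (change the last line of the function if you want to return something else), and the cases where the match is too close to the start or end of the input to have N elements on that side.
It uses a circular buffer for the elements before the match.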
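For comparison, the hand-rolled circular buffer in this answer and collections.deque(maxlen=n) both keep "the last n items seen". The sketch below isolates that one step; last_n_manual and last_n_deque are hypothetical helper names introduced here, not part of the answer.

```python
from collections import deque


def last_n_manual(items, n):
    """Keep the last n items in a fixed-size list, then rotate it into order."""
    buf = [None] * n
    pos = 0    # next slot to overwrite
    seen = 0   # total number of items consumed
    for item in items:
        if n > 0:
            buf[pos] = item
            pos = (pos + 1) % n
        seen += 1
    ordered = buf[pos:] + buf[:pos]    # oldest element first
    if seen < n:
        ordered = ordered[n - seen:]   # drop slots that were never filled
    return ordered


def last_n_deque(items, n):
    """Same result, letting deque discard old items automatically."""
    return list(deque(items, maxlen=n))


print(last_n_manual([1, 2, 3, 4, 5], 3))  # [3, 4, 5]
print(last_n_deque([1, 2, 3, 4, 5], 3))   # [3, 4, 5]
print(last_n_manual([1, 2], 3))           # [1, 2]
```

The deque version avoids the explicit rotate-and-slice bookkeeping, which is the main difference between this answer and the deque-based ones.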
https://stackoverflow.com/questions/10428421