我一直在想,是否有某种数据结构或聪明的方法来使用字典(O(1)查找)来返回一个值,如果定义的范围没有重叠的话。到目前为止,我一直在想,如果范围有一些恒定的差异(0-2,2-4,4-6等),就可以做到这一点。或者二分搜索可以在O(log(n))时间内完成。
举个例子,假设有一本字典,
d = {[0.0 - 0.1): "a",
[0.1 - 0.3): "b",
[0.3 - 0.55): "c",
[0.55 - 0.7): "d",
[0.7 - 1.0): "e"}它应该会返回,
d[0.05]
>>> "a"
d[0.8]
>>> "e"
d[0.9]
>>> "e"
d[random.random()] # this should also work有什么办法可以实现这样的事情吗?感谢您对此的任何回复或回答。
发布于 2018-11-04 18:39:41
如果您接受范围边界的低(Er)分辨率并牺牲内存以换取查找速度,则可以使用O(1)查找时间。
字典可以在O(1)的平均时间内进行查找,因为在固定大小的数据结构(平均情况下为hash(key) % tablesize)中,键和位置之间存在简单的算术关系。您的范围实际上是具有浮点边界的可变大小,因此没有固定的表大小来映射搜索值。
除非您限制了范围的绝对下限和上限,并让范围边界落在固定的步长上。您的示例使用从0.0到1.0的值,范围可以量化到0.05步长。可以转换成一个固定的表格:
import math
from collections import MutableMapping
# empty slot marker
_EMPTY = object()
class RangeMap(MutableMapping):
"""Map points to values, and find values for points in O(1) constant time
The map requires a fixed minimum lower and maximum upper bound for
the ranges. Range boundaries are quantized to a fixed step size. Gaps
are permitted, when setting overlapping ranges last range set wins.
"""
def __init__(self, map=None, lower=0.0, upper=1.0, step=0.05):
self._mag = 10 ** -round(math.log10(step) - 1) # shift to integers
self._lower, self._upper = round(lower * self._mag), round(upper * self._mag)
self._step = round(step * self._mag)
self._steps = (self._upper - self._lower) // self._step
self._table = [_EMPTY] * self._steps
self._len = 0
if map is not None:
self.update(map)
def __len__(self):
return self._len
def _map_range(self, r):
low, high = r
start = round(low * self._mag) // self._step
stop = round(high * self._mag) // self._step
if not self._lower <= start < stop <= self._upper:
raise IndexError('Range outside of map boundaries')
return range(start - self._lower, stop - self._lower)
def __setitem__(self, r, value):
for i in self._map_range(r):
self._len += int(self._table[i] is _EMPTY)
self._table[i] = value
def __delitem__(self, r):
for i in self._map_range(r):
self._len -= int(self._table[i] is not _EMPTY)
self._table[i] = _EMPTY
def _point_to_index(self, point):
point = round(point * self._mag)
if not self._lower <= point <= self._upper:
raise IndexError('Point outside of map boundaries')
return (point - self._lower) // self._step
def __getitem__(self, point_or_range):
if isinstance(point_or_range, tuple):
low, high = point_or_range
r = self._map_range(point_or_range)
# all points in the range must point to the same value
value = self._table[r[0]]
if value is _EMPTY or any(self._table[i] != value for i in r):
raise IndexError('Not a range for a single value')
else:
value = self._table[self._point_to_index(point_or_range)]
if value is _EMPTY:
raise IndexError('Point not in map')
return value
def __iter__(self):
low = None
value = _EMPTY
for i, v in enumerate(self._table):
pos = (self._lower + (i * self._step)) / self._mag
if v is _EMPTY:
if low is not None:
yield (low, pos)
low = None
elif v != value:
if low is not None:
yield (low, pos)
low = pos
value = v
if low is not None:
yield (low, self._upper / self._mag)上面实现了完整的映射接口,并在索引或测试容纳性时同时接受点和范围(作为模拟[start, stop)间隔的元组)(支持范围使重用默认的键、值和项字典视图实现变得更容易,所有这些都从__iter__实现开始)。
演示:
>>> d = RangeMap({
... (0.0, 0.1): "a",
... (0.1, 0.3): "b",
... (0.3, 0.55): "c",
... (0.55, 0.7): "d",
... (0.7, 1.0): "e",
... })
>>> print(*d.items(), sep='\n')
((0.0, 0.1), 'a')
((0.1, 0.3), 'b')
((0.3, 0.55), 'c')
((0.55, 0.7), 'd')
((0.7, 1.0), 'e')
>>> d[0.05]
'a'
>>> d[0.8]
'e'
>>> d[0.9]
'e'
>>> import random
>>> d[random.random()]
'c'
>>> d[random.random()]
'a'如果不能很容易地限制步长和边界,那么下一个最好的选择是使用某种binary search algorithm;将范围按排序顺序排列,并在数据结构的中间选择一个点;根据搜索键高于或低于中点,继续在数据结构的任何一半中搜索,直到找到匹配项。
如果您的范围涵盖了从最低边界到最高边界的整个间隔,则可以使用bisect module;只需将每个范围的下限或上限存储在一个列表中,将相应的值存储在另一个列表中,然后使用二分法将第一个列表中的位置映射到第二个列表中的结果。
如果您的范围有间隙,则需要保留具有另一个边界的第三个列表,并首先验证点是否在该范围内,或者使用interval tree。对于非重叠范围,简单的二叉树就可以了,但也有更多支持重叠范围的专用实现。PyPI上有一个支持全间隔树操作的intervaltree project。
将行为与固定表实现相匹配的bisect-based映射如下所示:
from bisect import bisect_left
from collections.abc import MutableMapping
class RangeBisection(MutableMapping):
"""Map ranges to values
Lookups are done in O(logN) time. There are no limits set on the upper or
lower bounds of the ranges, but ranges must not overlap.
"""
def __init__(self, map=None):
self._upper = []
self._lower = []
self._values = []
if map is not None:
self.update(map)
def __len__(self):
return len(self._values)
def __getitem__(self, point_or_range):
if isinstance(point_or_range, tuple):
low, high = point_or_range
i = bisect_left(self._upper, high)
point = low
else:
point = point_or_range
i = bisect_left(self._upper, point)
if i >= len(self._values) or self._lower[i] > point:
raise IndexError(point_or_range)
return self._values[i]
def __setitem__(self, r, value):
lower, upper = r
i = bisect_left(self._upper, upper)
if i < len(self._values) and self._lower[i] < upper:
raise IndexError('No overlaps permitted')
self._upper.insert(i, upper)
self._lower.insert(i, lower)
self._values.insert(i, value)
def __delitem__(self, r):
lower, upper = r
i = bisect_left(self._upper, upper)
if self._upper[i] != upper or self._lower[i] != lower:
raise IndexError('Range not in map')
del self._upper[i]
del self._lower[i]
del self._values[i]
def __iter__(self):
yield from zip(self._lower, self._upper)发布于 2018-11-04 14:57:07
首先,将数据拆分为两个数组:
limits = [0.1, 0.3, 0.55, 0.7, 1.0]
values = ["a", "b", "c", "d", "e"]limits是经过排序的,因此您可以在其中执行binary search:
import bisect
def value_at(n):
index = bisect.bisect_left(limits, n)
return values[index]https://stackoverflow.com/questions/53138434
复制相似问题