I'm trying to find the most efficient way to accomplish the following in Python:
Suppose we have two lists, a and b, of equal length, containing up to 1e7 elements each. However, for illustration we can consider the following:
a = [2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8, 10, 11]
b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

The goal is to create a strictly monotonic list a_new from a, using only the first of any sample points that share the same value. The indices deleted from a must also be deleted from b, so that the final result is:

a_new = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
b_new = [1, 4, 5, 6, 8, 10, 11, 12, 14, 15]

Of course this could be done with a computationally expensive for loop, but given the size of the data that is not suitable.
Any suggestions are much appreciated.
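As a quick sanity check of the desired output, here is a small sketch (added for illustration, not part of the original question) that verifies the expected result on the example above:

# expected results from the example; a_new must be strictly increasing
a_new = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
b_new = [1, 4, 5, 6, 8, 10, 11, 12, 14, 15]
assert all(x < y for x, y in zip(a_new, a_new[1:]))
assert len(a_new) == len(b_new)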
Posted on 2017-04-24 01:51:27
A version of @juanpa.arrivillaga's function, run with numba:
import numba
import numpy as np

def psi(A):
    # cumulative maximum, then np.unique keeps the first index of each value
    a_cummax = np.maximum.accumulate(A)
    a_new, idx = np.unique(a_cummax, return_index=True)
    return idx

def foo(arr):
    # flag the positions where the cumulative maximum increases
    aux = np.maximum.accumulate(arr)
    flag = np.concatenate(([True], aux[1:] != aux[:-1]))
    return np.nonzero(flag)[0]

@numba.jit
def f(A):
    # single pass: record each index where a new maximum appears
    m = A[0]
    a_new, idx = [m], [0]
    for i, a in enumerate(A[1:], 1):
        if a > m:
            m = a
            a_new.append(a)
            idx.append(i)
    return idx

Timing:
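The timings below presumably use the question's 15-element example converted to a NumPy array; a minimal setup sketch (an assumption on my part, not part of the original post):

# assumed setup for the %timeit calls below
a = np.array([2, 1, 2, 3, 4, 5, 4, 6, 5, 7, 8, 9, 8, 10, 11])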
%timeit f(a)
The slowest run took 5.37 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.83 µs per loop
%timeit foo(a)
The slowest run took 9.41 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.35 µs per loop
%timeit psi(a)
The slowest run took 9.66 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.95 µs per loop

Posted on 2017-04-23 23:53:24
You can compute the cumulative maximum of a and remove the duplicate values with np.unique, which can also record the indices of the first occurrences, so you can subset b accordingly:
import numpy as np

a = np.array([2,1,2,3,4,5,4,6,5,7,8,9,8,10,11])
b = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
a_cummax = np.maximum.accumulate(a)
a_new, idx = np.unique(a_cummax, return_index=True)
a_new
# array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
b[idx]
# array([ 1, 4, 5, 6, 8, 10, 11, 12, 14, 15])
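To see why this works: the cumulative maximum is non-decreasing, so np.unique with return_index=True returns each distinct running maximum along with the index of its first occurrence, which are exactly the positions where a rises above all previous values. The intermediate values for this example (shown here for illustration):

a_cummax
# array([ 2,  2,  2,  3,  4,  5,  5,  6,  6,  7,  8,  9,  9, 10, 11])
idx
# array([ 0,  3,  4,  5,  7,  9, 10, 11, 13, 14])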
Posted on 2017-04-24 00:27:03

Here's a plain Python solution that does it in one pass:
>>> a = [2,1,2,3,4,5,4,6,5,7,8,9,8,10,11]
>>> b = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
>>> a_new, b_new = [], []
>>> last = float('-inf')
>>> for x, y in zip(a, b):
...     if x > last:
...         last = x
...         a_new.append(x)
...         b_new.append(y)
...
>>> a_new
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> b_new
[1, 4, 5, 6, 8, 10, 11, 12, 14, 15]

I was curious how this compares with the numpy solution, which has similar time complexity but makes several passes over the data.
Here are some timings. First, the setup:
>>> small = ([2,1,2,3,4,5,4,6,5,7,8,9,8,10,11], [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
>>> medium = (np.random.randint(1, 10000, (10000,)), np.random.randint(1, 10000, (10000,)))
>>> large = (np.random.randint(1, 10000000, (10000000,)), np.random.randint(1, 10000000, (10000000,)))

Now the two approaches:
>>> def monotonic(a, b):
...     a_new, b_new = [], []
...     last = float('-inf')
...     for x, y in zip(a, b):
...         if x > last:
...             last = x
...             a_new.append(x)
...             b_new.append(y)
...     return a_new, b_new
...
>>> def np_monotonic(a, b):
...     a_new, idx = np.unique(np.maximum.accumulate(a), return_index=True)
...     return a_new, b[idx]
...

Note that the two functions are not strictly equivalent: one stays in plain Python, the other works with numpy arrays. We compare their performance assuming you start with the corresponding data structure (numpy.array or list):
First, a small list, the same as the OP's example. Here numpy is actually no faster, which is unsurprising for small data structures:
>>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, small; a, b = small", number=10000)
0.039130652003223076
>>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, small, np; a, b = np.array(small[0]), np.array(small[1])", number=10000)
0.10779813499539159

Now a "medium" list/array of 10,000 elements, where numpy's advantage starts to show:
>>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, medium; a, b = medium[0].tolist(), medium[1].tolist()", number=10000)
4.642718859016895
>>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, medium; a, b = medium", number=10000)
1.3776302759943064

Interestingly, the advantage seems to narrow with "large" arrays on the order of 1e7 elements:
>>> timeit.timeit("monotonic(a,b)", "from __main__ import monotonic, large; a, b = large[0].tolist(), large[1].tolist()", number=10)
4.400254560023313
>>> timeit.timeit("np_monotonic(a,b)", "from __main__ import np_monotonic, large; a, b = large", number=10)
3.593393853981979

Note that in the last pair of timings I only did 10 runs each, but if anyone has a better machine or more patience, please feel free to increase number.
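For anyone re-running these benchmarks, here is a minimal sketch using timeit.repeat to take the best of several runs (the repeat and number values are illustrative, and it assumes np_monotonic and large are defined as above):

import timeit

# best-of-5, 10 calls per run; report the best per-call time in seconds
best = min(timeit.repeat(
    "np_monotonic(a, b)",
    "from __main__ import np_monotonic, large; a, b = large",
    repeat=5, number=10))
print(best / 10)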
https://stackoverflow.com/questions/43577744