Using pstats and cProfile in Python. How to make an array work faster?

Stack Overflow user

Asked 2015-07-11 19:25:16

1 answer · 2.2K views · 0 followers · 3 votes

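This is my first attempt at optimizing code and I am excited about it. I have read some articles, but I still have some questions.

1) First of all, what takes so much time in my code below? I think it is the array here: array.append(len(set(line.split()))). I read online that lists in Python work faster, but I don't see how to use lists here. Does anybody know how this can be improved?

2) Are there any other improvements I am missing?

3) Also, I read online that a for loop slows code down a lot. Can it be improved here? (I guess writing the code in C would be best, but :D )

4) Why do people suggest always looking at "ncalls" and "tottime"? To me "percall" makes more sense. It tells you how fast your function or call is.

5) In the accepted answer to "Fastest way to grow a numpy numeric array", class B is said to use lists. Does it? To me it still looks like an array and a for loop, which is supposed to slow things down.

Thanks.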

New cProfile results:

 618384 function calls in 9.966 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    19686    3.927    0.000    4.897    0.000 <ipython-input-120-d8351bb3dd17>:14(f)
    78744    3.797    0.000    3.797    0.000 {numpy.core.multiarray.array}
    19686    0.948    0.000    0.948    0.000 {range}
    19686    0.252    0.000    0.252    0.000 {method 'partition' of 'numpy.ndarray' objects}
    19686    0.134    0.000    0.930    0.000 function_base.py:2896(_median)
        1    0.126    0.126    9.965    9.965 <ipython-input-120-d8351bb3dd17>:22(<module>)
    19686    0.125    0.000    0.351    0.000 _methods.py:53(_mean)
    19686    0.120    0.000    0.120    0.000 {method 'reduce' of 'numpy.ufunc' objects}
    19686    0.094    0.000    4.793    0.000 function_base.py:2747(_ureduce)
    19686    0.071    0.000    0.071    0.000 {method 'flatten' of 'numpy.ndarray' objects}
    19686    0.065    0.000    0.065    0.000 {method 'format' of 'str' objects}
    78744    0.055    0.000    3.852    0.000 numeric.py:464(asanyarray)

New code:

import numpy
import cProfile

pr = cProfile.Profile()
pr.enable()

#paths to files
read_path = '../tweet_input/tweets.txt'
write_path = "../tweet_output/ft2.txt"


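def f(a):
    # Insert a into the global sorted list `array`, keeping it sorted.
    for i in range(0, len(array)):
        if a <= array[i]:
            array.insert(i, a)
            break
    else:
        # No larger element was found (or the list is empty): a goes at the end.
        array.append(a)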

try:
    with open(read_path) as inf, open(write_path, 'a') as outf:
        array = []
        #for every line (tweet) in the file
        for line in inf:                                            ###Loop is bad. Builtin function is good
            #append amount of unique words to the array
            wordCount = len(set(line.split()))
            #print wordCount, array
            f(wordCount)
            #write current median of the array to the file
            result = "{:.2f}\n".format(numpy.median(array))
            outf.write(result)
except IOError as e:
    print('Operation failed: %s' % e.strerror)


###Service
pr.disable()
pr.print_stats(sort = 'time')

Old cProfile results:

 551211 function calls in 13.195 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    78744   10.193    0.000   10.193    0.000 {numpy.core.multiarray.array}

Old code:

    with open(read_path) as inf, open(write_path, 'a') as outf:
        array = []
        #for every line in the file
        for line in inf:                            
            #append amount of unique words to the array
            array.append(len(set(line.split())))
            #write current median of the array to the file
            result = "{:.2f}\n".format(numpy.median(array))
            outf.write(result)

python

optimization

profiling

cprofile

pstats

1 Answer

Stack Overflow user

Accepted answer

Answered 2015-07-11 19:47:37

NumPy uses a median-finding algorithm that is O(n log n). You call numpy.median once per line, so your algorithm ends up being O(n^2 log n).

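There are several ways to improve on this. One is to keep the array sorted (i.e. insert each element at the position that maintains the sorted order). Each insert takes O(n) (inserting into an array is a linear-time operation), and getting the median of a sorted array is O(1), so the whole thing ends up being O(n^2).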
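As a rough sketch of the sorted-list idea above, assuming the standard-library bisect module is acceptable: the helper names add_sorted and median_of_sorted and the sample word counts are made up for illustration and do not come from the question. bisect.insort finds the position by binary search, but the insertion still shifts elements, so each insert remains O(n).

```python
import bisect


def add_sorted(values, x):
    """Insert x into the already-sorted list `values`, keeping it sorted."""
    bisect.insort(values, x)


def median_of_sorted(values):
    """Return the median of a non-empty sorted list in O(1)."""
    n = len(values)
    mid = n // 2
    if n % 2:
        # Odd length: the middle element is the median.
        return float(values[mid])
    # Even length: average the two middle elements.
    return (values[mid - 1] + values[mid]) / 2.0


# Running median over a stream of word counts (illustrative data only).
counts = []
running = []
for word_count in [5, 3, 8, 1]:
    add_sorted(counts, word_count)
    running.append(median_of_sorted(counts))
```

In the question's loop, this would stand in for both the hand-written f and the per-line numpy.median call, so no list-to-array conversion happens on each line.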

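For profiling, the main thing you want to look at is tottime, because it tells you how much time your program spent in that function in total. percall is sometimes not very useful, as in your example: if you have a slow function (high percall) that is only called a few times (low ncalls), its tottime ends up being insignificant compared with other functions.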
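To see the columns discussed above side by side, here is a small sketch that sorts a profile by tottime using pstats. It assumes Python 3 (the question's code is Python 2), and slow_sum is a hypothetical workload invented only so there is something to profile.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload used only to have something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
slow_sum(100000)
profiler.disable()

# Collect the report in a string, sorted by tottime
# (time spent inside each function itself, excluding sub-calls).
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("tottime").print_stats(5)  # show only the top 5 entries
report = buffer.getvalue()
print(report)
```

Sorting by "cumtime" instead would rank functions by time including their sub-calls, which is useful for finding the expensive call path rather than the expensive function body.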

1 vote

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/31360932
