以下是代码:
import numpy as np
from numpy.random import random
@profile
def point_func(point, points, funct):
return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
@profile
def point_afunc(ipoints, epoints, funct):
res = np.zeros(len(ipoints))
for idx, point in enumerate(ipoints):
res[idx] = point_func(point, epoints, funct)
return res
@profile
def main():
points = random((5000,3))
rpoint = random((1,3))
pres = point_func(rpoint, points, lambda r : r**3)
ares = point_afunc(points, points, lambda r : r**3)
if __name__=="__main__":
main()我用kernprof分析了一下,得到了这个:
Timer unit: 1e-06 s
Total time: 2.25667 s File: point-array-vectorization.py Function: point_func at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def point_func(point, points, funct):
6 5001 2256667.0 451.2 100.0 return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
Total time: 2.27844 s File: point-array-vectorization.py Function: point_afunc at line 8
Line # Hits Time Per Hit % Time Line Contents
==============================================================
8 @profile
9 def point_afunc(ipoints, epoints, funct):
10 1 5.0 5.0 0.0 res = np.zeros(len(ipoints))
11 5001 4650.0 0.9 0.2 for idx, point in enumerate(ipoints):
12 5000 2273789.0 454.8 99.8 res[idx] = point_func(point, epoints, funct)
13 1 0.0 0.0 0.0 return res
Total time: 2.28239 s File: point-array-vectorization.py Function: main at line 15
Line # Hits Time Per Hit % Time Line Contents
==============================================================
15 @profile
16 def main():
17 1 145.0 145.0 0.0 points = random((5000,3))
18 1 2.0 2.0 0.0 rpoint = random((1,3))
19
20 1 507.0 507.0 0.0 pres = point_func(rpoint, points, lambda r : r**3)
21
22 1 2281731.0 2281731.0 100.0 ares = point_afunc(points, points, lambda r : r**3)所以这部分大部分时间都是:
11 5001 4650.0 0.9 0.2 for idx, point in enumerate(ipoints):
12 5000 2273789.0 454.8 99.8 res[idx] = point_func(point, epoints, funct)我想知道时间损失是否是由调用funct循环中的for造成的。要做到这一点,我想使用point_afunc对numpy.vectorize进行矢量化。我已经尝试过了,但是它似乎将点矢量化:循环结束在各个点组件上循环。
@profile
def point_afunc(ipoints, epoints, funct):
res = np.zeros(len(ipoints))
for idx, point in enumerate(ipoints):
res[idx] = point_func(point, epoints, funct)
return res
point_afunc = np.vectorize(point_afunc)导致错误:
File "point-array-vectorization.py", line 24, in main
ares = point_afunc(points, points, lambda r : r**3)
File "/usr/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2755, in __call__
return self._vectorize_call(func=func, args=vargs)
File "/usr/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2825, in _vectorize_call
ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
File "/usr/lib/python3.6/site-packages/numpy/lib/function_base.py", line 2785, in _get_ufunc_and_otypes
outputs = func(*inputs)
File "/usr/lib/python3.6/site-packages/line_profiler.py", line 115, in wrapper
result = func(*args, **kwds)
File "point-array-vectorization.py", line 10, in point_afunc
res = np.zeros(len(ipoints))
TypeError: object of type 'numpy.float64' has no len()不知怎的,在ipoints中安装了每个点的矢量化,它在各个点的组件上向量化?
编辑:尝试了下面@John的建议,并使用了numba。与没有@jit相比,我使用它的执行时间更长。如果从所有函数中删除@profile装饰符,并将其替换为用于point_func和point_afunc的@jit,则执行时间如下:
time ./point_array_vectorization.py
real 0m3.686s
user 0m3.584s
sys 0m0.077s
point-array-vectorization> time ./point_array_vectorization.py
real 0m3.683s
user 0m3.596s
sys 0m0.063s
point-array-vectorization> time ./point_array_vectorization.py
real 0m3.751s
user 0m3.658s
sys 0m0.070s所有的@jit装饰师都被删除了:
point-array-vectorization> time ./point_array_vectorization.py
real 0m2.925s
user 0m2.874s
sys 0m0.030s
point-array-vectorization> time ./point_array_vectorization.py
real 0m2.950s
user 0m2.902s
sys 0m0.029s
point-array-vectorization> time ./point_array_vectorization.py
real 0m2.951s
user 0m2.886s
sys 0m0.042s我是否需要更多地帮助numba编译器?
编辑:point_afunc可以在没有numpy循环的情况下使用numpy编写吗?
编辑:将循环版本与Peter的numpy广播版本进行比较,循环版本更快:
Timer unit: 1e-06 s
Total time: 2.13361 s
File: point_array_vectorization.py
Function: point_func at line 7
Line # Hits Time Per Hit % Time Line Contents
==============================================================
7 @profile
8 def point_func(point, points, funct):
9 5001 2133615.0 426.6 100.0 return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
Total time: 2.1528 s
File: point_array_vectorization.py
Function: point_afunc at line 11
Line # Hits Time Per Hit % Time Line Contents
==============================================================
11 @profile
12 def point_afunc(ipoints, epoints, funct):
13 1 5.0 5.0 0.0 res = np.zeros(len(ipoints))
14 5001 4176.0 0.8 0.2 for idx, point in enumerate(ipoints):
15 5000 2148617.0 429.7 99.8 res[idx] = point_func(point, epoints, funct)
16 1 0.0 0.0 0.0 return res
Total time: 2.75093 s
File: point_array_vectorization.py
Function: new_point_afunc at line 18
Line # Hits Time Per Hit % Time Line Contents
==============================================================
18 @profile
19 def new_point_afunc(ipoints, epoints, funct):
20 1 2750926.0 2750926.0 100.0 return np.sum(funct(np.sqrt((ipoints[:, None, :] - epoints[None, :, :])**2).sum(axis=-1)), axis=1)
Total time: 4.90756 s
File: point_array_vectorization.py
Function: main at line 22
Line # Hits Time Per Hit % Time Line Contents
==============================================================
22 @profile
23 def main():
24 1 170.0 170.0 0.0 points = random((5000,3))
25 1 4.0 4.0 0.0 rpoint = random((1,3))
26 1 546.0 546.0 0.0 pres = point_func(rpoint, points, lambda r : r**3)
27 1 2155829.0 2155829.0 43.9 ares = point_afunc(points, points, lambda r : r**3)
28 1 2750945.0 2750945.0 56.1 vares = new_point_afunc(points, points, lambda r : r**3)
29 1 71.0 71.0 0.0 assert(np.max(np.abs(ares-vares)) < 1e-15)发布于 2018-07-19 12:14:42
numpy.vectorize()在性能方面没有什么有用的东西:构建隐藏的for循环的只是语法糖(或者更确切地说,是语法氰化物)。这帮不了你。
有一件事可能对您有很大的帮助,那就是南巴。它可以及时编译您的原始代码,并且可能会大大加快它的速度。只需将@profile装饰器替换为@numba.jit即可。
发布于 2018-07-19 14:52:01
你可以用广播来做这个。广播是对点矩阵的重塑,使维度“广播”彼此对立。例如,ipoints[:, None, :] - epoints[None, :, :]说:
ipoints从MxD到Mx1xDepoints从NxD重塑为1xNxD完整的代码如下:
def new_point_afunc(ipoints, epoints, funct):
return np.sum(funct(np.sqrt((ipoints[:, None, :] - epoints[None, :, :])**2).sum(axis=-1)), axis=1)警告--在您的示例中,点的维数只有3,但是对于较高的维度,这可能不是实用的内存方面的,因为这种ipoints[:, None, :] - epoints[None, :, :]方法创建了一个具有形状(len(ipoints), len(epoints), n_dim)的中间矩阵。
发布于 2018-07-19 15:20:28
实例使用Numba
这种方法的性能取决于某人想要调用创建的函数的频率和输入数据的大小。由于编译开销约为1.67秒,因此不适合对相对较小的数据使用这种方法,或者只调用该函数一次。
我还使用了您的代码进行了一些小的修改。使用Numba编写普通循环可以使多个向量化命令(如np.sum(funct(np.sqrt(((point - points)**2)).sum(1))) )在运行时和编译时都更快。此外,这个问题可以很容易地并行化,但这将进一步增加编译。
示例
import numpy as np
from numpy.random import random
import numba as nb
import time
def make_point_func(funct):
@nb.njit(fastmath=True)
def point_func(point, points):
return np.sum(funct(np.sqrt(((point - points)**2)).sum(1)))
return point_func
def make_point_afunc(funct,point_func):
@nb.njit(fastmath=True)
def point_afunc(ipoints, epoints):
res = np.zeros(len(ipoints))
for idx in range(len(ipoints)):
res[idx] = point_func(ipoints[idx], epoints)
return res
return point_afunc
def main():
points = random((5000,3))
rpoint = random((1,3))
#Make functions
point_func=make_point_func(nb.njit(lambda r : r**3))
point_afunc=make_point_afunc(nb.njit(lambda r : r**3),point_func)
#first call
print("First call with compilation overhead")
t1=time.time()
pres = point_func(rpoint, points)
ares = point_afunc(points, points)
print(time.time()-t1)
print("Second call without compilation overhead")
t1=time.time()
#second call
pres = point_func(rpoint, points)
ares = point_afunc(points, points)
print(time.time()-t1)
if __name__=="__main__":
main()性能
original: 1.7s
Numba first call: 1.87s
Numba further calls: 0.2shttps://stackoverflow.com/questions/51421992
复制相似问题