Q: How can I speed up many iterations of a nonlinear optimization in Python?

Stack Overflow user

Asked 2020-12-27 04:36:10

2 answers · 363 views · 0 followers · 0 votes

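I am trying to solve a nonlinear optimization problem with the PyGMO package. The optimization class is defined separately and then called through a separate function, dyn_optimGMO. The optimization has to be run, and its result saved, for, say, 1000 random initial vectors (one value per node) held in the variable inits (or init_val).

Using the timeit module, I found that each iteration takes about 17 seconds to complete. That means the 1000 iterations will need about 5 hours, which is a very long time.

If I have to repeat this for, say, 20 perturb nodes, the total number of iterations rises to 20,000 (20 × 1000), and the run time grows linearly from the figure computed above.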
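The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the 17-second figure is the timing quoted in the question, and the variable names are illustrative.

```python
seconds_per_run = 17        # per-run timing quoted in the question
runs_per_node = 1000        # random initial vectors per perturbed node
perturb_nodes = 20          # number of perturbed nodes to repeat for

total_runs = runs_per_node * perturb_nodes
hours_one_node = seconds_per_run * runs_per_node / 3600
hours_all_nodes = seconds_per_run * total_runs / 3600

print(total_runs, round(hours_one_node, 1), round(hours_all_nodes, 1))
```

Because the runs are independent, the serial cost scales linearly with both the number of initial vectors and the number of perturbed nodes.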

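I tried to address this with the Python multiprocessing module, parallelizing the 1000 iterations for each of the 20 perturb nodes, but that did not help much.

I also tried Numba functions, but they do not recognize the pyGMO module, so that failed.

Is there any way to parallelize this code and make it faster for an arbitrary number of iterations?

Please let me know whether my question is clear enough; if not, I will add details as needed.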

import random  # needed for random.uniform below

import numpy as np
import pygmo as pg

matL = np.random.rand(300,300) ; node_len = 300

inits = []; results = []


perturb = {25:0} #setting a random node, say, node 25 to 0

class my_constrained_udp:
    
    def __init__(self):
        pass
    
    def fitness(self, x):
        matA = np.matrix(x)
        obj1 = matA.dot(matL).dot(matA.T)[0,0] #this is int value
        ce1 = sum(init_val) - sum(x)                   
        return [obj1, ce1]
   
    def n_objs(self): # no of objectives
        return 1


    def get_nec(self): #no of equality constraints
        return 1   

 
    def get_nic(self): #no of in-equality constraints
        return 0                    


    def get_bounds(self): #lower and upper bounds: use this to perturb the node
        lowerB = np.array([0]*node_len); upperB = np.array([1]*node_len)
        if perturb:
            for k,v in perturb.items():
                lowerB[k] = v; upperB[k] = v
        return (lowerB,upperB)

  
    def gradient(self, x):
        return pg.estimate_gradient_h(lambda x: self.fitness(x), x)


def dyn_optimGMO(matL, node_len ,init):
        
    if perturb:
        for k,v in perturb.items(): init_val[k] = v  #setting perturbations in initial values
    
    inner_algo = pg.nlopt("slsqp"); inner_algo.maxeval = 5
    
    algo = pg.algorithm(uda = pg.mbh(inner_algo, stop = 2, perturb = .2))
    #algo.set_verbosity(10) # in this case this correspond to logs each 1 call to slsqp
    pop = pg.population(prob = my_constrained_udp(), size = 1000 , seed=123)
    pop.problem.c_tol = [1E-6] * 1 # get_nec + get_nic = 1, so multiplied by 1
    pop = algo.evolve(pop) 
   
    res = pop.champion_x   
    return res

# running above optimization code for 1000 random initializations

for i in range(1000):
    init_val = np.array([random.uniform(0, 1) for k in range(node_len)])
    
    if perturb:
        for k,v in perturb.items(): init_val[k] = v  #setting perturbations in initial values
    
    res = dyn_optimGMO(matL ,node_len ,init_val) # this function is defined here only
    
    inits.append(init_val); results.append(res)

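Edit 1:

As @Ananda suggested below, I modified the objective function, which cut the run time by almost 7x. I rewrote the code so that the 1000 iterations run through the Python multiprocessing module. Below is my new code, in which I try to spawn processes that handle the iterations in parallel. Since my system has only 8 threads, I limited the pool size to 5, because PyGMO also uses internal parallelization and needs some threads as well.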

import numpy as np
import pygmo as pg


matL = np.random.rand(300,300) ; node_len = 300

perturb = {12:1} # assign your perturb ID here

def optimizationFN(var):

    results = []
    
    inits = var[0]; perturb = var[1]

    
    class my_constrained_udp:
        
        def fitness(self, x):
            obj1 = x[None,:] @ matL @ x[:,None] # @ is mat multiplication operator
            ce1 = np.sum(inits) - np.sum(x)                   
            return [obj1, ce1]
       
        def n_objs(self): # no of objectives
            return 1
        
        def get_nec(self): #no of equality constraints
            return 1    
        
        def get_nic(self): #no of in-equality constraints
            return 0                    
        
        def get_bounds(self): #lower and upper bounds: use this to perturb the node
            lowerB = np.array([0]*node_len); upperB = np.array([1]*node_len)
            if perturb:
                for k,v in perturb.items():
                    lowerB[k] = v; upperB[k] = v
            return (lowerB,upperB)
        
        def gradient(self, x):
            return pg.estimate_gradient_h(lambda x: self.fitness(x), x)
    
    def dyn_optimGMO(matL, node_len ,inits):
        '''
        perturb should be a dict of node index and value as 0 or 1. Type(node_index) = int
        '''  
        if perturb:
            for k,v in perturb.items(): inits[k] = v  #setting perturbations in initial values
            
        inner_algo = pg.nlopt("slsqp"); inner_algo.maxeval = 5
        
        algo = pg.algorithm(uda = pg.mbh(inner_algo, stop = 2, perturb = .2))
        
        #algo.set_verbosity(10) # in this case this correspond to logs each 1 call to slsqp
        
        pop = pg.population(prob = my_constrained_udp(), size = 100 , seed=123)
        
        pop.problem.c_tol = [1E-6] * 1 # get_nec + get_nic = 1, so multiplied by 1
        pop = algo.evolve(pop) 
       
        res = pop.champion_x   
        return res
    
    
    if perturb:
        for k,v in perturb.items(): inits[k] = v  #setting perturbations in initial values
    
    res = dyn_optimGMO(matL ,node_len ,inits) # this function is defined here only
    
    results.append(res)
    
    return results

import time
from multiprocessing import Pool

# The guard keeps worker processes from re-running this block on import
if __name__ == "__main__":
    st = time.time()

    # 1000 random initialisations, one row per initial vector
    initial_vals = np.random.uniform(0, 1, size=(1000, node_len))

    # pair every initial vector with the perturbation dict
    inp_val = [[init, perturb] for init in initial_vals]

    # running evaluations in a pool of worker processes
    with Pool(8) as myPool:
        data = myPool.map(optimizationFN, inp_val)

    print('Total Time: ', round(time.time() - st, 4))

This executes all 1000 iterations in 1.13 hours.

But are there any other options that would let me make it faster?

parallel-processing

python-multiprocessing

python

numpy

optimization

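2 Answers

Stack Overflow user

Accepted answer

Answered 2020-12-27 05:40:52

Before trying parallelization and the like, first find out what the performance bottleneck actually is, and try to fix that.

If you profile your fitness function with line profiler,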

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    28                                               @profile
    29                                               def fitness(self, x):
    30     96105    3577978.0     37.2      9.5          matA = np.matrix(x)
    31     96105    8548353.0     88.9     22.7          obj1 = matA.dot(matL).dot(matA.T)[0,0] #this is int value
    32     96105   25328835.0    263.6     67.4          ce1 = sum(init_val) - sum(x)
    33     96105     121800.0      1.3      0.3          return [obj1, ce1]

As you can see, most of the time is spent in the dot and sum calls, and creating matA is also expensive.

I would rewrite the function like this:

def fitness(self, x):

    obj1 = x[None, :] @ matL @ x[:, None]
    ce1 = np.sum(init_val) - np.sum(x)

    return [obj1, ce1]

If you profile this function now,

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                               @profile
    21                                               def fitness(self, x):
    22                                           
    23     77084    3151649.0     40.9     48.9          obj1 = x[None, :] @ matL @ x[:, None]
    24     77084    3214012.0     41.7     49.9          ce1 = np.sum(init_val) - np.sum(x)
    25                                           
    26     77084      79439.0      1.0      1.2          return [obj1, ce1]

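The total time per hit for the whole function drops from about 390 to about 84 (the sum of the Per Hit column in each listing).

np.matrix is no longer recommended, and NumPy's documentation says it may be removed in the future. Using the built-in Python sum instead of np.sum on a NumPy array can also hurt performance considerably.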
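The comparison can be reproduced outside PyGMO with a small timing sketch. It assumes random stand-ins for matL, init_val, and x of the same size as in the question, and the names fitness_slow and fitness_fast are illustrative. It also uses x @ matL @ x on the 1-D array, a slight variation on the rewrite above that returns a scalar instead of a 1×1 array.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
node_len = 300                      # same size as in the question
matL = rng.random((node_len, node_len))
init_val = rng.random(node_len)     # stand-in for one random initial vector
x = rng.random(node_len)            # stand-in for a candidate solution


def fitness_slow(x):
    # Original formulation: np.matrix plus the built-in sum
    matA = np.matrix(x)
    obj1 = matA.dot(matL).dot(matA.T)[0, 0]
    ce1 = sum(init_val) - sum(x)
    return [obj1, ce1]


def fitness_fast(x):
    # Plain ndarray formulation; x @ matL @ x is already a scalar
    obj1 = x @ matL @ x
    ce1 = np.sum(init_val) - np.sum(x)
    return [obj1, ce1]


# Both versions should agree up to floating-point rounding
assert np.allclose(fitness_slow(x), fitness_fast(x))

t_slow = timeit.timeit(lambda: fitness_slow(x), number=2000)
t_fast = timeit.timeit(lambda: fitness_fast(x), number=2000)
print(f"slow: {t_slow:.3f} s, fast: {t_fast:.3f} s for 2000 calls")
```

The absolute timings depend on the machine and the BLAS build, so only the ratio between the two is meaningful.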

On my machine this improved performance from 33 s/iteration to 6 s/iteration, roughly a 5x gain.
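A further possibility, not part of the answer above, is to replace the numerically estimated gradient with an analytic one, since each finite-difference estimate costs many fitness calls. For the objective xᵀLx the gradient is (L + Lᵀ)x, and for the constraint sum(init_val) − sum(x) it is −1 in every component. The sketch below checks this with NumPy only. The flattened layout (objective row first, then the constraint row) is an assumption about what a dense PyGMO gradient expects and should be verified against the PyGMO documentation before use.

```python
import numpy as np

rng = np.random.default_rng(1)
node_len = 300
matL = rng.random((node_len, node_len))
init_val = rng.random(node_len)
matL_sym = matL + matL.T            # precompute once: d/dx (x^T L x) = (L + L^T) x


def fitness(x):
    return np.array([x @ matL @ x, np.sum(init_val) - np.sum(x)])


def gradient(x):
    d_obj = matL_sym @ x            # gradient of the objective
    d_ce = -np.ones_like(x)         # gradient of sum(init_val) - sum(x)
    # Assumed layout: objective row first, then the constraint row, flattened
    return np.concatenate([d_obj, d_ce])


def finite_difference(x, h=1e-6):
    # Central differences, used here only to check the analytic result
    rows = np.empty((2, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        rows[:, j] = (fitness(x + step) - fitness(x - step)) / (2 * h)
    return rows.ravel()


x = rng.random(node_len)
assert np.allclose(gradient(x), finite_difference(x), rtol=1e-5, atol=1e-6)
```

The analytic version needs one matrix-vector product per call, whereas the finite-difference check evaluates the fitness twice per variable.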

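Votes: 2

Stack Overflow user

Answered 2020-12-27 05:48:37

Q: "Is there any way to parallelize this code and make it faster for an arbitrary number of iterations?"

Yes. If trying to numba.jit() the code "from the outside" fails (for the reasons given in the Numba compilation warnings), you can instead distribute batches of the 1k+ independent initializations mentioned above, let those batches be computed in parallel, and collect the results afterwards.

With enough compute nodes, this can shorten the wall-clock time by a factor approaching 1000, and the approach scales further.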

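If you have access to a university cluster of roughly 1k+ nodes, your batch of 1k computations may deliver its results in about the same time a single machine needs to finish the first task of the 1k-long sequence (communication costs are negligible here; see the Amdahl's law argument).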
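A minimal sketch of the batching idea follows. It uses a hypothetical stand-in, solve_one, in place of the real PyGMO run, and the helper names (make_batches, solve_batch, run_serial, run_parallel) are illustrative only. On a real cluster, the local process pool would be replaced by the site's job scheduler.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def solve_one(init):
    # Hypothetical stand-in for one optimization run; in the question this
    # would be the call that builds the pygmo problem and evolves it.
    return np.clip(init, 0.0, 1.0)


def solve_batch(batch):
    # A worker handles a whole batch so per-task overhead is amortized
    return [solve_one(init) for init in batch]


def make_batches(inits, n_batches):
    # Split the independent initializations into roughly equal batches
    return np.array_split(inits, n_batches)


def run_serial(inits, n_batches):
    results = []
    for batch in make_batches(inits, n_batches):
        results.extend(solve_batch(batch))
    return results


def run_parallel(inits, n_batches, max_workers=None):
    # Same batches, but farmed out to local worker processes
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(solve_batch, make_batches(inits, n_batches))
        return [res for part in parts for res in part]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inits = rng.random((1000, 300))
    results = run_serial(inits, n_batches=8)  # swap in run_parallel to use processes
    print(len(results))
```

Both runners return results in the same order as the input rows, so the serial path can be used to validate the parallel one on a small sample.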

Votes: -1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/65462895
