Q: Optimizing a model of social network evolution

Stack Overflow user

Asked 2011-07-18 02:16:03

Answers: 4 · Views: 518 · Followers: 0 · Votes: 3

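I am writing a piece of code to model the evolution of a social network. The idea is that each person is assigned to a node, and the relationships between people (the edges of the network) are given a weight of +1 or -1 depending on whether the relationship is friendly or unfriendly.

Using this simple model, you can say that a triad of three people is either "balanced" or "unbalanced", depending on whether the product of the triad's edge weights is positive or negative.

So what I am ultimately trying to do is implement an Ising-type model: flip a random edge, and keep the new relationship if the new network has more balanced triangles (a lower energy) than the network before the flip; otherwise, keep the new relationship only with a certain probability.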
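For reference, the model described above can be written compactly. This is one common formulation, not something the question fixes: the set of triads \mathcal{T} (triples of nodes whose three edges all exist), the temperature T, and the use of the standard Metropolis acceptance rule are assumptions made here for illustration.

```latex
H = -\sum_{(i,j,k) \in \mathcal{T}} s_{ij}\, s_{jk}\, s_{ki},
\qquad s_{ij} \in \{+1, -1\}

\Delta H = H_{\text{after flip}} - H_{\text{before flip}}

P(\text{keep the flip}) = \min\left(1,\; e^{-\Delta H / T}\right)
```

A balanced triad contributes -1 to H and an unbalanced one contributes +1, so more balanced triangles means lower energy.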

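OK, so finally on to my question: I have written the following code, but my dataset contains roughly 120k triads, so it would take 4 days to run!

Could anyone offer some tips on how I might optimize the code?

Thanks.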

# Importing required libraries

try:
    import matplotlib.pyplot as plt
except:
    raise

import networkx as nx
import csv
import random
import math

def prod(iterable):
    p= 1
    for n in iterable:
        p *= n
    return p


def Sum(iterable):
    p= 0
    for n in iterable:
        p += n[3]
    return p


def CalcTriads(n):  
    firstgen=G.neighbors(n)
    Edges=[]
    Triads=[]

    for i in firstgen:
        Edges.append(G.edges(i))

    for i in xrange(len(Edges)):
        for j in range(len(Edges[i])):# For node n go through the list of edges (j) for the neighboring nodes (i) 
            if set([Edges[i][j][1]]).issubset(firstgen):# If the second node on the edge is also a neighbor of n (its in firstgen) then keep the edge.
                t=[n,Edges[i][j][0],Edges[i][j][1]]
                t.sort()
                Triads.append(t)# Add found nodes to Triads.

    new_Triads = []# Delete duplicate triads.
    for elem in Triads:
        if elem not in new_Triads:
            new_Triads.append(elem)
    Triads = new_Triads 

    for i in xrange(len(Triads)):# Go through list of all Triads finding the weights of their edges using G[node1][node2]. Multiply the three weights and append value to each triad.
            a=G[Triads[i][0]][Triads[i][1]].values()
            b=G[Triads[i][1]][Triads[i][2]].values()
            c=G[Triads[i][2]][Triads[i][0]].values()
            Q=prod(a+b+c)
            Triads[i].append(Q)

    return Triads



###### Import sorted edge data ######       
li=[]
with open('Sorted Data.csv', 'rU') as f:
    reader = csv.reader(f)
    for row in reader:
        li.append([float(row[0]),float(row[1]),float(row[2])])
G=nx.Graph()
G.add_weighted_edges_from(li)


for i in xrange(800000):
    e = random.choice(li)   # Choose random edge

    TriNei=[]
    a=CalcTriads(e[0]) # Find triads of first node in the chosen edge 
    for i in xrange(0,len(a)):
        if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
            TriNei.append(a[i])         
    preH=-Sum(TriNei) # Save the "energy" of all the triads of which the edge is a member


    e[2]=-1*e[2]# Flip the weight of the random edge and create a new graph with the flipped edge   
    G.clear()
    G.add_weighted_edges_from(li)


    TriNei=[]
    a=CalcTriads(e[0])  
    for i in xrange(0,len(a)):
        if set([e[1]]).issubset(a[i]):
            TriNei.append(a[i])
    postH=-Sum(TriNei)# Calculate the post flip "energy".   

    if postH<preH:# If the post flip energy is lower than the pre flip energy keep the change
        continue

    elif random.random() < 0.92: # If the post flip energy is higher then only keep the change with some small probability. (0.92 is an approximate placeholder for exp(-DeltaH)/exp(1) at the moment)
        e[2]=-1*e[2]

Tags: python, optimization, networkx

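4 Answers

Stack Overflow user

Answered 2011-07-18 02:35:12

The following suggestions won't boost your performance much, because they are not at the algorithmic level, i.e. they are not very specific to your problem. They are, however, general suggestions for slight performance improvements:

Unless you are using Python 3, change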

for i in range(800000):

to

for i in xrange(800000):

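The latter just iterates through the numbers from 0 up to (but not including) 800000, whereas the former first creates a huge list of numbers and then iterates over that list. Do the same for the other loops that use range.

Also, change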

j=random.choice(range(len(li))) 
e=li[j] # Choose random edge

to

e = random.choice(li)

and subsequently use e instead of li[j]. If you really need an index number, use random.randint(0, len(li)-1).

Votes: 5

Stack Overflow user

Answered 2011-07-18 03:49:18

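There are some syntactic changes you could make to speed things up, such as replacing your Sum and prod functions with the built-in equivalents sum(x[3] for x in iterable) and reduce(operator.mul, iterable). Using built-in functions or generator expressions is generally faster than writing explicit loops.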
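As a minimal sketch of that replacement (the function names `triad_energy_sum` and `product` and the sample data are hypothetical): on Python 3, `reduce` has to be imported from `functools`, and Python 3.8+ also offers `math.prod`.

```python
import operator
from functools import reduce  # a builtin on Python 2; lives in functools on Python 3


def triad_energy_sum(triads):
    """Sum the stored weight product (index 3) of each [n1, n2, n3, product] triad."""
    return sum(triad[3] for triad in triads)


def product(values):
    """Multiply all values together; the initial value 1 covers empty input."""
    return reduce(operator.mul, values, 1)


# Illustrative data only: three triads with their edge-weight products.
triads = [[1, 2, 3, 1.0], [1, 2, 4, -1.0], [2, 3, 4, -1.0]]
total = triad_energy_sum(triads)
sign = product([1.0, -1.0, -1.0])
```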

As far as I can tell, this line:

    if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)

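is testing whether a float is in a list of floats. Replacing it with if e[1] in a[i]: will remove the overhead of creating two set objects for each comparison.

Incidentally, you don't need to loop over the index values of an array if you are only going to use the index to access the elements. For example, replace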

for i in range(0,len(a)):
    if set([e[1]]).issubset(a[i]): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(a[i])

with

for x in a:
    if set([e[1]]).issubset(x): # Keep triads which contain the whole edge (i.e. both nodes on the edge)
        TriNei.append(x)
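Combining both suggestions, the filter can be written as a list comprehension with a plain membership test. This is a minimal sketch; the function name `triads_containing` is hypothetical. Restricting the test to the first three entries is an added assumption: each triad in the question's code carries its weight product as a fourth element, and since node ids are read as floats, a node id of 1.0 or -1.0 could otherwise match that product by accident.

```python
def triads_containing(node, triads):
    """Return the triads whose three node entries include `node`.

    Each triad is expected to look like [n1, n2, n3, product]; only the
    first three entries are searched so the product is never matched.
    """
    return [triad for triad in triads if node in triad[:3]]


# Illustrative data only: triads around node 1.0 with their weight products.
sample_triads = [
    [1.0, 2.0, 3.0, 1.0],
    [1.0, 3.0, 4.0, -1.0],
    [1.0, 2.0, 5.0, -1.0],
]
matches = triads_containing(2.0, sample_triads)
```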

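However, I doubt that changes like these will make a big difference to the overall runtime. For that, you will need either a different algorithm or a faster language. You could try running it under PyPy; in some cases it can be significantly faster than CPython. You could also try Cython, which compiles your code to C and can sometimes give a big performance gain, especially if you annotate your code with Cython type information. I think the biggest improvement would probably come from changing the algorithm to one that does less work, but I don't have any suggestions for that.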
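One way the "less work" idea could look, offered as a sketch rather than as this answer's method: only the triangles that contain the flipped edge change their energy, and those are given by the common neighbours of the edge's two endpoints, so there is no need to rebuild the graph or recompute every triad of a node. The helper names (`add_edge`, `edge_energy`, `flip_step`), the dict-of-dicts adjacency, the `temperature` parameter and the Metropolis acceptance rule are all assumptions for illustration; the question's code uses a fixed 0.92 placeholder instead.

```python
import math
import random


def add_edge(adj, u, v, weight):
    """Store an undirected weighted edge in a dict-of-dicts adjacency."""
    adj.setdefault(u, {})[v] = weight
    adj.setdefault(v, {})[u] = weight


def edge_energy(adj, u, v):
    """Energy of the triangles containing edge (u, v).

    Each triangle (u, v, k) contributes minus the product of its three
    edge weights, where k ranges over the common neighbours of u and v.
    """
    common = set(adj[u]) & set(adj[v])
    return -sum(adj[u][v] * adj[u][k] * adj[v][k] for k in common)


def flip_step(adj, edges, temperature=1.0, rng=random):
    """Try to flip one random edge; return True if the flip was kept."""
    u, v = rng.choice(edges)
    # Flipping the edge negates every affected product, so the local
    # energy goes from E to -E and the change is -2 * E.
    delta = -2.0 * edge_energy(adj, u, v)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        adj[u][v] = adj[v][u] = -adj[u][v]  # update in place, no rebuild
        return True
    return False


# Illustrative data only: one unbalanced triangle (product of weights is -1).
adj = {}
add_edge(adj, 1, 2, 1.0)
add_edge(adj, 2, 3, 1.0)
add_edge(adj, 1, 3, -1.0)
energy_before = edge_energy(adj, 1, 2)
kept = flip_step(adj, [(1, 2)])
energy_after = edge_energy(adj, 1, 2)
```

With networkx, the same lookup could presumably be done with `nx.common_neighbors(G, u, v)` and an in-place update of `G[u][v]['weight']`, which would avoid the `G.clear()` and re-add on every iteration.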

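By the way, why loop 800000 times? What is the significance of that number?

Also, please use meaningful names for your variables. Using single-character names or shrtAbbrv doesn't speed the code up at all, and it makes it very hard to follow what the code is doing.

Votes: 4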

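Stack Overflow user

Answered 2011-07-18 02:24:20

There are plenty of things you can improve here. Start by profiling your program with a tool like cProfile. This will tell you where most of the program's time is being spent, and therefore where optimization is likely to help the most. As a hint, you don't need to generate all the triads on every iteration of the program.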
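A minimal sketch of profiling one call with cProfile and pstats, written for Python 3 (on Python 2, `io.StringIO` would need to be swapped for `StringIO.StringIO`). The workload `slow_square_sum` and the helper `profile_call` are hypothetical stand-ins; in practice the profiled function would be one iteration of the simulation loop.

```python
import cProfile
import io
import pstats


def slow_square_sum(n):
    """Stand-in workload; replace with one iteration of the simulation."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(slow_square_sum, 1000)
```

For a whole script, `python -m cProfile -s cumulative script.py` gives a similar report without any code changes.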

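You also need to fix your indentation before you can expect a decent answer.

In any case, this question might be better suited to Code Review.

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: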

https://stackoverflow.com/questions/6725771
