我正在尝试运行一个类似于Google的PageRank算法的函数(当然,出于非商业目的)。下面是Python代码;请注意,a[0]是这里唯一重要的东西,a[0]包含n x n矩阵(如[[0,1,1],[1,0,1],[1,1,0]] )。此外,您还可以找到我从维基百科中获得的代码
def GetNodeRanks(a): # graph, names, size
numIterations = 10
adjacencyMatrix = copy.deepcopy(a[0])
b = [1]*len(adjacencyMatrix)
tmp = [0]*len(adjacencyMatrix)
for i in range(numIterations):
for j in range(len(adjacencyMatrix)):
tmp[j] = 0
for k in range(len(adjacencyMatrix)):
tmp[j] = tmp[j] + adjacencyMatrix[j][k] * b[k]
norm_sq = 0
for j in range(len(adjacencyMatrix)):
norm_sq = norm_sq + tmp[j]*tmp[j]
norm = math.sqrt(norm_sq)
for j in range(len(b)):
b[j] = tmp[j] / norm
print b
return b 当我运行这个实现时(在一个比3 x 3矩阵大得多的矩阵上,n.b.),它无法产生足够的精度来计算等级,从而使我能够对它们进行有效的比较。所以我试了一下:
from decimal import *
getcontext().prec = 5
def GetNodeRanks(a): # graph, names, size
numIterations = 10
adjacencyMatrix = copy.deepcopy(a[0])
b = [Decimal(1)]*len(adjacencyMatrix)
tmp = [Decimal(0)]*len(adjacencyMatrix)
for i in range(numIterations):
for j in range(len(adjacencyMatrix)):
tmp[j] = Decimal(0)
for k in range(len(adjacencyMatrix)):
tmp[j] = Decimal(tmp[j] + adjacencyMatrix[j][k] * b[k])
norm_sq = Decimal(0)
for j in range(len(adjacencyMatrix)):
norm_sq = Decimal(norm_sq + tmp[j]*tmp[j])
norm = Decimal(norm_sq).sqrt
for j in range(len(b)):
b[j] = Decimal(tmp[j] / norm)
print b
return b 即使在这种低精度的情况下,代码也非常慢,在我坐等它运行的时候一直没有运行完。以前,代码是快速的,但不够精确。
是否有一种合理/简单的方法使代码同时快速、准确地运行?
发布于 2014-04-17 18:07:28
提速的小窍门是:
遍历您的代码:
from decimal import *
getcontext().prec = 5
def GetNodeRanks(a): # graph, names, size
# opt: pass in directly a[0], you do not use the rest
numIterations = 10
adjacencyMatrix = copy.deepcopy(a[0])
#opt: why copy.deepcopy? You do not modify adjacencyMatric
b = [Decimal(1)]*len(adjacencyMatrix)
# opt: You often call Decimal(1) and Decimal(0), it takes some time
# do it only once like
# dec_zero = Decimal(0)
# dec_one = Decimal(1)
# prepare also other, repeatedly used data structures
# len_adjacencyMatrix = len(adjacencyMatrix)
# adjacencyMatrix_range = range(len_ajdacencyMatrix)
# Replace code with pre-calculated variables yourself
tmp = [Decimal(0)]*len(adjacencyMatrix)
for i in range(numIterations):
for j in range(len(adjacencyMatrix)):
tmp[j] = Decimal(0)
for k in range(len(adjacencyMatrix)):
tmp[j] = Decimal(tmp[j] + adjacencyMatrix[j][k] * b[k])
norm_sq = Decimal(0)
for j in range(len(adjacencyMatrix)):
norm_sq = Decimal(norm_sq + tmp[j]*tmp[j])
norm = Decimal(norm_sq).sqrt #is this correct? I woudl expect .sqrt()
for j in range(len(b)):
b[j] = Decimal(tmp[j] / norm)
print b
return b 现在很少有关于如何在Python中优化列表处理的示例。
使用sum,更改:
norm_sq = Decimal(0)
for j in range(len(adjacencyMatrix)):
norm_sq = Decimal(norm_sq + tmp[j]*tmp[j])至:
norm_sq = sum(val*val for val in tmp)对清单的理解:
更改:
for j in range(len(b)):
b[j] = Decimal(tmp[j] / norm)改为:
b = [Decimal(tmp_itm / norm) for tmp_itm in tmp]如果您得到这种编码风格,您也将能够优化初始循环,并可能会发现,一些预先计算的变量正在变得过时。
https://stackoverflow.com/questions/23121631
复制相似问题