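I wrote a tokenizer function. It splits a word into pieces at random positions. For example, if the input is "runtime", any one of the following outputs is possible:
['runtime']
['r','untime']
['r','u','n','t','i','m','e']
....
But its running time is very high when I want to split 100,000 words. Do you have any suggestions for optimizing it or writing it more cleverly?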
def random_multisplitter(word):
    from numpy import mod
    # random_int is a helper from the asker's code; its definition is not shown in the post
    spw = []
    length = len(word)
    rand = random_int(word)
    if rand == length:  # probability of not splitting
        return [word]
    else:
        div = mod(rand, (length + 1))  # defining division points
        bound = length - div
        spw.append(div)
        while div != 0:
            rand = random_int(word)
            div = mod(rand, (bound + 1))
            bound = bound - div
            spw.append(div)
        result = spw
        b = 0
        points = []
        for x in range(len(result) - 1):  # calculating splitting points
            b = b + result[x]
            points.append(b)
        xy = 0
        t = []
        for i in points:
            t.append(word[xy:i])
            xy = i
        if word[xy:len(word)] != '':
            t.append(word[xy:len(word)])
        if type(t) != list:
            return [t]
        return t

Posted on 2015-09-22 16:27:31
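You are right about the quality of the code; I am trying to improve my skills by asking for suggestions. Someone also suggested the code below, which I think has a better running time.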
import random

def random_multisplitter(word):
    # add's bits will tell whether a char shall be added to last substring or
    # be the beginning of its own substring
    add = random.randint(0, 2**len(word) - 1)
    # append 0 to make sure first char is start of first substring
    add <<= 1
    res = []
    for char in word:
        # see if last bit is 1
        if add & 1:
            res[-1] += char
        else:
            res.append(char)
        # shift to next bit
        add >>= 1
    return res

https://codereview.stackexchange.com/questions/105286
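The bitmask idea in the suggested code can also be written with slicing instead of repeated string concatenation. The following is only a minimal sketch, assuming that every possible split should be equally likely (the function in the question may follow a different distribution). The name `random_split` and the `rng` parameter are introduced here for illustration and do not appear in the original post.

```python
import random


def random_split(word, rng=random):
    """Split word into contiguous pieces at randomly chosen gaps."""
    n = len(word)
    if n < 2:
        # Nothing to cut: return the word itself, or no pieces for "".
        return [word] if word else []
    # One random bit per gap between adjacent characters (n - 1 gaps).
    mask = rng.getrandbits(n - 1)
    pieces = []
    start = 0
    for gap in range(1, n):
        # Bit (gap - 1) set means: cut between word[gap - 1] and word[gap].
        if (mask >> (gap - 1)) & 1:
            pieces.append(word[start:gap])
            start = gap
    # The remainder after the last cut is always a non-empty piece.
    pieces.append(word[start:])
    return pieces


if __name__ == "__main__":
    # The output varies from run to run because the cuts are random.
    print(random_split("runtime"))
```

For a large word list, a call such as `[random_split(w) for w in words]` keeps the work per word to one random draw plus one pass over the characters. Passing a seeded `random.Random` instance as `rng` makes the splits reproducible, which can help when testing.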