Q: Random word splitter

Code Review user

Asked on 2015-09-21 17:29:36

Answers: 1 | Views: 251 | Followers: 0 | Votes: 6

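I wrote a word-splitting function. It splits a word into random pieces. For example, if the input is 'runtime', each of the following is a possible output: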

['runtime']

['r','untime']

['r','u','n','t','i','m','e'] ....

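However, its running time is very high when I want to split 100,000 words. Do you have any suggestions for optimizing it or writing it in a smarter way?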

def random_multisplitter(word):
    from numpy import mod
    spw = []
    length = len(word)
    rand = random_int(word)
    if rand == length:       #probability of not splitting
        return [word]

    else:
        div = mod(rand, (length + 1))#defining division points 
        bound = length - div
        spw.append(div)
        while div != 0:
            rand = random_int(word)
            div = mod(rand,(bound+1))
            bound = bound-div
            spw.append(div)
        result = spw
    b = 0
    points =[]
    for x in range(len(result)-1): #calculating splitting points 
        b=b+result[x]
        points.append(b)
    xy=0
    t=[]
    for i in points:
        t.append(word[xy:i])
        xy=i
    if word[xy:len(word)]!='':
        t.append(word[xy:len(word)])
    if type(t)!=list:
        return [t]
    return t

python

performance

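1 Answer

Code Review user

Posted on 2015-09-22 16:27:31

You are right about the quality of the code; I am trying to improve my skills by asking for advice. Someone also suggested the code below, which I think has a better running time.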

import random

def random_multisplitter(word):
    # add's bits will tell whether a char shall be added to last substring or
    # be the beginning of its own substring
    add = random.randint(0, 2**len(word) - 1)

    # append 0 to make sure first char is start of first substring
    add <<= 1

    res = []
    for char in word:
        # see if last bit is 1
        if add & 1:
            res[-1] += char
        else:
            res.append(char)
        # shift to next bit
        add >>= 1

    return res
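
For comparison, here is a minimal sketch of the same idea written with slicing instead of `res[-1] += char`. It is not from the original answer: the name `random_split_by_cuts` is hypothetical, and it assumes that every subset of the gaps between characters should be equally likely. It draws one random bit per gap with `random.getrandbits` and cuts the word wherever a bit is set.

```python
import random


def random_split_by_cuts(word):
    """Split word at a random subset of the gaps between its characters.

    Hypothetical helper for illustration; each of the len(word) - 1 gaps
    is cut independently with probability 1/2.
    """
    n = len(word)
    if n < 2:
        # Nothing to cut: "" gives [], a single character gives [word].
        return [word] if word else []

    # Bit (i - 1) decides whether to cut just before word[i].
    bits = random.getrandbits(n - 1)

    pieces = []
    start = 0
    for i in range(1, n):
        if (bits >> (i - 1)) & 1:
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])  # the tail after the last cut
    return pieces


# Usage: split many words; the pieces always join back to the input.
words = ["runtime"] * 100_000
splits = [random_split_by_cuts(w) for w in words]
```

Whether this is actually faster than the version above would need to be measured, for example with `timeit` on the real word list.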

Votes: 0

Original page content provided by Code Review. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://codereview.stackexchange.com/questions/105286
