首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >块存储重写

块存储重写
EN

Code Review用户
提问于 2019-03-13 15:33:41
回答 1查看 868关注 0票数 6

摘要

在块存储系统中,新数据以块形式写入。我们将把闪存表示为一个顺序数组。我们有一个以大小为2:writes[i] = [first_block_written, last_block_written]的数组形式出现的块写入列表。每个块都有重写限制。如果块上的重写达到了特定的阈值,则应该运行特殊的诊断。给定blockCount (表示块总数的整数)、writes (大小为2的块写入数组的列表)和threshold,您的任务是返回不相交的块段列表,每个块都由已达到重写阈值的块组成。块段的列表应按其左端的递增顺序排序。

示例

对于blockCount = 10, writes = [[0, 4], [3, 5], [2, 6]]threshold = 2,输出应该是blockStorageRewrites(blockCount, writes, threshold) = [[2, 5]]。第一次写入后,块0, 1, 2, 34一次写入;第二次写入后,块0、1、2和5一次写入,块34达到重写阈值;最后写完之后,块25也达到重写阈值,因此应该诊断的块是2, 3, 45。块2, 3, 45形成一个后续段[2, 5]。对于blockCount = 10writes = [[0, 4], [3, 5], [2, 6]]threshold = 3,输出应为blockCount = 10writes = [[3, 4], [0, 1], [6, 6]]threshold = 1blockStorageRewrites(blockCount, writes, threshold) = [[3, 4]],输出应为blockStorageRewrites(blockCount, writes, threshold) = [[0, 1], [3, 4], [6, 6]]

约束

  • 1 ≤ blockCount ≤ 10**5
  • 0 ≤ writes.length ≤ 10**5
  • writes[i].length = 2
  • 0 ≤ writes[i][0] ≤ writes[i][1] < blockCount

第一次尝试

代码语言:javascript
复制
from itertools import groupby

def group_blocks(num_writes, threshold):
    i = 0
    for key, group in groupby(num_writes, lambda x : x >= threshold):
        # This is faster compared to len(list(g))
        length = sum(1 for _ in group)
        if key:
            yield [i, i + length - 1]
        i += length

def blockStorageRewrites(blockCount, writes, threshold):
    num_writes = [0] * blockCount
    for lower, upper in writes:
        for i in range(lower, upper + 1):
            num_writes[i] += 1
    return list(group_blocks(num_writes, threshold))

在这里,我只做了被问到的事情,我创建了一个blockCount大小的数组,循环在写过程中,最后用itertoos.groupby对连续的范围进行分组

尝试优化

后的

我不太清楚复杂性是什么,但我试图减轻负载,但我仍然没有通过TLE约束

代码语言:javascript
复制
def get_bad_blocks(blockCount, writes, threshold):
    cons = False
    u = l = -1
    for i in range(blockCount):
        count = 0
        for lower, upper in writes:
            if lower <= i <= upper:
                count += 1
            if count >= threshold:
                if not cons:
                    cons = True
                    l = i
                    u = -1
                break
        else:
            u = i - 1
            cons = False

        if u != -1 and l != -1:
            yield [l, u]
            l = -1
    if cons:
        yield [l, i]


def blockStorageRewrites(blockCount, writes, threshold):    
    return list(get_bad_blocks(blockCount, writes, threshold))

问题

你可以复习所有的东西,但最好我能找到这样的答案:

  • 告诉我我的第一个例子是否可读
  • 在我的第二次尝试中,我不太关心可读性,而是对速度更感兴趣
  • 请忽略PEP8命名冲突,因为这是编程挑战站点的一个问题。
EN

回答 1

Code Review用户

回答已采纳

发布于 2019-03-13 16:17:54

首先,当您优化代码时,您需要知道要优化什么。起初,我认为问题代码不是groupby,而是num_writes的创建。因此,我更改了您的代码,以便能够找到它的性能。

代码语言:javascript
复制
import cProfile
import random

from itertools import groupby

def group_blocks(num_writes, threshold):
    i = 0
    for key, group in groupby(num_writes, lambda x : x >= threshold):
        # This is faster compared to len(list(g))
        length = sum(1 for _ in group)
        if key:
            yield [i, i + length - 1]
        i += length


def build_writes(block_count, writes):
    num_writes = [0] * block_count
    for lower, upper in writes:
        for i in range(lower, upper + 1):
            num_writes[i] += 1
    return num_writes


def blockStorageRewrites(blockCount, writes, threshold):
    num_writes = build_writes(blockCount, writes)
    return list(group_blocks(num_writes, threshold))


block_count = 10**5
writes = []
for _ in range(10**4):
    a = random.randrange(0, block_count)
    b = random.randrange(a, block_count)
    writes.append([a, b])


cProfile.run('blockStorageRewrites(block_count, writes, 10**4)')

其结果如下:

代码语言:javascript
复制
         200008 function calls in 25.377 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002   25.377   25.377 <string>:1(<module>)
   100001    0.019    0.000    0.025    0.000 lus.py:10(<genexpr>)
        1   25.342   25.342   25.342   25.342 lus.py:16(build_writes)
        1    0.000    0.000   25.375   25.375 lus.py:24(blockStorageRewrites)
        1    0.000    0.000    0.033    0.033 lus.py:6(group_blocks)
   100000    0.007    0.000    0.007    0.000 lus.py:8(<lambda>)
        1    0.000    0.000   25.377   25.377 {built-in method builtins.exec}
        1    0.007    0.007    0.033    0.033 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

按照乔治的评论将代码更改为:

代码语言:javascript
复制
def build_writes(block_count, writes):
    num_writes = dict(enumerate([0] * block_count))
    for lower, upper in writes:
        num_writes[lower] += 1
        num_writes[upper] -= 1
    return list(accumulate(num_writes))

获取以下配置文件,它的速度要快几个数量级:

代码语言:javascript
复制
         200011 function calls in 0.066 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002    0.066    0.066 <string>:1(<module>)
   100002    0.021    0.000    0.028    0.000 lus.py:10(<genexpr>)
        1    0.025    0.025    0.025    0.025 lus.py:16(build_writes)
        1    0.003    0.003    0.064    0.064 lus.py:24(blockStorageRewrites)
        2    0.000    0.000    0.036    0.018 lus.py:6(group_blocks)
   100000    0.008    0.000    0.008    0.000 lus.py:8(<lambda>)
        1    0.000    0.000    0.066    0.066 {built-in method builtins.exec}
        2    0.008    0.004    0.036    0.018 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
票数 7
EN
页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://codereview.stackexchange.com/questions/215355

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档