文章/答案/技术大牛

发布

问快速选择C#
EN

Code Review用户

提问于 2019-06-24 19:19:12

回答 2查看 1.1K关注 0票数 2

我试图实现https://en.wikipedia.org/wiki/Quickselect

quickselect是一种在无序列表中查找k个最小元素的选择算法.这是我在快速排序中所遵循的伪代码，有一个名为分区的子过程，它可以在线性时间内将一个列表(从左到右的索引范围)分组为两个部分:小于某个元素的部分和大于或等于元素的部分。下面是对元素list[pivotIndex]执行分区的伪代码:函数分区( list、left、right、pivotIndex)、pivotValue :=列表交换列表和list // Move (从左到右结束storeIndex := )-1如果list < pivotValue交换列表和列表增量storeIndex交换列表和list // Move到其最终位置返回storeIndex，这称为Lomuto分区方案，它比Hoare的原始分区方案简单但效率低。在快速排序中，我们递归地对两个分支进行排序，从而得到最佳情况O(n log )时间。但是，在进行选择时，我们已经知道了我们想要的元素位于哪个分区中，因为枢轴处于其最后的排序位置，在它前面的所有分区都是以未排序的顺序排列的，所有跟随它的分区都是按未排序的顺序排列的。因此，一个递归调用在正确的分区中定位所需的元素，并在此基础上为left..right inclusive // (即左<= k <=右)返回列表中的k个最小元素: //。//数组中的搜索空间每一轮都在变化，但列表//仍然是相同的大小。因此，不需要每轮更新k。函数选择(列表，左，右，k)如果左=右//如果列表只包含一个元素，则返回列表//返回元素pivotIndex := . //选择左和右之间的pivotIndex，//例如，左+层(rand()%(右左+ 1)) pivotIndex :=分区(列表，左，右)，如果k= pivotIndex返回列表，如果k< pivotIndex返回select( list，left，pivotIndex - 1，k)则返回select(list，pivotIndex + 1，右，k)

请检查编码风格和清晰度。

using System;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace ArrayQuestions
{
    [TestClass]
    public class QuickSelect
    {
        [TestMethod]
        public void QuickSelectAlgoTest()
        {
            int[] arr = {9, 8, 7, 6, 5, 0, 1, 2, 3, 4};
            int[] res = Select(arr,  6);
            for (int i = 0; i < 6; i++)
            {
                Assert.IsTrue(res[i] <= 6, $"The res[i] {res[i]} was not greater than six");
            }
        }

        //Partially sort array such way that elements before index position n are smaller or equal than element at position n
        //And elements after n are larger or equal.
        public int[] Select(int[] input,int n)
        {
            //keep original array
            int[] partiallySortedArray = (int[])input.Clone();
            int startIndex = 0;
            var endIndex = input.Length - 1;
            var pivotIndex = n;
            Random r = new Random();
            while (endIndex > startIndex)
            {
                pivotIndex = QuickSelectPartition(partiallySortedArray, startIndex, endIndex, pivotIndex);
                if (pivotIndex == n)
                {
                    break;
                }
                if (pivotIndex > n)
                {
                    endIndex = pivotIndex - 1;
                }
                else
                {
                    startIndex = pivotIndex + 1;
                }

                pivotIndex = r.Next(startIndex, endIndex);
            }

            return partiallySortedArray;
        }

        private int QuickSelectPartition(int[] array, int startIndex, int endIndex, int pivotIndex)
        {
            int pivotValue = array[pivotIndex];
            Swap(ref array[pivotIndex], ref array[endIndex]);
            for (int i = startIndex; i < endIndex; i++)
            {
                if (array[i].CompareTo(pivotValue) > 0)
                {
                    continue;
                }
                Swap(ref array[i], ref array[startIndex]);
                startIndex++;
            }
            Swap(ref array[endIndex], ref array[startIndex]);
            return startIndex;
        }

        private void Swap(ref int i, ref int i1)
        {
            int temp = i;
            i = i1;
            i1 = temp;
        }
    }
}

sorting

回答 2

Code Review用户

回答已采纳

发布于 2019-06-25 09:08:49

单独的代码和测试。您已经编写了一个单元测试，这是很好的，但是测试代码应该在一个独立的类(通常是一个单独的项目)中，而不是被测试的代码。

测试也可能会更详尽一些。给定相同的输入，您可以测试n的每个值:它仍将在毫秒内运行。单独的测试可以处理重复的元素。第三个测试可以处理重复元素的极端性:如果您有一个长度为100万的数组，其中每个元素都是相同的，并且给出了n = arr.Length / 2，那么性能会出现大幅度下降吗？我怀疑是这样的，这是您可能希望用荷兰标志分区来解决的问题。其他的问题可能是n = -1或n = arr.Length：目前，我不认为这两种情况都处理得特别好。

Select作为一个名称在上下文中是有意义的，但是当您考虑将它嵌入到一个大型项目或库中时，它并不是很具体。命名事情是困难的:经过两分钟的思考，我最好的建议是PartitionAroundMedian。

pivotIndex正在执行双重任务。我认为把这些职责分开比较清楚；因为我收到一条评论说这一点不清楚，因此重构的第一步是

            var pivotIndexIn = n;
            Random r = new Random();
            while (endIndex > startIndex)
            {
                var pivotIndexOut = QuickSelectPartition(partiallySortedArray, startIndex, endIndex, pivotIndexIn);
                if (pivotIndexOut == n)
                {
                    break;
                }
                if (pivotIndexOut > n)
                {
                    endIndex = pivotIndexOut - 1;
                }
                else
                {
                    startIndex = pivotIndexOut + 1;
                }

                pivotIndexIn = r.Next(startIndex, endIndex);
            }

实际上，QuickSelectPartition完全不需要将pivotIndexIn作为一个参数来接收。如果传递它的原因是为了有一个明确的r生命周期，那么可以传递r。

注意，pivotIndexOut的范围缩小到循环体内部。

因为我说的是名字：r糟透了。我对Random实例的约定是将它们称为rnd；其他人可能更喜欢random或rng (用于随机数生成器)。

我发现对数组元素使用原语类型int是不一致的，但是使用CompareTo而不是基元>来比较它们。由于它不需要花费，所以我会更改第一个:泛化快速选择以在T where T : IComparable上操作。

Select不修改其输入的决定可以通过将其从数组更改为IReadOnlyList<> (或者可能是IEnumerable<> )来反映在类型签名中。类似地，返回类型可以是IList<>或IReadOnlyList<>，以指示该方法对其用户的期望。(就我个人而言，我更喜欢IList<>)。

Select、QuickSelectPartition和Swap不使用类的任何实例成员，因此您需要一个很好的理由来不让它们全部为static。事实上，我很想让Select成为一个扩展方法public static IList PartitionAroundMedian(this IEnumerable elements, int n) where T : IComparable。

for (int i = startIndex; i < endIndex; i++) { if (array[i].CompareTo(pivotValue) > 0) { continue; } Swap(ref array[i], ref array[startIndex]); startIndex++; }

有时，快速拒绝和continue会生成可读的代码.我不认为这是其中之一。

            for (int i = startIndex; i < endIndex; i++)
            {
                if (array[i].CompareTo(pivotValue) <= 0)
                {
                    Swap(ref array[i], ref array[startIndex]);
                    startIndex++;
                }
            }

更短，也不需要脑力来逆转病情。

while (endIndex > startIndex) { pivotIndex = QuickSelectPartition(partiallySortedArray, startIndex, endIndex, pivotIndex); if (pivotIndex == n) { break; } ... } return partiallySortedArray;

我会把return放在break的位置，这样就可以开始追逐了。

关于空格的一些小问题:在参数列表中的,后面是否有空格是不一致的。我更喜欢在每个}之后放一个空行，除非下面的行是另一个}或与它直接相关(例如，else与if块直接相关)。

票数 5

Code Review用户

发布于 2019-06-25 05:56:13

您使用的是通常比递归更快的迭代方法。这是一个很好的优化。

if (array[i].CompareTo(pivotValue) > 0) { continue; } Swap(ref array[i], ref array[startIndex]); startIndex++;

为什么不只是：

      if (array[i].CompareTo(pivotValue) < 0)
      {
        Swap(ref array[i], ref array[startIndex]);
        startIndex++;
      }

pivotIndex = r.Next(startIndex, endIndex);

wiki正在这么做，但我不明白为什么要这样做，而不只是采取中间的方式：

pivotIndex = startIndex + (endIndex - startIndex) / 2;

您可以对较大的数据集进行度量，以查看性能是否存在任何平均差异？

$"The res[i] {res[i]} was not greater than six"

严格地说:您不会发现小于6的值，而是在数据集中找到六个最小的值。(n或k是索引，而不是值)

票数 4

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/222889

复制

相似问题

问快速选择C#
EN

回答 2

Code Review用户

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问快速选择C#EN

回答 2

Code Review用户

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问快速选择C#
EN