文章/答案/技术大牛

发布

社区首页 >问答首页 >merge_sort的实现--比较数组与矢量的时序

问merge_sort的实现--比较数组与矢量的时序
EN

Code Review用户

提问于 2015-03-30 14:55:06

回答 1查看 347关注 0票数 3

我正在阅读“算法入门”一书。我对整数数组的合并排序和向量进行了比较。

我能更好地组织这个程序吗？为什么矢量版本要慢得多呢？用向量类型对200万个整数进行排序几乎需要2秒，但是使用数组对相同的列表进行排序只需要.4秒。另外，如果我将arraylength增加到300多万，那么我就会得到一个分割错误。我怎么才能避免这种情况？

我习惯了数学和Python，但不习惯C++。我在这里以什么方式使用了指针？我怎么才能更好地利用它们呢？

#include <iostream>
#include <math.h>
#include <vector>
#include <chrono>
using namespace std;

const int arraylength=2000000;

//This is an implementation of merge_sort, an algorithm to sort a list of integers using
//a recursion relation.  The merge_sort is written as two functions, `merge` which takes two
//pre-sorted lists and merges them to a single sorted list.  This is called on by merge_sort, 
//which also recursively calls itself.

//I've implemented it here twice, first with the two functions `merge` and `merge_sort`, and then
//again with `vmerge` and `vmerge_sort`.  The first two take as their argument arrays of integers, 
//while the second two take the data type `vector` from the `vector` package (is package the right word?
//or do I say library?).  


void merge(int A[], int p, int q, int r)
{
    //n1 and n2 are the lengths of the pre-sorted sublists, A[p..q] and A[q+1..r]
    int n1=q-p+1;
    int n2=r-q;
    //copy these pre-sorted lists to L and R
    int L[n1+1];
    int R[n2+1];
    for(int i=0;i<=n1-1; i++)
    {
        L[i]=A[p+i];
    }
    for(int j=0;j<=n2-1; j++)
    {
        R[j]=A[q+1+j];
    }


    //Create a sentinal value for L and R that is larger than the largest
    //element of A
    int largest;
    if(L[n1-1]<R[n2-1]) largest=R[n2-1]; else largest=L[n1-1];
    L[n1]=largest+1;
    R[n2]=largest+1;

    //Merge the L and R lists
    int i=0;
    int j=0;
    for(int k=p; k<=r; k++)
    {
        if (L[i]<=R[j])
        {
            A[k]=L[i];
            i++;
        } else
        {
            A[k]=R[j];
            j++;
        }
    }
}

void merge_sort(int A[], int p, int r)
{
    if(p<r)
    {
        int q=floor((p+r)/2);
        merge_sort(A,p,q);
        merge_sort(A,q+1,r);
        merge(A,p,q,r);
    }

}


void vmerge(vector<int>& A, int p, int q, int r)
{
    //n1 and n2 are the lengths of the pre-sorted sublists, A[p..q] and A[q+1..r]
    int n1=q-p+1;
    int n2=r-q;
    //copy these pre-sorted lists to L and R

    vector<int> L(&A[p],&A[q+1]);
    vector<int> R(&A[q+1],&A[r+1]);


    //Create a sentinal value for L and R that is larger than the largest
    //element of A
    int largest;
    if(L[n1-1]<R[n2-1]) largest=R[n2-1]; else largest=L[n1-1];
    L.push_back(largest+1);
    R.push_back(largest+1);

    //Merge the L and R lists
    int i=0;
    int j=0;
    for(int k=p; k<=r; k++)
    {
        if (L[i]<=R[j])
        {
            A[k]=L[i];
            i++;
        } else
        {
            A[k]=R[j];
            j++;
        }
    }
}


void vmerge_sort(vector<int>& A, int p, int r)
{
    //This recursively splits the vector A into smaller sections 
    if(p<r)
    {
        int q=floor((p+r)/2);
        vmerge_sort(A,p,q);
        vmerge_sort(A,q+1,r);
        vmerge(A,p,q,r);
    }

}    

int main()
{
    //seed the random number generator
    srand(time(0));

    cout<<"C++ merge-sort test"<<endl;
    //vlist is defined to be of type vector<int>
    vector<int> vlist1;
    //rlist1 is defined to be an integer array
    int *rlist1= new int[arraylength];
    //both vlist1 and rlist1 have the same content, 2 million random integers
    for(int i=0;i<=arraylength-1;i++)
    {
        rlist1[i] = rand() % 10000;
        vlist1.push_back(rlist1[i] );
    }

    //here I sort rlist1
    auto   t1 = std::chrono::high_resolution_clock::now();
    merge_sort(rlist1,0,arraylength-1);
    auto   t2 = std::chrono::high_resolution_clock::now();
    cout << "sorting "<<arraylength<<" random numbers with merge sort took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
              << " milliseconds\n";

    //here I sort vlist1          
    t1 = std::chrono::high_resolution_clock::now();
    vmerge_sort(vlist1,0,arraylength-1);
    t2 = std::chrono::high_resolution_clock::now();
    cout << "sorting "<<arraylength<<" random numbers with vmerge sort took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
              << " milliseconds\n";


}

更新:这是我阅读了Loki Astari和Aleksey Demakov的答案后得到的代码。使用上面的代码，我能够使用merge_sort在400毫秒内对200万个随机数进行排序，使用vmerge_sort对1926年随机数进行排序。修改后，这些函数分别在410 ms和860 ms内完成任务。因此，使用vector类型需要两倍的时间。我认为这不应该是一个意外，因为它声明为这里“因此，与数组相比，向量消耗更多的内存，以换取高效管理存储和动态增长的能力。”

#include <iostream>
#include <math.h>
#include <vector>
#include <chrono>

//Is this less offensive than using the entire std namespace?
using std::cout;
using std::endl;

const int arraylength=2000000;

//This is an implementation of merge_sort, an algorithm to sort a list of integers using
//a recursion relation.  The merge_sort is written as two functions, `merge` which takes two
//pre-sorted lists and merges them to a single sorted list.  This is called on by merge_sort, 
//which also recursively calls itself.

//I've implemented it here twice, first with the two functions `merge` and `merge_sort`, and then
//again with `vmerge` and `vmerge_sort`.  The first two take as their argument arrays of integers, 
//while the second two take the data type `vector` from the `vector` package (is package the right word?
//or do I say library?).  


void merge(int A[], int LA[], int RA[], int p, int q, int r)
{
    //n1 and n2 are the lengths of the pre-sorted sublists, A[p..q] and A[q+1..r]
    int n1=q-p+1;
    int n2=r-q;
    //Copy the left and right halves of the A array into the L and R arrays
    for(int i=0;i<n1; i++)
    {
        LA[i]=A[p+i];
    }
    for(int j=0;j<n2; j++)
    {
        RA[j]=A[q+1+j];
    }


    //Merge the L and R lists
    int i=0;
    int j=0;
    int k = p;
    while(i < n1 && j < n2) {
        A[k++] = (LA[i]<=RA[j])  
                   ? LA[i++]    
                   : RA[j++];
    }
    while(i < n1) {
        A[k++] = LA[i++];
    }
    while(j < n2) {
        A[k++] = RA[j++];
    }
}

void merge_sort(int A[], int LA[], int RA[], int p, int r)
{
    //This recursively splits the array A into smaller sections 
    if(p<r)
    {
        int q=floor((p+r)/2);
        merge_sort(A,LA,RA,p,q);
        merge_sort(A,LA,RA,q+1,r);
        merge(A,LA,RA,p,q,r);
    }

}


void vmerge(std::vector<int>& A, std::vector<int>& LA, std::vector<int>& RA, int p, int q, int r)
{
    //n1 and n2 are the lengths of the pre-sorted sublists, A[p..q] and A[q+1..r]
    int n1=q-p+1;
    int n2=r-q;
    //copy these pre-sorted lists to L and R

    for(int i=0;i<n1; i++)
    {
        LA[i]=A[p+i];
    }
    for(int j=0;j<n2; j++)
    {
        RA[j]=A[q+1+j];
    }


    //Merge the L and R lists
    int i=0;
    int j=0;
    int k = p;
    while(i < n1 && j < n2) 
    {
        A[k++] = (LA[i]<=RA[j])  
                   ? LA[i++]    
                   : RA[j++];
    }
    while(i < n1) {
        A[k++] = LA[i++];
    }
    while(j < n2) {
        A[k++] = RA[j++];
    }


}


void vmerge_sort(std::vector<int>& A, std::vector<int>& LA, std::vector<int>& RA, int p, int r)
{
    //This recursively splits the vector A into smaller sections 
    if(p<r)
    {
        int q=floor((p+r)/2);
        vmerge_sort(A,LA,RA,p,q);
        vmerge_sort(A,LA,RA,q+1,r);
        vmerge(A,LA,RA,p,q,r);
    }

}    

int main()
{
    //seed the random number generator
    srand(time(0));
    std::chrono::high_resolution_clock::time_point t1,t2;
    cout<<"C++ merge-sort test"<<endl;


    //rlist1 is defined to be an integer array
    //L and R are the subarrays used in the merge function
    int *rlist1= new int[arraylength];
    int halfarraylength=ceil(arraylength/2)+1;
    int *R= new int[halfarraylength];
    int *L= new int[halfarraylength];


    //vlist is defined to be of type vector<int>
    //vL and vR are the left and right subvectors used in the vmerge function
    std::vector<int> vlist1,vL,vR;
    vlist1.reserve(arraylength);
    vL.reserve(halfarraylength);
    vR.reserve(halfarraylength);



    //both vlist1 and rlist1 have the same content, 2 million random integers
    for(int i=0;i<=arraylength-1;i++)
    {
        rlist1[i] = rand() % 1000000;
        vlist1[i] = rlist1[i];
    }


    //here I sort rlist1
    t1 = std::chrono::high_resolution_clock::now();
    merge_sort(rlist1,L,R,0,arraylength-1);
    t2 = std::chrono::high_resolution_clock::now();
    cout << "sorting "<<arraylength<<" random numbers with merge sort took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
              << " milliseconds\n";



    //here I sort vlist1          
    t1 = std::chrono::high_resolution_clock::now();
    vmerge_sort(vlist1,vL,vR,0,arraylength-1);
    t2 = std::chrono::high_resolution_clock::now();
    cout << "sorting "<<arraylength<<" random numbers with vmerge sort took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
              << " milliseconds\n";

    //Now we test that both sorted lists are identical
    cout << "Testing that both sorted lists are the same"<< endl;
    int testcounter = 0;
    for (int k=0; k< arraylength; k++)
    {
        if (rlist1[k] != vlist1[k]) testcounter+=1;
    }
    if (testcounter==0) cout<< "Both lists are the same\n"; else cout<<"Both lists are not the same\n";




}

这两个答案都很有帮助。在这个stackexchange上接受一个答案是如何工作的，因为您不是专门问一个问题，而是征求关于如何改进一些东西的评论。

c++

mergesort

回答 1

Code Review用户

发布于 2015-03-30 20:51:46

在数组版本中，在堆栈上分配数组。如果数组太大，可能会出现堆栈溢出。

在C++矢量版本中，std::vector在空闲存储上分配空间。所以你可能会得到关于log( arraylength ) *arraylength向量分配的信息。此外，对这两个向量都执行push_back，这可能是分配数量的两倍。

对于这两个版本，我建议在main()函数中预先分配所需的额外内存，并将其作为参数传递给合并函数。

对于C++向量，您需要调用reserve()方法，以便从一开始就包含足够的空间，而不需要重新分配它。

更新:我将合并排序实现仅用于向量，数组的版本可以使用类似的技术完成。

#include <algorithm>
#include <limits>
#include <stdexcept>
#include <vector>

void vmerge(std::vector<int> &a,
            int p, int q, int r,
            std::vector<int> &aux1,
            std::vector<int> &aux2) {
  aux1.clear();
  aux2.clear();
  aux1.insert(aux1.begin(), &a[p], &a[q]);
  aux2.insert(aux2.begin(), &a[q], &a[r]);

  int max = std::max(aux1.back(), aux2.back());
  if (max == std::numeric_limits<int>::max())
    throw std::out_of_range("This version of merge algorithm cannot handle INT MAX value");
  aux1.push_back(max + 1);
  aux2.push_back(max + 1);

  int i1 = 0, i2 = 0;
  for (int k = p; k < r; k++) {
    if (aux1[i1] <= aux2[i2])
      a[k] = aux1[i1++];
    else
      a[k] = aux2[i2++];
  }
}

void vmerge_sort_aux(std::vector<int> &a,
                     int p, int r,
                     std::vector<int> &aux1,
                     std::vector<int> &aux2) {
  int n = r - p;
  if (n > 1) {
    int q = p + n / 2;
    vmerge_sort_aux(a, p, q, aux1, aux2);
    vmerge_sort_aux(a, q, r, aux1, aux2);
    vmerge(a, p, q, r, aux1, aux2);
  }
}

void vmerge_sort(std::vector<int> &a) {
  if (a.size() > 1) {
    std::vector<int> aux1;
    std::vector<int> aux2;
    aux1.reserve(a.size() / 2 + 1);
    aux2.reserve(a.size() - (a.size() / 2) + 1);
    vmerge_sort_aux(a, 0, a.size(), aux1, aux2);
  }
}

票数 1

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/85409

复制

相似问题

问merge_sort的实现--比较数组与矢量的时序
EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问merge_sort的实现--比较数组与矢量的时序EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问merge_sort的实现--比较数组与矢量的时序
EN