Q: More OMP threads than cores, but performance stays the same

Stack Overflow user

Asked 2016-11-25 15:16:29

Answers: 1 · Views: 395 · Followers: 0 · Votes: 0

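I have part of a serial program that is written with OpenMP. When I run it with 8 threads (my PC can run 8 threads), it takes the same time as when I run it with 16, 32, or 64 threads. Is that normal? I expected the program to get slower once I create more threads than cores. Here is the code, if you want to check it. It runs correctly. The number of threads is set in the main file, which is a separate file.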

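#include <string.h> /* memcpy */
#include <omp.h>    /* omp_set_nested, omp_get_nested */

/* MAXBINS, swap and swap_long are not shown in the post. */
void truncated_radix_sort(unsigned long int *morton_codes,
                          unsigned long int *sorted_morton_codes,
                          unsigned int *permutation_vector,
                          unsigned int *index,
                          int *level_record,
                          int N,
                          int population_threshold,
                          int sft, int lv){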

int BinSizes[MAXBINS] = {0};
unsigned int *tmp_ptr;
unsigned long int *tmp_code;

//thread management
extern int NUM_THREADS;
extern int activeThreads;
int startNewThreads = 0;
//if there's space for new threads, set flag to 1 and add the new threads to the count
//once calling is over, decrement count

level_record[0] = lv; // record the level of the node

if(N<=population_threshold || sft < 0) { // Base case. The node is a leaf
    memcpy(permutation_vector, index, N*sizeof(unsigned int)); // Copy the permutation vector
    memcpy(sorted_morton_codes, morton_codes, N*sizeof(unsigned long int)); // Copy the Morton codes

    return;
}
else{

    // Find which child each point belongs to
    int j = 0;
    for(j=0; j<N; j++){
        unsigned int ii = (morton_codes[j]>>sft) & 0x07;
        BinSizes[ii]++;
    }


    // scan prefix (must change this code)
    int offset = 0, i = 0;
    for(i=0; i<MAXBINS; i++){
        int ss = BinSizes[i];
        BinSizes[i] = offset;
        offset += ss;
    }

    for(j=0; j<N; j++){
        unsigned int ii = (morton_codes[j]>>sft) & 0x07;
        permutation_vector[BinSizes[ii]] = index[j];
        sorted_morton_codes[BinSizes[ii]] = morton_codes[j];
        BinSizes[ii]++;
    }

    //swap the index pointers
    swap(&index, &permutation_vector);

    //swap the code pointers
    swap_long(&morton_codes, &sorted_morton_codes);
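    // After the scatter loop, BinSizes[i] holds the end offset of bin i,
    // so record each bin's start offset and size for the recursive calls.
    int offsets[MAXBINS];
    int sizes[MAXBINS];
    offset = 0;
    for(i = 0; i<MAXBINS; i++) {
        offsets[i] = offset;
        sizes[i] = BinSizes[i] - offset;
        offset = BinSizes[i];
    }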

    #pragma omp flush(activeThreads)
    //Allow creation of new threads? Only if the number has not been exceeded
    if (activeThreads < NUM_THREADS && 0 == startNewThreads){
        startNewThreads = 1; //allow creation of more threads
    }
    if (activeThreads > NUM_THREADS && 1 == startNewThreads){
        startNewThreads = 0; //stop creating more threads
    }


    #pragma omp flush(startNewThreads)
    omp_set_nested(startNewThreads);
    /* Call the function recursively to split the lower levels */
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        #pragma omp for private(i) nowait\
        schedule(static)
        for(i=0; i<MAXBINS; i++){
            if (omp_get_nested()){
                #pragma omp atomic
                activeThreads ++; //account for new thread
                #pragma omp flush(activeThreads)
            }
            truncated_radix_sort(&morton_codes[offsets[i]],
                    &sorted_morton_codes[offsets[i]],
                    &permutation_vector[offsets[i]],
                    &index[offsets[i]], &level_record[offsets[i]],
                    sizes[i],
                    population_threshold,
                    sft-3, lv+1);
            if(omp_get_nested()){
                #pragma omp atomic
                activeThreads--;  //thread about to terminate
                #pragma omp flush(activeThreads)
            }
        }
    }
}

}
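
One way to check the observation in the question is to time the same parallel loop at several thread counts. The sketch below is not from the original post: the workload `sum_squares` and the helpers `now_seconds` and `time_run` are hypothetical names chosen for illustration, and the OpenMP parts are guarded so the code also builds without `-fopenmp` (in which case it runs serially and falls back to CPU time).

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock seconds with OpenMP; CPU seconds as a fallback without it. */
static double now_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical workload: sum of i*i for i in [0, n), split across
   at most `threads` threads when OpenMP is enabled. */
static unsigned long long sum_squares(unsigned long long n, int threads) {
    unsigned long long total = 0;
    unsigned long long i;
    assert(threads > 0);
#ifdef _OPENMP
    #pragma omp parallel for num_threads(threads) reduction(+:total) schedule(static)
#endif
    for (i = 0; i < n; i++) {
        total += i * i;
    }
    return total;
}

/* Run the workload once and return the elapsed seconds;
   the computed sum is stored in *result. */
static double time_run(unsigned long long n, int threads,
                       unsigned long long *result) {
    double start = now_seconds();
    *result = sum_squares(n, threads);
    return now_seconds() - start;
}
```

Calling `time_run` with the same `n` for 8, 16, 32 and 64 threads and printing the returned times gives a direct comparison. On a machine with 8 hardware threads, the extra software threads share the same cores, so similar timings are a plausible outcome for a simple loop like this one.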

multithreading

openmp

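1 Answer

Stack Overflow user

Accepted answer

Answered 2016-11-25 18:24:08

Your experiment agrees with the theory. You may want to read about Amdahl's law. Basically, according to this law, you will get approximately the same performance with a lower number of threads. In real life, performance will start to decrease at some point (where you have too many threads). You could observe that if you had thousands of threads.

Votes: 0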
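
As a minimal sketch of the law the answer mentions: Amdahl's law bounds the speedup with n workers at 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized. The function name `amdahl_speedup` and the example value of p used below are assumptions for illustration; nothing here was measured from the asker's program.

```c
#include <assert.h>

/* Amdahl's law: upper bound on the speedup with n workers when a
   fraction p (0 <= p <= 1) of the run time can be parallelized. */
static double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

With an illustrative p = 0.5, the bound is about 1.78 for 8 workers and about 1.97 for 64, and it never reaches 2. The curve flattens quickly, which matches the answer's point that adding threads past a certain count changes little.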

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/40808018
