
Strange output from a simple OpenMP program on Multi2sim v4.0.1

Stack Overflow user
Asked 2013-05-09 01:41:51
Answers: 1 · Views: 618 · Followers: 0 · Votes: 0

I am trying to run a simple program that uses OpenMP.

The program is as follows:

#include <iostream>
#include <fstream>
#include <vector>
#include <omp.h>
#include <algorithm>
#include <math.h>
#include <map>
#include <string>
#include <ctime>
using namespace std;

#define NUM 10



void openMP()
{
    omp_set_num_threads(1);
    int sum =0;
    #pragma omp parallel for shared(sum)
    {
        for (int i=0;i<100;i++)
        {
            sum++;
        }
    }
    cout<<"sum = "<<sum<<endl;

}
int main()
{
    cout<<"Open MP \n";
    openMP();
return 0;
}

Now, when I compile it with the following command

g++ test.cpp -fopenmp -o test

and run it in an Ubuntu terminal

./test

the output is correct (I think), as shown below:

Open MP 
sum = 100

However, when I try to run it through Multi2sim using these two files that my instructor gave me:

Multicore configuration:

[ General ]
Cores = 4
Threads = 1

Multicore memory configuration:

[CacheGeometry geo-l1]
Sets = 256
Assoc = 2
BlockSize = 64
Latency = 2
Policy = LRU
Ports = 2

[CacheGeometry geo-l2]
Sets = 512
Assoc = 4
BlockSize = 64
Latency = 20
Policy = LRU
Ports = 4

[Module mod-l1-0]
Type = Cache
Geometry = geo-l1
LowNetwork = net-l1-l2 
LowModules = mod-l2

[Module mod-l1-1]
Type = Cache
Geometry = geo-l1
LowNetwork = net-l1-l2 
LowModules = mod-l2

[Module mod-l2]
Type = Cache
Geometry = geo-l2
HighNetwork = net-l1-l2 
LowNetwork = net-l2-mm
LowModules = mod-mm

[Module mod-mm]
Type = MainMemory
BlockSize = 256
Latency = 200
HighNetwork = net-l2-mm

[Network net-l2-mm]
DefaultInputBufferSize = 1024 
DefaultOutputBufferSize = 1024
DefaultBandwidth = 256 

[Network net-l1-l2]
DefaultInputBufferSize = 1024 
DefaultOutputBufferSize = 1024
DefaultBandwidth = 256 

[Entry core-0]
Arch = x86
Core = 0
Thread = 0
DataModule = mod-l1-0
InstModule = mod-l1-0

[Entry core-1]
Arch = x86
Core = 1
Thread = 0
DataModule = mod-l1-0
InstModule = mod-l1-0

[Entry core-2]
Arch = x86
Core = 2
Thread = 0
DataModule = mod-l1-0
InstModule = mod-l1-0

[Entry core-3]
Arch = x86
Core = 3
Thread = 0
DataModule = mod-l1-0
InstModule = mod-l1-0

and then run this command in the Ubuntu terminal:

m2s --x86-config multicore-config.txt --mem-config multicore-mem-config.txt --x86-sim detailed test

I get this output:

; Multi2Sim 4.0.1 - A Simulation Framework for CPU-GPU Heterogeneous Computing
; Please use command 'm2s --help' for a list of command-line options.
; Last compilation: May  8 2013 10:01:31

Open MP 
sum = 83

;
; Simulation Statistics Summary
;

[ General ]
Time = 53.17
SimEnd = ContextsFinished
Cycles = 3691870

[ x86 ]
SimType = Detailed
Time = 53.15
Contexts = 4
Memory = 37056512
EmulatedInstructions = 3292450
EmulatedInstructionsPerSecond = 61943
Cycles = 3691558
CyclesPerSecond = 69452
FastForwardInstructions = 0
CommittedInstructions = 2081157
CommittedInstructionsPerCycle = 0.5638
CommittedMicroInstructions = 3113721
CommittedMicroInstructionsPerCycle = 0.8435
BranchPredictionAccuracy = 0.9375

Why is the output 100 in a normal run but 83 under Multi2sim?

Also, why does it take so long to run on Multi2Sim?

Any help would be appreciated.


1 Answer

Stack Overflow user

Accepted answer

Answered 2013-05-09 02:56:31

I don't know much about m2s, but the culprit is probably:

#pragma omp parallel for shared(sum)
    {
        for (int i=0;i<100;i++)
        {
            sum++; // Concurrent access to a shared variable!!!
        }
    }

In the first test you explicitly set the number of threads to 1

omp_set_num_threads(1);

which saves you from the race condition. I suggest trying:

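// sum may appear in only one data-sharing clause, so shared(sum) is dropped:
// reduction(+:sum) gives each thread a private copy and adds them up at the end.
#pragma omp parallel for reduction(+:sum)
for (int i=0;i<100;i++) {
    sum++;
}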

to see whether you get the desired behavior.

Votes: 1

Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/16446956
