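#include <iostream>
#include <fstream>
#include <ctime>
#include <omp.h>
using namespace std;
static long num_steps = 100;
#define NUM 8
double step;
int main()   // standard C++ requires main to return int
{
    clock_t start = clock();   // renamed from 'time', which hides ::time()
    ofstream result;
    result.open("Result.txt");
    int a[100] = {0};          // a[i] = ID of the thread that computed iteration i
    double pi, sum = 0.0;
    step = 1.0/(double) num_steps;
    #pragma omp parallel num_threads(NUM)
    {
        int i, ID;
        double x, psum = 0.0;
        int nthreads = omp_get_num_threads();
        ID = omp_get_thread_num();
        // round-robin distribution: thread ID takes i = ID, ID+nthreads, ID+2*nthreads, ...
        // '<' instead of '<=': the midpoints are i = 0..num_steps-1, and a[num_steps] is out of bounds
        for (i = ID; i < num_steps; i += nthreads)
        {
            x = (i+0.5)*step;
            psum += 4.0/(1.0+x*x);
            a[i] = ID;
        }
        #pragma omp critical
        sum += psum;
    }
    pi = step * sum;
    for (int j = 0; j < 100; j++)
        result << a[j] << endl;
    result << "pi = " << pi << endl;
    clock_t elapsed = clock() - start;
    result << "Time Elapsed: " << (((double)elapsed)/CLOCKS_PER_SEC) << endl;
    result << "=======================================================================================" << endl;
    result.close();
    return 0;
}

The problem: the loop for (i=ID; i<num_steps; i+=nthreads) executes its iterations in the order 01234567, 01234567, 01234567, and so on. The assignment is to change the for loop so that the iterations are divided evenly among the threads in blocks instead of round-robin: first thread zero, then thread one, then thread two... So how do I change the for loop?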
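A minimal sketch of the block (contiguous) distribution the question asks for, assuming the goal is that thread 0 handles the first run of iterations, thread 1 the next run, and so on. Block, block_range and pi_blocked are hypothetical names, and the #else branch is only a serial fallback so the sketch also builds without OpenMP enabled.

```cpp
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch also compiles without OpenMP.
inline int omp_get_num_threads() { return 1; }
inline int omp_get_thread_num() { return 0; }
#endif

// Half-open range [begin, end) of iterations owned by one thread.
struct Block { long begin, end; };

// Splits n iterations into nthreads contiguous blocks. The first
// (n % nthreads) threads get one extra iteration, so block sizes
// differ by at most one.
Block block_range(long n, int id, int nthreads)
{
    long base = n / nthreads;
    long extra = n % nthreads;
    long begin = id * base + (id < extra ? id : extra);
    long end = begin + base + (id < extra ? 1 : 0);
    return {begin, end};
}

// Midpoint-rule pi with block distribution; owner[i] receives the ID of
// the thread that computed iteration i (owner must hold n ints).
double pi_blocked(long n, int num_threads, int* owner)
{
    double step = 1.0 / (double)n, sum = 0.0;
    #pragma omp parallel num_threads(num_threads)
    {
        int id = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        Block b = block_range(n, id, nthreads);
        double psum = 0.0;
        for (long i = b.begin; i < b.end; i++)   // contiguous, not i += nthreads
        {
            double x = (i + 0.5) * step;
            psum += 4.0 / (1.0 + x * x);
            owner[i] = id;
        }
        #pragma omp critical
        sum += psum;
    }
    return step * sum;
}
```

Each thread now walks one contiguous [begin, end) range instead of stepping by nthreads, so with 100 steps and all 8 threads granted, threads 0 to 3 get 13 iterations each and threads 4 to 7 get 12 each. Alternatively, the split can be left to OpenMP with #pragma omp parallel for schedule(static) reduction(+:sum), which divides the iterations into contiguous chunks of roughly equal size, at most one per thread; the exact chunk sizes are left to the implementation.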
Posted on 2013-12-13 18:49:18
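For this you have to use some kind of thread synchronization...
You tagged the question Visual Studio, so I assume the Windows platform...
Lately this has become my favorite: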
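#include <windows.h>
CRITICAL_SECTION hnd;   // shared by all threads
// init (once, before the threads use it)
InitializeCriticalSectionAndSpinCount(&hnd,0x00000400);   // spin up to 1024 times before waiting
// start lock
EnterCriticalSection(&hnd);
// ... access the shared data here ...
// stop lock
LeaveCriticalSection(&hnd);
// exit (once, after all threads are done with it)
DeleteCriticalSection(&hnd);
But there are many other ways.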
I have also seen lockless code that worked 100% correctly become choppy or freeze on older OSes.
If you use locks incorrectly, you can lose any benefit of the multithreaded speedup.
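A minimal sketch of that point, using portable C++11 std::mutex and std::lock_guard as a stand-in for the Windows CRITICAL_SECTION (std::mutex has no spin-count setting). sum_with_partial_sums is a hypothetical name: each worker accumulates a private partial sum, as psum does in the question, and takes the lock only once instead of locking around every single addition.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Sums 0..n-1 with nthreads workers. Each worker accumulates into a
// private partial sum (like psum in the question) without any lock,
// then merges it into the shared total under the lock exactly once.
// *acquisitions reports how many times the lock was taken.
long long sum_with_partial_sums(long long n, int nthreads, int* acquisitions)
{
    std::mutex m;                 // plays the role of the CRITICAL_SECTION
    long long total = 0;
    *acquisitions = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t)
        workers.emplace_back([&, t] {
            long long psum = 0;   // private to this thread: no lock needed
            for (long long i = t; i < n; i += nthreads)
                psum += i;
            std::lock_guard<std::mutex> lock(m);   // enter; released at end of scope
            total += psum;
            ++*acquisitions;      // also protected by the lock
        });
    for (auto& w : workers) w.join();
    return total;
}
```

Locking inside the inner loop would give the same total but take the lock n times instead of nthreads times, and the threads would spend most of their time waiting for each other.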
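If you are just worried that your threads do not compute concurrently, i.e. in your example they run serially instead of in parallel, this can be caused by: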
- any scheduled task is divided into chunks of time
  - if your task is too short, it finishes before the other tasks even begin execution
  - to test that, try a bigger payload (compute time > a few seconds):
    - greatly enlarge the number of cycles
    - add Sleep(time ms) to get a longer computation time
  - if the output is then mixed, that was the cause
  - if not, then you are still under the granularity boundary
  - or your multi-threaded code is wrong
    - are you sure your threads are created/running at the same time?
    - or do you synchronize on something wrong? (like waiting until the end of the previous task)
    - also, some compilers make a big deal of volatile variables (adding locks to them, which sometimes does very weird things... I stumbled on this many times, mostly on MCU platforms and Eclipse)
- in some cases you have just 1 CPU/core/computer for processing
  - or you have set the affinity mask to a single CPU
- for some algorithms, Windows schedulers do not schedule CPU time evenly
  - even regardless of the process/thread priority/class
  - something similar sometimes appears on Windows 7 even with more CPUs...
  - especially with code mixed with kernel-mode code
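A sketch of the test suggested in the list above, assuming portable C++11 threads: std::this_thread::sleep_for stands in for the Windows Sleep(time ms) payload, and execution_log / is_interleaved are hypothetical helper names. Each worker appends its ID to a shared log after every step of work, and the log is then checked for being "mixed".

```cpp
#include <chrono>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Each of nthreads workers runs `iterations` steps of artificial payload
// (sleep_for stands in for Sleep(time ms)) and appends its ID to a shared
// log after every step. The log records the order in which steps finished.
std::vector<int> execution_log(int nthreads, int iterations, int work_ms)
{
    std::mutex m;
    std::vector<int> entries;
    std::vector<std::thread> workers;
    for (int id = 0; id < nthreads; ++id)
        workers.emplace_back([&, id] {
            for (int k = 0; k < iterations; ++k) {
                std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
                std::lock_guard<std::mutex> lock(m);   // protect the shared log
                entries.push_back(id);
            }
        });
    for (auto& w : workers) w.join();
    return entries;
}

// True if some thread's entries are split into more than one run,
// i.e. the log is mixed (e.g. 0 1 0) rather than 0 0 1 1 2 2.
bool is_interleaved(const std::vector<int>& entries)
{
    std::vector<int> finished;   // IDs whose run has already ended
    for (std::size_t i = 1; i < entries.size(); ++i) {
        if (entries[i] == entries[i - 1]) continue;
        finished.push_back(entries[i - 1]);
        for (int id : finished)
            if (id == entries[i]) return true;   // ID reappears after its run ended
    }
    return false;
}
```

Whether the log actually comes out interleaved depends on the scheduler and the number of cores, so a non-interleaved log is a hint, not proof, that something is serializing the threads.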
To change the timer granularity, you can use this:
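#include <windows.h>
#include <mmsystem.h>   // timeGetDevCaps, timeBeginPeriod, timeEndPeriod; link with winmm.lib
void fine_timer_example()
{
    // obtain OS time capabilities
    TIMECAPS tim;
    timeGetDevCaps(&tim, sizeof(tim));
    UINT period = tim.wPeriodMin;   // e.g. the finest granularity the OS supports [ms]
    // set new granularity
    if (timeBeginPeriod(period) != TIMERR_NOERROR) OutputDebugStringA("time granularity out of range\n");
    // ... timing-sensitive work here ...
    // return to previous granularity (must pass the same value as to timeBeginPeriod)
    timeEndPeriod(period);
}
PS. Very good material about this can be found here: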
http://bitflipgames.com/2011/05/09/multithreaded-programming-part-1-the-critical-section-lock/
http://bitflipgames.com/2011/05/17/multithreaded-programming-part-2-multiple-readersingle-writer-lock/
http://bitflipgames.com/2011/05/20/multithreaded-programming-part-2-5-mrsw-lock-code/
http://bitflipgames.com/2011/05/25/multithreaded-programming-part-3-going-lockless/
https://stackoverflow.com/questions/15645403