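I am trying to implement a function that computes the weights and abscissas for Gauss-Laguerre numerical integration, using C++ AMP to parallelize the process, and I get a DXGI_ERROR_DEVICE_HUNG error at runtime.
Here is my helper method for computing the logarithm of the gamma function on the GPU: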
template <typename T>
T gammaln_fast( T tArg ) restrict( amp )
{
const T tCoefficients[] = { T( 57.1562356658629235f ), T( -59.5979603554754912f ),
T( 14.1360979747417471f ), T( -0.491913816097620199f ), T( 0.339946499848118887E-4f ),
T( 0.465236289270485756E-4f ), T( -0.983744753048795646E-4f ), T( 0.158088703224912494E-3f ),
T( -0.210264441724104883E-3f ), T( 0.217439618115212643E-3f ), T( -0.164318106536763890E-3f ),
T( 0.844182239838527433E-4f ), T( -0.261908384015814087E-4f ), T( 0.386991826595316234E-5f ) };
T y = tArg, tTemp = tArg + T( 5.2421875f );
tTemp = (tArg + T( 0.5f )) * concurrency::fast_math::log( tTemp ) - tTemp;
T tSer = T( 0.999999999999997092f );
for( std::size_t s = 0; s < (sizeof( tCoefficients ) / sizeof( T )); ++s )
{
tSer += tCoefficients[s] / ++y;
}
return tTemp + concurrency::fast_math::log( T( 2.5066282746310005f ) * tSer / tArg );
}
Here is my function for computing the weights and abscissas:
template <typename T>
ArrayPair<T> CalculateGaussLaguerreWeights_fast( const T tExponent, const std::size_t sNumPoints, T tEps = std::numeric_limits<T>::epsilon() )
{
static_assert(std::is_floating_point<T>::value, "You can only instantiate this function with a floating point data type");
static_assert(!std::is_same<T, long double>::value, "You can not instantiate this function with long double type"); // The long double type is not currently supported by C++AMP
T tCurrentGuess, tFatherGuess, tGrandFatherGuess;
std::vector<T> vecInitialGuesses( sNumPoints );
for( std::size_t s = 0; s < sNumPoints; ++s )
{
if( s == 0 )
{
tCurrentGuess = (T( 1.0f ) + tExponent) * (T( 3.0f ) + T( 0.92f ) * tExponent) / (T( 1.0f ) + T( 2.4f ) * sNumPoints + T( 1.8f ) * tExponent);
}
else if( s == 1 )
{
tFatherGuess = tCurrentGuess;
tCurrentGuess += (T( 15.0f ) + T( 6.25f ) * tExponent) / (T( 1.0f ) + T( 0.9f ) * tExponent + T( 2.5f ) * sNumPoints);
}
else
{
tGrandFatherGuess = tFatherGuess;
tFatherGuess = tCurrentGuess;
std::size_t sDec = s - 1U;
tCurrentGuess += ((T( 1.0f ) + T( 2.55f ) * sDec) / (T( 1.9f ) * sDec) + T( 1.26f ) * sDec * tExponent
/ (T( 1.0f ) + T( 3.5f ) * sDec)) * (tCurrentGuess - tGrandFatherGuess) / (T( 1.0f ) + T( 0.3f ) * tExponent);
}
vecInitialGuesses[s] = tCurrentGuess;
}
concurrency::array<T> arrWeights( sNumPoints ), arrAbsciasses( sNumPoints, std::begin(vecInitialGuesses) );
try {
concurrency::parallel_for_each( arrAbsciasses.extent, [=, &arrAbsciasses, &arrWeights]( concurrency::index<1> index ) restrict( amp ) {
T tVal = arrAbsciasses[index], tIntermediate;
T tPolynomial1 = T( 1.0f ), tPolynomial2 = T( 0.0f ), tPolynomial3, tDerivative;
std::size_t sIterationNum = 0;
do {
tPolynomial1 = T( 1.0f ), tPolynomial2 = T( 0.0f );
for( std::size_t s = 0; s < sNumPoints; ++s )
{
tPolynomial3 = tPolynomial2;
tPolynomial2 = tPolynomial1;
tPolynomial1 = ((2 * s + 1 + tExponent - tVal) * tPolynomial2 - (s + tExponent) * tPolynomial3) / (s + 1);
}
tDerivative = (sNumPoints * tPolynomial1 - (sNumPoints + tExponent) * tPolynomial2) / tVal;
tIntermediate = tVal;
tVal = tIntermediate - tPolynomial1 / tDerivative;
++sIterationNum;
} while( concurrency::fast_math::fabs( tVal - tIntermediate ) > tEps || sIterationNum < 10 );
arrAbsciasses[index] = tVal;
arrWeights[index] = -concurrency::fast_math::exp( gammaln_fast( tExponent + sNumPoints ) - gammaln_fast( T( sNumPoints ) ) ) / (tDerivative * sNumPoints * tPolynomial2);
} );
}
catch( concurrency::runtime_exception& e )
{
std::cerr << "Runtime error, code: " << e.get_error_code() << "; message: " << e.what() << std::endl;
}
return std::make_pair( std::move( arrAbsciasses ), std::move( arrWeights ) );
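}
Below is the full trace from the debug console:
D3D11: Removing Device. D3D11 ERROR: ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DEVICE_HUNG: The Device took an unreasonable amount of time to execute its commands, or the hardware crashed/hung. As a result, the TDR (Timeout Detection and Recovery) mechanism has been triggered. The current Device Context was executing commands when the hang occurred. The application may want to respawn and fallback to less aggressive use of the display hardware). [ EXECUTION ERROR #378: DEVICE_REMOVAL_PROCESS_AT_FAULT] D3D11 ERROR: ID3D11DeviceContext::Map: Returning DXGI_ERROR_DEVICE_REMOVED, when a resource was trying to be mapped with READ or READWRITE. [ RESOURCE_MANIPULATION ERROR #2097214: RESOURCE_MAP_DEVICEREMOVED_RETURN]
I apologize for not providing a minimal reproducible example; I hope this is still an acceptable question, since I cannot solve the problem on my own.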
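Posted on 2014-08-19 23:17:15
When using DirectCompute, the main challenge is writing computations that do not run afoul of Direct3D's automatic "GPU hang" detection timeout. By default, the system assumes that if a shader takes more than a few seconds, the GPU is actually hung. This heuristic works well for visual shaders, but it is easy to write a DirectCompute shader that legitimately takes a long time to complete.
The solution is to disable the timeout detection. You can do this by creating the Direct3D 11 device with D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT; see the blog post "Disabling TDR on Windows 8 for your C++ AMP algorithms". The main thing to remember is that D3D11_CREATE_DEVICE_DISABLE_GPU_TIMEOUT requires the DirectX 11.1 or later runtime, which is included in Windows 8.x and can be installed on Windows 7 Service Pack 1 with KB2670838. For some caveats about using KB2670838, see "DirectX 11.1 and Windows 7", "DirectX 11.1 and Windows 7 Update", and MSDN.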
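Disabling the timeout only helps if the kernel eventually finishes. The do ... while loop in the question keeps iterating while the step exceeds tEps or fewer than 10 iterations have run, so it has no upper bound; with an absolute tolerance of machine epsilon, single-precision iterates for the larger abscissas may never get that close. Whether this is what actually triggers the hang here is not confirmed by the post. The following is a minimal CPU-side sketch of the same Newton refinement with a guaranteed exit, assuming a hard iteration cap and a relative tolerance. Both are choices made for this illustration, and the names RootResult, RefineLaguerreRoot, and sMaxIterations are hypothetical:

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Outcome of refining a single abscissa guess (hypothetical helper type).
template <typename T>
struct RootResult
{
    T tRoot;                  // refined abscissa
    std::size_t sIterations;  // Newton steps actually taken
    bool bConverged;          // false if the iteration cap was hit first
};

// Newton refinement of one root of the generalized Laguerre polynomial,
// using the same recurrence and derivative formula as the question's kernel,
// but with a hard upper bound on the number of iterations.
template <typename T>
RootResult<T> RefineLaguerreRoot( T tGuess, T tExponent, std::size_t sNumPoints,
                                  std::size_t sMaxIterations = 50 )
{
    const T tEps = std::numeric_limits<T>::epsilon();
    T tVal = tGuess;
    for( std::size_t sIter = 1; sIter <= sMaxIterations; ++sIter )
    {
        // Three-term recurrence: tP1 ends up as L_n, tP2 as L_{n-1}.
        T tP1 = T( 1 ), tP2 = T( 0 );
        for( std::size_t s = 0; s < sNumPoints; ++s )
        {
            const T tP3 = tP2;
            tP2 = tP1;
            tP1 = ((T( 2 * s + 1 ) + tExponent - tVal) * tP2
                   - (T( s ) + tExponent) * tP3) / T( s + 1 );
        }
        const T tDerivative = (T( sNumPoints ) * tP1
                               - (T( sNumPoints ) + tExponent) * tP2) / tVal;
        const T tPrevious = tVal;
        tVal = tPrevious - tP1 / tDerivative;

        // Assumption: tolerance relative to the root's magnitude; the factor
        // of 8 is an arbitrary safety margin chosen for this sketch.
        if( std::fabs( tVal - tPrevious ) <= T( 8 ) * tEps * std::fabs( tVal ) )
        {
            return { tVal, sIter, true };
        }
    }
    return { tVal, sMaxIterations, false };  // cap reached without converging
}
```

Calling this on the host for each initial guess shows how many Newton steps a thread would need and whether any guess fails to converge before the cap. The same bounded loop could then be moved into the restrict( amp ) lambda, with std::fabs replaced by concurrency::fast_math::fabs.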
https://stackoverflow.com/questions/25386652