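After reading how-can-a-fortran-openacc-routine-call-another-fortran-openacc-routine, I am still confused about the restrictions on OpenACC routine calls.
Below is a modified, nonsensical piece of code based on the post linked above: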
PROGRAM Test
  IMPLICIT NONE
CONTAINS
  SUBROUTINE OuterRoutine( N )
    !$acc routine
    IMPLICIT NONE
    INTEGER :: N
    real :: y
    INTEGER :: i
    DO i = 0, N
      call InnerRoutine( y )
    ENDDO
  END SUBROUTINE OuterRoutine
  subroutine InnerRoutine( y )
    !$acc routine
    IMPLICIT NONE
    real :: y
  END subroutine InnerRoutine
END PROGRAM Test

When I compile it with nvfortran version 20.7, I get:
$ nvfortran -acc -Minfo routine.f90
outerroutine:
14, Generating acc routine seq
Generating Tesla code
22, Reference argument passing prevents parallelization: y
innerroutine:
27, Generating acc routine seq
Generating Tesla code
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccr22eZDXceweL.gpu (43, 14): parse invalid forward reference to function '_innerroutine_' with wrong type!
ptxas /tmp/pgaccH22eJTMb0hKD.ptx, line 1; fatal : Missing .version directive at start of file '/tmp/pgaccH22eJTMb0hKD.ptx'
ptxas fatal : Ptx assembly aborted due to errors
NVFORTRAN-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (routine_inline.f90: 1)
0 inform, 0 warnings, 1 severes, 0 fatal for

What triggers the compilation error? For comparison, the following code also uses acc routine calls:
module data
  integer, parameter :: maxl = 100000
  real, dimension(maxl) :: xstat
  real, dimension(:), allocatable :: yalloc
  !$acc declare create(xstat,yalloc)
  logical :: IsUsed
  !$acc declare create(IsUsed)
end module
module useit
  use data
contains
  subroutine compute(n)
    integer :: n
    integer :: i
    !$acc parallel loop present(yalloc,xstat)
    do i = 1, n
      call iprocess(i, yalloc)
    enddo
  end subroutine
  subroutine iprocess(i, yalloc)
    !$acc routine seq
    integer :: i
    real,intent(out) :: yalloc(:)
    if(IsUsed) call kernel(i,yalloc)
  contains
    subroutine kernel(i,yalloc)
      !$acc routine seq
      integer, intent(in) :: i
      real,intent(out) :: yalloc(:)
      yalloc(i) = 2*xstat(i)
    end subroutine
  end subroutine
end module
program main
  use data
  use useit
  implicit none
  integer :: nSize = 100
  !---------------------------------------------------------------------------
  call allocit(nSize)
  call initialize
  call compute(nSize)
  !$acc update self(yalloc)
  write(*,*) "yalloc(10)=",yalloc(10) ! 2.0, since yalloc(i) = 2*xstat(i) and xstat = 1.0
  call finalize
contains
  subroutine allocit(n)
    integer :: n
    allocate(yalloc(n))
  end subroutine allocit
  subroutine initialize
    xstat = 1.0
    yalloc = 1.0
    IsUsed = .true.
    !$acc update device(xstat,yalloc,IsUsed)
  end subroutine initialize
  subroutine finalize
    deallocate(yalloc)
  end subroutine finalize
end program main

This second code compiles and runs with OpenACC.
Update: surprisingly, the first code works when I simply switch the order of the subroutines:
PROGRAM Test
  IMPLICIT NONE
CONTAINS
  subroutine InnerRoutine( y )
    !$acc routine
    IMPLICIT NONE
    real :: y
  END subroutine InnerRoutine
  SUBROUTINE OuterRoutine( N )
    !$acc routine
    IMPLICIT NONE
    INTEGER :: N
    real :: y
    INTEGER :: i
    DO i = 0, N
      call InnerRoutine( y )
    ENDDO
  END SUBROUTINE OuterRoutine
END PROGRAM Test

It seems really surprising to me that this depends on the order of the routines. But then why does my second example above work?
Answered 2020-09-29 23:33:27
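This is a compiler device code generation bug. When "InnerRoutine" is called from "OuterRoutine", the compiler correctly adds the hidden argument that points to the parent's stack at the call site, but the generated definition of "InnerRoutine" is missing that argument. The error is a mismatch between the callee and the caller.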
I've added a problem report, TPR #29057. It's unclear whether this is a more widespread issue or an artifact of the small test case.
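
Since the hidden argument only exists for procedures contained in another program unit, one possible way to sidestep it is to place both device routines in a module, where there is no parent stack to pass. The following is only a sketch under that assumption and has not been checked against nvfortran 20.7; the module name `routines`, the `seq` clauses, the extra `y` argument, and the body of `InnerRoutine` are illustrative additions that are not in the original post.

```fortran
module routines
  implicit none
contains
  ! Module procedures have no host frame, so no hidden parent-stack
  ! argument is involved in the call below (assumption of this sketch).
  subroutine InnerRoutine(y)
    !$acc routine seq
    real, intent(inout) :: y
    y = y + 1.0              ! illustrative body, not from the original
  end subroutine InnerRoutine

  subroutine OuterRoutine(n, y)
    !$acc routine seq
    integer, intent(in) :: n
    real, intent(inout) :: y
    integer :: i
    do i = 1, n
      call InnerRoutine(y)   ! device routine calling a device routine
    end do
  end subroutine OuterRoutine
end module routines

program Test
  use routines
  implicit none
  real :: y
  y = 0.0
  ! Run the outer routine once on the device and copy y back.
  !$acc serial copy(y)
  call OuterRoutine(3, y)
  !$acc end serial
  print *, "y =", y
end program Test
```

Built without OpenACC, the directives are plain comments and the program simply increments `y` three times on the host, which makes it easy to compare host and device results.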
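Note: be careful when using contained device subroutines. Fortran lets a contained procedure access its parent's local variables by passing in a pointer to the parent's stack. If the parent runs on the host and the child runs on the device, accessing the parent's variables directly will cause a runtime error. For example, if "iprocess" were contained within "compute" and you accessed "i" directly rather than passing it as an argument, you'd get an error because the device can't access the host's stack.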
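
To make the caveat concrete, here is a minimal sketch of the safe pattern, in which the contained device routine touches only its own dummy arguments. The module name `safe_contained`, the dummy name `k`, and the `copy(y)` clause are assumptions made for illustration, and whether a given compiler version accepts this exact layout has not been verified.

```fortran
module safe_contained
  implicit none
contains
  subroutine compute(n, y)
    integer, intent(in) :: n
    real, intent(inout) :: y(:)
    integer :: i
    !$acc parallel loop copy(y)
    do i = 1, n
      call iprocess(i, y)    ! index and array are passed explicitly
    end do
  contains
    subroutine iprocess(k, y)
      !$acc routine seq
      integer, intent(in) :: k
      real, intent(inout) :: y(:)
      ! Safe: only the dummy arguments k and y are used here.
      y(k) = 2.0 * y(k)
      ! Risky: referring to the parent's "i" here instead of "k" would
      ! go through the host's stack, which the device cannot access.
    end subroutine iprocess
  end subroutine compute
end module safe_contained

program demo
  use safe_contained
  implicit none
  real :: y(4)
  y = 1.0
  call compute(4, y)
  print *, y
end program demo
```

The design choice is simply to treat a contained device routine as if it were an external one: everything it needs comes in through its argument list, so nothing depends on host association.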
https://stackoverflow.com/questions/64112821