Fortran OpenACC: derived type with an allocatable member
I have read that a manual deep copy of a Fortran derived type is possible, but the following simple test program fails at run time. It compiles cleanly with PGI v16.10. What am I doing wrong?
program Test
implicit none
type dt
integer :: n
real, dimension(:), allocatable :: xm
end type dt
type(dt) :: grid
integer :: i
grid%n = 10
allocate(grid%xm(grid%n))
!$acc enter data copyin(grid)
!$acc enter data pcreate(grid%xm)
!$acc kernels
do i = 1, grid%n
grid%xm(i) = i * i
enddo
!$acc end kernels
print*,grid%xm
end program Test
The error I get is:
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
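You only need to add a "present(grid)" clause to the kernels directive.
Below is an example program with that fix, plus a few other changes, such as updating the data on the host so it can be printed.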
% cat test.f90
program Test
implicit none
type dt
integer :: n
real, dimension(:), allocatable :: xm
end type dt
type(dt) :: grid
integer :: i
grid%n = 10
allocate(grid%xm(grid%n))
!$acc enter data copyin(grid)
!$acc enter data create(grid%xm)
!$acc kernels present(grid)
do i = 1, grid%n
grid%xm(i) = i * i
enddo
!$acc end kernels
!$acc update host(grid%xm)
print*,grid%xm
!$acc exit data delete(grid%xm, grid)
deallocate(grid%xm)
end program Test
% pgf90 -acc test.f90 -Minfo=accel -ta=tesla -V16.10; a.out
test:
16, Generating enter data copyin(grid)
17, Generating enter data create(grid%xm(:))
18, Generating present(grid)
19, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
19, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
23, Generating update self(grid%xm(:))
1.000000 4.000000 9.000000 16.00000
25.00000 36.00000 49.00000 64.00000
81.00000 100.0000
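As a variation on the manual deep copy above, the following minimal sketch brings the member back with `copyout` at the end of the data lifetime instead of a separate `update host` followed by `delete`. The program name `test_copyout` and the host-side check loop are illustrative additions, not part of the original answer, and the sketch assumes the compiler handles `copyout` of a derived-type member the same way it handles the `create` shown above. When built without OpenACC enabled, the directives are treated as comments and the program runs on the host.

```fortran
program test_copyout
  implicit none
  type dt
    integer :: n
    real, dimension(:), allocatable :: xm
  end type dt
  type(dt) :: grid
  integer :: i

  grid%n = 10
  allocate(grid%xm(grid%n))

  ! Enter order: the parent first, then the allocatable member.
  !$acc enter data copyin(grid)
  !$acc enter data create(grid%xm)

  ! present(grid) tells the compiler that grid is already on the device.
  !$acc kernels present(grid)
  do i = 1, grid%n
    grid%xm(i) = i * i
  enddo
  !$acc end kernels

  ! Exit order is the reverse: the member first (copied back), then the parent.
  !$acc exit data copyout(grid%xm)
  !$acc exit data delete(grid)

  ! Host-side check (illustrative): every element should equal i squared.
  do i = 1, grid%n
    if (abs(grid%xm(i) - real(i * i)) > 1.0e-5) then
      print *, 'mismatch at i =', i
      stop 1
    end if
  enddo
  print *, 'all values match'

  deallocate(grid%xm)
end program test_copyout
```

Releasing the member before the parent mirrors the order in which they were placed on the device, so the parent's device copy is never left referring to member storage that has already been freed.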
Note that PGI 17.7 will include beta support for true deep copy in Fortran, as opposed to the manual deep copy shown above. Here is an example that uses true deep copy:
% cat test_deep.f90
program Test
implicit none
type dt
integer :: n
real, dimension(:), allocatable :: xm
end type dt
type(dt) :: grid
integer :: i
grid%n = 10
allocate(grid%xm(grid%n))
!$acc enter data copyin(grid)
!$acc kernels present(grid)
do i = 1, grid%n
grid%xm(i) = i * i
enddo
!$acc end kernels
!$acc update host(grid)
print*,grid%xm
!$acc exit data delete(grid)
deallocate(grid%xm)
end program Test
% pgf90 -acc test_deep.f90 -Minfo=accel -ta=tesla:deepcopy -V17.7 ; a.out
test:
16, Generating enter data copyin(grid)
17, Generating present(grid)
18, Loop is parallelizable
Accelerator kernel generated
Generating Tesla code
18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
22, Generating update self(grid)
1.000000 4.000000 9.000000 16.00000
25.00000 36.00000 49.00000 64.00000
81.00000 100.0000
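According to the documentation (PGI OpenACC guide, v2015 and v2017), arrays of derived type, where the derived type contains allocatable members, have not been tested and should not be considered supported in this release. It turns out that commenting out the pcreate(grid%xm) line makes the program run correctly. Does that mean deep copy is now supported?
The "have not been tested and should not be considered supported" part is about arrays. You have only a single variable, so I am not sure it applies. I don't know; try searching the manual.
Mat, thanks a lot, I will use deep copy. Why is the present clause still necessary even in v17.7? With a kernels region that has no data clause, I would expect an implicit copyin/present of grid to happen. Shouldn't that work just as well? Thanks again!
I think this is a bug, because you are correct: with deep copy, the implicit copy should just work. This deep copy support is a brand-new beta feature, so issues of this kind are not unexpected.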