CUDA: Detecting the PTX kernel of a Thrust transform

cuda

I have the following thrust::transform call:

my_functor *f_1 = new my_functor();
thrust::transform(data.begin(), data.end(), data.begin(),*f_1);

I want to find the kernel that corresponds to it in the PTX file. However, many kernels contain my_functor in their mangled names.

For example:

_ZN6thrust6system4cuda6detail6detail23launch_closure_by_valueINS2_17for_each_n_detail18for_each_n_closureINS_12zip_iteratorINS_5tupleINS_6detail15normal_iteratorINS_10device_ptrIiEEEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEEjNS9_30device_unary_transform_functorI10my_functorEENS3_20blocked_thread_arrayEEEEEvT_

_ZN6thrust6system4cuda6detail6detail23launch_closure_by_valueINS2_17for_each_n_detail18for_each_n_closureINS_12zip_iteratorINS_5tupleINS_6detail15normal_iteratorINS_10device_ptrIiEEEESD_NS_9null_typeESE_SE_SE_SE_SE_SE_SE_EEEElNS9_30device_unary_transform_functorI10my_functorEENS3_20blocked_thread_arrayEEEEEvT_

_ZN6thrust6detail15device_functionINS0_30device_unary_transform_functorI10my_functorEEvEC1ERKS4_

Which kernel is actually launched, and what are the other kernels?

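If you are using Visual Studio, use Nvidia Nsight Visual Studio Edition, which ships with the CUDA Toolkit.

Go to the "Nsight" menu and click the "Start Performance Analysis..." entry.

In "Activity Type", choose "Profile CUDA Application".
In "Experiment Settings", tick "Collect Information for CUDA Source View".
In the "Experiments to Run" list box, choose "All".
In "Capture Control", tick "Open Report on Stop" and choose "CUDA Source View" in the list box.

Then click "Launch" and wait for your application to run to completion. You will see additional output from Nsight in the console.

After execution, the "CUDA Source View" window opens. In the "View" list box, choose "Source and PTX". You can then see the correspondence between your source code and the generated PTX.

When you click a line in the source code, the corresponding line or lines in the PTX code are highlighted in green.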

Use the c++filt command. When I pass your example kernel names through c++filt, I get:

void thrust::system::cuda::detail::detail::launch_closure_by_value<thrust::system::cuda::detail::for_each_n_detail::for_each_n_closure<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, unsigned int, thrust::detail::device_unary_transform_functor<my_functor>, thrust::system::cuda::detail::detail::blocked_thread_array> >(thrust::system::cuda::detail::for_each_n_detail::for_each_n_closure<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, unsigned int, thrust::detail::device_unary_transform_functor<my_functor>, thrust::system::cuda::detail::detail::blocked_thread_array>)

void thrust::system::cuda::detail::detail::launch_closure_by_value<thrust::system::cuda::detail::for_each_n_detail::for_each_n_closure<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, long, thrust::detail::device_unary_transform_functor<my_functor>, thrust::system::cuda::detail::detail::blocked_thread_array> >(thrust::system::cuda::detail::for_each_n_detail::for_each_n_closure<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, long, thrust::detail::device_unary_transform_functor<my_functor>, thrust::system::cuda::detail::detail::blocked_thread_array>)

thrust::detail::device_function<thrust::detail::device_unary_transform_functor<my_functor>, void>::device_function(thrust::detail::device_unary_transform_functor<my_functor> const&)

Why do you think it will launch only one kernel?

@Drop I thought launching multiple kernels for simple arithmetic would be inefficient, and the instructions in some of these kernels are very similar.