Why does Python's cProfile report a different run time than time.time() deltas when using PyTorch?

python pytorch

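I'm profiling some code that uses PyTorch. I know CUDA generally has some asynchronous execution (see …), but I believe that a transfer from GPU to CPU generally forces a synchronization.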

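For this reason, I decided to naively use `cProfile`, but I noticed that the time it reports between `profile.enable() ... profile.disable()` is different from the time I record as a `time.time()` delta across the same region.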

Here is roughly what the code looks like at a high level:

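```python
import time
import pstats
import cProfile as profile  # assumed alias, since the snippet calls profile.Profile()

import torch

# setup() and make_fcn_resnet50() come from the full code (not shown here).
gpu = torch.device("cuda")
cpu = torch.device("cpu")
setup = setup()
net = make_fcn_resnet50(num_classes=setup.D)
net.eval().to(gpu)
rgb_tensor = setup.sample(device=cpu)

pr = profile.Profile()
pr.enable()
t_start = time.time()
rgb_tensor = rgb_tensor.to(gpu)
y = net(rgb_tensor)
dd_tensor = y["out"]
# GPU -> CPU copy; expected to force synchronization with the GPU.
dd_mean = torch.mean(dd_tensor[[0]]).to(cpu).numpy()
assert dd_mean is not None
dt = time.time() - t_start
pr.disable()

stats = pstats.Stats(pr)
stats.print_stats(5)
print(f"dt: {dt:.4f}s")
```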
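Because CUDA work can execute asynchronously, a wall-clock delta around GPU code only reflects the GPU time if the device is synchronized before each clock read. Below is a minimal, framework-agnostic timer sketched under that assumption: `timed` is a hypothetical helper, and `torch.cuda.synchronize` is the real PyTorch call you would pass as `sync` on a CUDA machine.

```python
import time


def timed(fn, sync=None):
    """Run fn() and return (result, elapsed_seconds).

    If sync is given (for CUDA work, e.g. torch.cuda.synchronize), it is called
    before starting and before stopping the clock, so asynchronously queued GPU
    work is neither charged to fn nor missed by it.
    """
    if sync is not None:
        sync()  # drain work queued before fn so it is not attributed to fn
    t_start = time.perf_counter()  # clock intended for measuring short intervals
    result = fn()
    if sync is not None:
        sync()  # wait for any kernels launched by fn to finish
    return result, time.perf_counter() - t_start


# Stdlib-only usage. With PyTorch one might write (hypothetical):
#   y, dt = timed(lambda: net(rgb_tensor.to(gpu)), sync=torch.cuda.synchronize)
result, dt = timed(lambda: sum(range(1000)))
print(f"result={result} dt={dt:.6f}s")
```

Calling `sync` on both sides is the design choice that matters: without the second call, the timer would stop as soon as the kernels were queued rather than when they finished.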

Here is the discrepancy I see:

```text
2925 function calls (2734 primitive calls) in 0.009 seconds
...
dt: 0.0355s
```

I would have expected `cProfile` to report roughly 35 ms (the same as `dt`), but it reports roughly 10 ms.
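To compare the two numbers directly instead of reading them off the printed report, the profiler's total can be pulled out of `pstats`. A minimal stdlib sketch, assuming a CPU-only stand-in workload (`busy` is hypothetical): it reads `pstats.Stats.total_tt`, the attribute that `print_stats` uses for its "in X seconds" header line (an implementation detail rather than a documented API).

```python
import cProfile
import io
import pstats
import time


def busy(n):
    """Hypothetical CPU-bound stand-in for the profiled region."""
    return sum(i * i for i in range(n))


pr = cProfile.Profile()
pr.enable()
t_start = time.perf_counter()
value = busy(200_000)
dt = time.perf_counter() - t_start
pr.disable()

# total_tt is the figure printed as "... in X seconds": the sum of the own time
# (tottime) of every function the profiler recorded, not a wall-clock delta.
stats = pstats.Stats(pr, stream=io.StringIO())
profiled_total = stats.total_tt
print(f"cProfile total: {profiled_total:.4f}s  wall-clock dt: {dt:.4f}s")
```

Since `total_tt` only accumulates time attributed to recorded function entries, any elapsed time the profiler does not attribute to a function shows up in `dt` but not in the cProfile total.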

Why does this happen?

Full code + repro here:

Empirically, `cProfile` does not seem to "hook" into the code if the output isn't fully "flushed", or if the code isn't fully encapsulated in a function.
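Here is a minimal stdlib sketch of the "fully encapsulated in a function" structure, assuming a CPU-bound stand-in (`timed_region` is hypothetical and takes the place of the GPU code). `cProfile.Profile.runcall` enables the profiler, calls the function, and disables it again, so the profile and the `dt` delta are both taken around the same function body.

```python
import cProfile
import io
import pstats
import time


def timed_region(n):
    """Hypothetical stand-in: the whole measured region lives inside one function."""
    t_start = time.perf_counter()
    total = sum(i * i for i in range(n))
    dt = time.perf_counter() - t_start
    return total, dt


pr = cProfile.Profile()
# runcall enables the profiler, calls the function, and disables it again.
total, dt = pr.runcall(timed_region, 100_000)

out = io.StringIO()
pstats.Stats(pr, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
print(f"dt: {dt:.4f}s")
```

Sorting by `"cumulative"` puts the wrapper function at the top, so its cumulative time can be read next to `dt`.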

More details in the comments here:

All timing results were recorded with:

- Ubuntu 18.04
- CPython 3.6.9
- nvidia-driver-450 (450.102.04-0ubuntu0.18.04.1)
- NVIDIA TITAN RTX
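Timing comparisons like these depend on the software stack, so it helps to capture it alongside the numbers. Below is a small sketch, assuming PyTorch is optional (`describe_environment` is a hypothetical helper). It does not report the NVIDIA driver version listed above; that usually comes from `nvidia-smi`.

```python
import platform


def describe_environment():
    """Collect the versions that timing results depend on."""
    info = {
        "python": f"{platform.python_implementation()} {platform.python_version()}",
        "os": platform.platform(),
    }
    try:
        import torch  # optional: only reported when PyTorch is installed
    except ImportError:
        return info
    info["torch"] = torch.__version__
    if torch.cuda.is_available():
        info["cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


for key, val in describe_environment().items():
    print(f"{key}: {val}")
```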

That said, it is probably better to use the mechanisms PyTorch itself provides (facepalm :)
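Here is a minimal sketch of that route, assuming the PyTorch 1.x-era `torch.autograd.profiler.profile` API that fits the Python 3.6 setup above (newer releases steer users toward `torch.profiler`). With `use_cuda=True`, the profiler records CUDA time per operator instead of leaving you to infer it from wall-clock deltas. `profile_forward` is a hypothetical helper, and `model` and `inputs` must already be on the same device.

```python
def profile_forward(model, inputs, row_limit=10):
    """Profile one forward pass with PyTorch's autograd profiler; returns a text table."""
    import torch  # imported lazily so this sketch can be loaded without PyTorch

    use_cuda = torch.cuda.is_available()
    with torch.no_grad():
        # use_cuda=True also records CUDA kernel times (PyTorch 1.x-era flag).
        with torch.autograd.profiler.profile(use_cuda=use_cuda) as prof:
            model(inputs)
    sort_key = "cuda_time_total" if use_cuda else "cpu_time_total"
    return prof.key_averages().table(sort_by=sort_key, row_limit=row_limit)


# Hypothetical usage with the names from the question:
#   print(profile_forward(net, rgb_tensor.to(gpu)))
```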