torch.max is slower on the GPU than on the CPU when a dimension is specified


python deep-learning pytorch


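Timing torch.max(t1_h, 0) on the CPU tensor t1_h against torch.max(t1_d, 0) on the GPU tensor t1_d (CPU result first):

10000 loops, best of 3: 144 µs per loop

10000 loops, best of 3: 985 µs per loop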

As you can see above, the GPU takes much more time than the CPU. But if I don't specify the dimension for computing the max, the GPU is faster:

%timeit -n 10000 max_h = torch.max(t1_h)
%timeit -n 10000 max_d = torch.max(t1_d)

10000 loops, best of 3: 111 µs per loop

10000 loops, best of 3: 41.8 µs per loop
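For reference, torch.max has two call forms. Without a dimension it returns a single 0-d tensor holding the global maximum; with a dimension it reduces along that dimension and returns a named tuple of the maximum values and their indices, so the dimension form also has to produce an index for every maximum. A minimal sketch on a small made-up tensor (the values are arbitrary and chosen to have no ties):

```python
import torch

# Arbitrary 2x3 example tensor (illustrative values only, no ties).
x = torch.tensor([[1.0, 5.0, 2.0],
                  [4.0, 0.0, 6.0]])

# Form 1: no dim. Returns a 0-d tensor with the maximum over all elements.
global_max = torch.max(x)

# Form 2: with dim. Returns a named tuple (values, indices) reduced along dim 0,
# i.e. the column-wise maxima and the row index where each one was found.
col_max = torch.max(x, 0)

print(global_max.item())         # 6.0
print(col_max.values.tolist())   # [4.0, 5.0, 6.0]
print(col_max.indices.tolist())  # [1, 0, 1]
```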

I also tried argmax instead of max, and that works as expected (the GPU is faster than the CPU):

10000 loops, best of 3: 108 µs per loop

10000 loops, best of 3: 18.1 µs per loop

Is there any reason why torch.max is slow on the GPU once a dimension is specified?

I discovered this myself. It looks like it will be fixed soon, maybe in version 1.5 or 1.6, but in the meantime the suggested workaround is to use:

ii = a.argmax(0)  # index of the maximum along dim 0
maxval = a.gather(0, ii.unsqueeze(0)).squeeze(0)  # values at those indices
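The same workaround can be wrapped in a small helper that works for any dimension. This is a sketch: max_via_argmax is a hypothetical name, not part of PyTorch, and it relies only on torch.argmax and Tensor.gather. It returns the same values as torch.max(a, dim); when there are ties, the returned index is whichever one argmax picks, which is not guaranteed to match the index torch.max would report.

```python
import torch

def max_via_argmax(a: torch.Tensor, dim: int):
    """Hypothetical helper: (values, indices) of the max along `dim`,
    computed with argmax + gather instead of torch.max(a, dim)."""
    # keepdim=True keeps `dim` as a size-1 axis, which is the shape gather expects.
    idx = torch.argmax(a, dim=dim, keepdim=True)
    # Pick out the element at each argmax position, then drop the size-1 axis.
    values = a.gather(dim, idx).squeeze(dim)
    return values, idx.squeeze(dim)

if __name__ == "__main__":
    a = torch.rand(4, 5)
    values, indices = max_via_argmax(a, 0)
    # The values should match the built-in dimension form of torch.max.
    assert torch.equal(values, torch.max(a, 0).values)
    print(values.shape, indices.shape)  # torch.Size([5]) torch.Size([5])
```

On a GPU tensor it is called the same way, e.g. max_via_argmax(t1_d, 0); whether it beats torch.max(t1_d, 0) depends on the PyTorch version in use.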

This seems to me like something that should be raised on the PyTorch issue tracker.

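# Benchmark commands from the question (IPython %timeit); t1_h is the CPU tensor, t1_d its counterpart on the GPU
# max with a dimension specified
%timeit -n 10000 max_h = torch.max(t1_h, 0)
%timeit -n 10000 max_d = torch.max(t1_d, 0)

# max over all elements (no dimension)
%timeit -n 10000 max_h = torch.max(t1_h)
%timeit -n 10000 max_d = torch.max(t1_d)

# argmax along dimension 0
%timeit -n 10000 cs_h = torch.argmax(t1_h, 0)
%timeit -n 10000 cs_d = torch.argmax(t1_d, 0)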
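One caveat when reading GPU numbers from %timeit: CUDA kernels are launched asynchronously, so unless the timer waits for the device, the measured time can reflect kernel-launch overhead rather than when the work actually finishes. Below is a minimal timing sketch that calls torch.cuda.synchronize() around the timed loop; time_op is a hypothetical helper, and the 1,000,000-element input is an assumed size, not necessarily the one used in the question.

```python
import time
import torch

def time_op(fn, *args, n_iter=100, warmup=10):
    """Hypothetical helper: mean wall-clock seconds per call of fn(*args).

    If any argument is a CUDA tensor, synchronize before reading the clock so
    the measurement covers the kernels themselves, not just their launch.
    """
    on_cuda = any(isinstance(a, torch.Tensor) and a.is_cuda for a in args)
    for _ in range(warmup):  # warm-up calls are not timed
        fn(*args)
    if on_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iter):
        fn(*args)
    if on_cuda:
        torch.cuda.synchronize()  # wait for all queued kernels to finish
    return (time.perf_counter() - start) / n_iter

if __name__ == "__main__":
    # Assumed input size; adjust to match your own tensors.
    t1_h = torch.rand(1_000_000)
    print("CPU max(t, 0):", time_op(torch.max, t1_h, 0))
    if torch.cuda.is_available():
        t1_d = t1_h.cuda()
        print("GPU max(t, 0):", time_op(torch.max, t1_d, 0))
        print("GPU argmax(t, 0):", time_op(torch.argmax, t1_d, 0))
```

torch.utils.benchmark.Timer also synchronizes CUDA for you and may be preferable in practice; the manual version above just makes the synchronization explicit.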