Tensorflow matrix multiplication is slower than numpy


When I run this code, I get the results below:

import tensorflow as tf
import numpy as np
from time import time

def print_timer(func):
  start_time = time()
  func()
  end_time = time()
  print(end_time - start_time)

N = 4
A = np.random.randn(N, 1000, 16000)
B = np.random.randn(N, 16000, 10)
sess = tf.Session()


A_ = tf.constant(A)
B_ = tf.constant(B)

def np_test():
  r = np.empty([N, 1000, 10])
  for i in range(N):
    r[i] = np.matmul(A[i], B[i])

print_timer(lambda: np.matmul(A, B))
print_timer(lambda: sess.run(tf.matmul(A,B)))

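which prints these running times in seconds, in the order of the two print_timer calls (np.matmul first, then tf.matmul):

1.3403866291046143
4.291470527648926

I don't understand why tensorflow.matmul is slower than numpy.matmul. I'm running this code on an NVIDIA P40 GPU, and the tensorflow version I'm using is 1.4.

When I tried running this code on tensorflow 1.8, I got the same result.

If tensorflow runs the matrix multiplication in parallel, shouldn't the running time of the multiplication on the GPU be much faster than numpy running on the CPU?

You are not using the constant tensors you created. Change this:

print_timer(lambda: sess.run(tf.matmul(A,B)))

to this:

print_timer(lambda: sess.run(tf.matmul(A_,B_)))

You have tf.matmul inside the timed call, which means you are measuring the time to create the operation as well as the time to compute it.

Try building the op once, outside the timed call, and timing only sess.run on it; the script at the end of this page does this.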

Some remarks:

Don't reinvent the wheel; use `timeit` to measure computation time.
Since Python 3.5, you can use `@` as a shorthand for matrix multiplication.
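Both remarks can be tried without TensorFlow or IPython. The following is a minimal sketch using the standard-library timeit module and numpy only; the reduced array shapes and the `number`/`repeat` values are assumptions chosen so it finishes quickly, not the sizes from the question.

```python
import timeit

import numpy as np

# Hypothetical, reduced shapes so the sketch runs quickly;
# the question uses (4, 1000, 16000) and (4, 16000, 10).
N = 4
A = np.random.randn(N, 100, 160)
B = np.random.randn(N, 160, 10)

# For arrays, `@` computes the same batched product as np.matmul.
assert np.allclose(A @ B, np.matmul(A, B))

# Call the statement `number` times per measurement and repeat the
# measurement several times; the minimum is usually the least noisy figure.
number = 100
timings = timeit.repeat(lambda: A @ B, number=number, repeat=5)
per_call = min(timings) / number
print(f"A @ B: {per_call * 1e6:.1f} microseconds per call")
```

Running the statement many times per measurement also spreads any one-off cost over all the calls, which a single call wrapped in time() cannot do.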

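Is it possible that copying the data to the CPU adds a lot of overhead? I would try a longer operation, e.g. 100000 matrix multiplications. I may be completely wrong.

Do the same with A, B = tf.get_variable('v0', initializer=A), ... and be ready for a surprise (no cudaMemcpy).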
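To put the copy-overhead question in numbers, here is a back-of-the-envelope sketch of how much data and how much arithmetic the benchmark involves. The shapes come from the question, and float64 is what np.random.randn returns; counting each multiply-add as two floating-point operations is a convention assumed here. Whether the copy actually dominates depends on the hardware and is not measured by this sketch.

```python
# Shapes from the question: A is (N, m, k), B is (N, k, n).
N, m, k, n = 4, 1000, 16000, 10
bytes_per_element = 8  # np.random.randn returns float64

a_bytes = N * m * k * bytes_per_element    # operand A
b_bytes = N * k * n * bytes_per_element    # operand B
out_bytes = N * m * n * bytes_per_element  # result

# Each output element takes about k multiply-adds,
# counted here as 2 * k floating-point operations.
flops = 2 * N * m * k * n

print(f"A: {a_bytes / 1e6:.2f} MB, B: {b_bytes / 1e6:.2f} MB, "
      f"result: {out_bytes / 1e6:.2f} MB")
print(f"work: {flops / 1e9:.2f} GFLOP")
print(f"FLOPs per input byte: {flops / (a_bytes + b_bytes):.2f}")
```

With these shapes, A alone is 512 MB while the whole product needs about 1.28 GFLOP, so only a few floating-point operations are performed per byte of input.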

import tensorflow as tf
import numpy as np

sess = tf.InteractiveSession()

N = 4
A = np.random.randn(N, 1000, 16000)
B = np.random.randn(N, 16000, 10)
A_ = tf.constant(A)
B_ = tf.constant(B)

AB_ = A_ @ B_

%timeit np.matmul(A, B)
%timeit sess.run(AB_)