Fastest way to calculate distances in Numpy / TensorFlow

numpy tensorflow


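I have an (n, d)-dimensional matrix `x` and a (d, L)-dimensional matrix `c`.

Let `x_b` denote the b-th row of `x` and `c_i` the i-th column of `c`. I want to compute the (n, L)-dimensional matrix `diff` that contains at (b, i):

|| x_b - c_i ||^2

where the bars denote the L2 norm.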

My current solution uses the identity

|| x - y ||^2 = ||x||^2 + ||y||^2 - 2<x, y>
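For reference, the identity follows from expanding the squared norm as an inner product. This is a short derivation sketch, assuming real-valued vectors and the standard Euclidean inner product:

```latex
\begin{aligned}
\|x - y\|^2 &= \langle x - y,\; x - y \rangle \\
            &= \langle x, x \rangle - 2\langle x, y \rangle + \langle y, y \rangle \\
            &= \|x\|^2 + \|y\|^2 - 2\langle x, y \rangle
\end{aligned}
```

Applied to `x_b` and `c_i`, the cross term for every pair (b, i) is one entry of the matrix product of `x` and `c`, which is why a single matmul covers all pairs.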

I compute it like this:

```python
import tensorflow as tf

# x has shape (n, d) and c has shape (d, L).
# Note: the argument is spelled keepdims in current TensorFlow;
# older 1.x releases called it keep_dims.

# (n, L), contains at (b, i) the inner product <x_b, c_i>
xc = tf.matmul(x, c)

# (n, 1), contains at row b ||x_b||^2
xx = tf.reduce_sum(tf.square(x), axis=1, keepdims=True)

# (1, L), contains at column i ||c_i||^2
cc = tf.reduce_sum(tf.square(c), axis=0, keepdims=True)

# (n, L), contains at (b, i):
# || x_b - c_i ||^2 = ||x_b||^2 + ||c_i||^2 - 2 * <x_b, c_i>
diff = xx + cc - 2*xc
```
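To sanity-check the trick outside TensorFlow, here is a minimal NumPy sketch of the same computation, compared against a brute-force version that materialises every difference vector. The function names are made up for illustration, and the final clamp to zero is an added assumption (the expansion can produce tiny negative values through floating-point cancellation); it is not part of the code above.

```python
import numpy as np

def pairwise_sq_dists(x, c):
    """Squared L2 distances between rows of x (n, d) and columns of c (d, L)."""
    xc = x @ c                                   # (n, L): <x_b, c_i>
    xx = np.sum(x * x, axis=1, keepdims=True)    # (n, 1): ||x_b||^2
    cc = np.sum(c * c, axis=0, keepdims=True)    # (1, L): ||c_i||^2
    diff = xx + cc - 2.0 * xc                    # broadcasts to (n, L)
    # Assumption, not in the original: clamp small negative round-off to 0.
    return np.maximum(diff, 0.0)

def pairwise_sq_dists_naive(x, c):
    """Reference version; builds an (n, d, L) array, so only for small inputs."""
    delta = x[:, :, None] - c[None, :, :]        # (n, d, L)
    return np.sum(delta * delta, axis=1)         # (n, L)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))
c = rng.standard_normal((3, 4))

fast = pairwise_sq_dists(x, c)
slow = pairwise_sq_dists_naive(x, c)
print(np.allclose(fast, slow))
```

The naive version needs O(n * d * L) memory, which is the main reason the expansion is preferred when n is large.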


It works fine and is fairly fast. But I was wondering: is there any way to make this computation faster?

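How slow is it? On my TitanX Pascal, I can run your computation in 0.15 seconds when x and c are 8k x 8k matrices. That hits the TitanX limit of 11 T ops/sec, with almost all of the time spent in matmul.

@YaroslavBulatov Yes, it's not that slow. It's just that I run it on every iteration as part of a neural network training procedure, so every bit of speed could save time in training.

@YaroslavBulatov In my setting, n is around 300k or more, d is quite small,

One potential optimization: since the data is stored in row-major order, reducing along axis=1 is faster than reducing along axis=0 because of data locality (i.e., 4-5x faster on CPU). Transposing is slow, but you can make sure `c` is initially stored in transposed order and then run the matmul as `tf.matmul(x, c, transpose_b=True)`, which does not perform an explicit transpose.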
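The layout suggestion can be sketched in NumPy as follows. This is an illustrative analogue, not the TensorFlow code itself: `c_t` is a hypothetical name for `c` stored transposed with shape (L, d), and `c_t.T` plays the role of `transpose_b=True` because `.T` is a view rather than a copy. Whether this is actually faster depends on the shapes and the underlying BLAS, so it should be measured rather than assumed.

```python
import numpy as np

def sq_dists_transposed(x, c_t):
    """x: (n, d); c_t: (L, d), i.e. c stored transposed, one vector c_i per row."""
    xc = x @ c_t.T                               # (n, L); .T is a view, no copy
    xx = np.sum(x * x, axis=1, keepdims=True)    # (n, 1), reduce along rows
    cc = np.sum(c_t * c_t, axis=1)[None, :]      # (1, L), also reduce along rows
    return xx + cc - 2.0 * xc

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 3))
c = rng.standard_normal((3, 4))

# Pay for the transpose once, up front, instead of on every iteration.
c_t = np.ascontiguousarray(c.T)

out = sq_dists_transposed(x, c_t)

# Reference: the original layout with c of shape (d, L).
ref = (np.sum(x * x, axis=1, keepdims=True)
       + np.sum(c * c, axis=0, keepdims=True)
       - 2.0 * (x @ c))
print(np.allclose(out, ref))
```

Both squared-norm reductions now run along the contiguous axis, which is the data-locality point made above.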
