Gram-Schmidt orthogonalization in pure TensorFlow: iterative solution is much slower than NumPy

python numpy tensorflow

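I want to do Gram-Schmidt orthogonalization in pure TensorFlow to fix large matrices that start to drift slightly away from orthogonality (I do it on the graph, as part of a larger computation, without breaking out of it). The solutions I have seen are used "externally" (they perform multiple sess.run calls inside), so I wrote a simple and very inefficient implementation myself: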

def tf_gram_schmidt(vectors):
    # add batch dimension for matmul
    basis = tf.expand_dims(vectors[0,:]/tf.norm(vectors[0,:]),0)
    for i in range(1,vectors.get_shape()[0].value):
        v = vectors[i,:]
        # add batch dimension for matmul
        v = tf.expand_dims(v,0) 
        w = v - tf.matmul(tf.matmul(v, tf.transpose(basis)), basis)
        # I assume that my matrix is close to orthogonal
        basis = tf.concat([basis, w/tf.norm(w)],axis=0)
    return basis

However, when I compare it with equivalent iterative code run outside the graph, it is 3 times slower (on a GPU!!!), although slightly more accurate:

(UPD 4: there was a small mistake in my example, but it did not change the timings at all, since ort_discrepancy() is a lightweight function.)

Minimal example:

import tensorflow as tf
import numpy as np
import time

# found this code somewhere on stackoverflow
def np_gram_schmidt(vectors):
    basis = []
    for v in vectors:
        # subtract the projections onto all previously accepted basis vectors
        w = v - sum(np.dot(v, b) * b for b in basis)
        if np.linalg.norm(w) > 1e-10:
            basis.append(w / np.linalg.norm(w))
        else:
            # v is (numerically) linearly dependent on the basis so far
            basis.append(np.zeros(w.shape))
    return np.array(basis)


def tf_gram_schmidt(vectors):
    # add batch dimension for matmul
    basis = tf.expand_dims(vectors[0,:]/tf.norm(vectors[0,:]),0)
    for i in range(1,vectors.get_shape()[0].value):
        v = vectors[i,:]
        # add batch dimension for matmul
        v = tf.expand_dims(v,0)
        w = v - tf.matmul(tf.matmul(v, tf.transpose(basis)), basis)
        # I assume that my matrix is close to orthogonal
        basis = tf.concat([basis, w/tf.norm(w)],axis=0)
    return basis


# how much the matrix differs from orthogonal:
# computes the Frobenius norm ||W^T W - I||_F
def ort_discrepancy(matrix):
    wwt = tf.matmul(matrix, matrix, transpose_a=True)
    rows = tf.shape(wwt)[0]
    cols = tf.shape(wwt)[1]
    return tf.norm((wwt - tf.eye(rows,cols)),ord='euclidean')


np.random.seed(0)
# white noise matrix
np_nearly_orthogonal = np.random.normal(size=(2000,2000)) 
# normalize each row to unit length
np_nearly_orthogonal = np.array([row/np.linalg.norm(row) for row in np_nearly_orthogonal]) 


tf_nearly_orthogonal = tf.Variable(np_nearly_orthogonal,dtype=tf.float32)


init = tf.global_variables_initializer()



with tf.Session() as sess:
    sess.run(init)

    print("how much source differs from orthogonal matrix:")
    print(ort_discrepancy(tf_nearly_orthogonal).eval())

    print("tensorflow version:")
    start = time.time()

    print(ort_discrepancy(tf_gram_schmidt(tf_nearly_orthogonal)).eval())

    end = time.time()
    print("Time elapsed: %sms"%(1000*(end-start)))

    print("numpy version with tensorflow and variable re-assign to the result of numpy code:")
    start = time.time()

    tf_nearly_orthogonal = tf.Variable(np_gram_schmidt(tf_nearly_orthogonal.eval()),dtype=tf.float32)
    sess.run(tf.variables_initializer([tf_nearly_orthogonal]))

    # check that variable was updated
    print(ort_discrepancy(tf_nearly_orthogonal).eval())
    end = time.time()
    print("Time elapsed: %sms"%(1000*(end-start)))
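Two notes on the metric used in this example. First, ort_discrepancy forms W^T W (because of transpose_a=True), while "orthonormal rows" means W W^T = I. For a square W the two Frobenius norms coincide: W^T W and W W^T have the same eigenvalues σᵢ², so both equal sqrt(Σ(σᵢ² - 1)²). Second, the starting value of about 44.7 that the question reports is what this test matrix should give. With unit rows the diagonal of W W^T - I is zero, and each off-diagonal entry is the dot product of two independent random unit vectors in ℝⁿ, whose square has mean 1/n; so ||W W^T - I||_F² ≈ n(n - 1)/n = n - 1, and sqrt(1999) ≈ 44.71. A NumPy sketch checking both points on a smaller matrix (n = 500 and the helper name `frobenius_discrepancy` are illustrative choices, not from the question):

```python
import numpy as np


def frobenius_discrepancy(w, rows_first=True):
    """Return ||W W^T - I||_F if rows_first, else ||W^T W - I||_F."""
    gram = w @ w.T if rows_first else w.T @ w
    return np.linalg.norm(gram - np.eye(gram.shape[0]))  # Frobenius norm of a 2-D array


n = 500
rng = np.random.default_rng(0)
w = rng.normal(size=(n, n))
w /= np.linalg.norm(w, axis=1, keepdims=True)  # unit-length rows, like np_nearly_orthogonal

rows = frobenius_discrepancy(w, rows_first=True)
cols = frobenius_discrepancy(w, rows_first=False)
print(rows, cols)       # equal up to round-off, because w is square
print(np.sqrt(n - 1))   # the rough expectation derived above
```

For a non-square matrix with more columns than rows, only the W W^T form can reach zero, so the choice of transpose does matter there.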

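Is there a way to speed this up? I don't see how to do it for G-S, because each step appends to the basis, so there is no tf.map_fn-style parallelization that could help.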
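One option the thread does not try is handing the loop-carried dependency to a library routine. Gram-Schmidt applied to the rows of V gives the same orthonormal rows as a QR decomposition of V^T, provided the signs are chosen so that R has a positive diagonal (for full-rank input that factorization is unique). TensorFlow provides QR as `tf.linalg.qr` (`tf.qr` in older 1.x releases); whether it beats the unrolled loop on a particular GPU would have to be measured. A NumPy sketch of the idea, assuming full row rank; `orthonormalize_rows_qr` is a hypothetical name:

```python
import numpy as np


def orthonormalize_rows_qr(vectors):
    """Orthonormalize the rows of `vectors` (assumed full row rank) via QR of its transpose.

    The sign fix makes diag(R) positive, which is the convention Gram-Schmidt
    follows, so the rows match Gram-Schmidt output up to floating-point round-off.
    """
    q, r = np.linalg.qr(vectors.T)   # vectors.T = q @ r, columns of q are orthonormal
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0          # guard against an exactly zero pivot
    return (q * signs).T             # flip columns of q, then return them as rows


rng = np.random.default_rng(0)
a = rng.normal(size=(30, 30))
q = orthonormalize_rows_qr(a)
print(np.abs(q @ q.T - np.eye(30)).max())  # close to zero
```

Because LAPACK computes QR with Householder reflections, this route also tends to lose less orthogonality than classical Gram-Schmidt on ill-conditioned input.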

UPD: by letting tf.matmul handle the transpose (transpose_b=True) instead of calling tf.transpose, I brought the gap down to 2x:

def tf_gram_schmidt(vectors):
    # add batch dimension for matmul
    basis = tf.expand_dims(vectors[0,:]/tf.norm(vectors[0,:]),0)
    for i in range(1,vectors.get_shape()[0].value):
        v = vectors[i,:]
        # add batch dimension for matmul
        v = tf.expand_dims(v,0) 
        w = v - tf.matmul(tf.matmul(v, basis, transpose_b=True), basis)
        # I assume that my matrix is close to orthogonal
        basis = tf.concat([basis, w/tf.norm(w)],axis=0)
    return basis





how much source differs from orthogonal matrix:
44.7176
tensorflow version:
0.0335421
Time elapsed: 17004.458189ms
numpy version with tensorflow and variable re-assign to the result of numpy code:
0.057589
Time elapsed: 8082.20791817ms
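For reference, each iteration of the TF loop computes w = v - (v B^T) B, where B holds the rows orthonormalized so far, and then normalizes w. By contrast, np_gram_schmidt loops over the basis in Python, making one np.dot call per previous row, which adds up to about n(n - 1)/2 ≈ 2 million small calls for n = 2000. A NumPy version that uses the same matrix form as the TF code, writing into a preallocated array instead of concatenating, is arguably a fairer baseline. A minimal sketch, assuming full row rank; `gram_schmidt_rows` is a hypothetical name and no timing is claimed for it:

```python
import numpy as np


def gram_schmidt_rows(vectors):
    """Classical Gram-Schmidt on the rows of `vectors` (assumed full row rank)."""
    n_rows, n_cols = vectors.shape
    basis = np.zeros((n_rows, n_cols))
    basis[0] = vectors[0] / np.linalg.norm(vectors[0])
    for i in range(1, n_rows):
        v = vectors[i]
        done = basis[:i]                  # rows orthonormalized so far, shape (i, n_cols)
        w = v - (done @ v) @ done         # v minus its projection: v - (v B^T) B
        basis[i] = w / np.linalg.norm(w)  # no zero check, as in tf_gram_schmidt
    return basis


rng = np.random.default_rng(0)
a = rng.normal(size=(100, 100))
a /= np.linalg.norm(a, axis=1, keepdims=True)  # unit rows, like the question's test matrix
q = gram_schmidt_rows(a)
print(np.abs(q @ q.T - np.eye(100)).max())     # should be close to zero
```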

EDIT 2:

Just for fun, I tried to mimic the numpy solution exactly and got code that takes a very long time to run:

def tf_gram_schmidt(vectors):
    # add batch dimension for matmul
    basis = tf.expand_dims(vectors[0,:]/tf.norm(vectors[0,:]),0)
    for i in range(1,vectors.get_shape()[0].value):

        v = vectors[i,:]        
        # like in numpy example
        multiplied = tf.reduce_sum(tf.map_fn(lambda b: tf.scalar_mul(tf.tensordot(v,b,axes=[[0],[0]]),b), basis), axis=0)
        w = v - multiplied

        ## add batch dimension for matmul
        ##v = tf.expand_dims(v,0) 
        ##w = v - tf.matmul(tf.matmul(v, basis, transpose_b=True), basis) 

        # I assume that my matrix is close to orthogonal
        basis = tf.concat([basis, tf.expand_dims(w/tf.norm(w),0)],axis=0)
    return basis

(It also seems to eat up GPU memory.)
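The map_fn code and the matmul code compute the same projection: the sum Σⱼ (v·bⱼ) bⱼ over the rows bⱼ of basis equals (v B^T) B. The matmul form expresses it as two dense products, whereas the per-row form runs a small dot product and scaling for every basis vector, which is likely part of why the EDIT 2 version is so slow. A NumPy check of the identity, with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
basis = np.linalg.qr(rng.normal(size=(64, 40)))[0].T  # 40 orthonormal rows of length 64
v = rng.normal(size=64)

# Per-vector form, as in the numpy example and the map_fn code: sum of (v . b) * b
per_vector = sum(np.dot(v, b) * b for b in basis)

# Matrix form, as in the matmul code: (v B^T) B with v as a (1, 64) row
matrix_form = (v[np.newaxis, :] @ basis.T) @ basis    # shape (1, 64)

print(np.allclose(per_vector, matrix_form[0]))  # True
```

The residual v - (v B^T) B is orthogonal to every row of basis, which is exactly what each Gram-Schmidt step needs.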

UPD3: my GPU is a GTX 1050, which is usually 5-7 times faster than my CPU, so these results look strange to me.

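UPD5: OK, I found that this code barely uses the GPU, whereas a hand-written neural network trained with backpropagation, which uses a lot of tf.matmul and other matrix arithmetic, makes full use of it. Why is that?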

UPD 6:

Following the advice I got, I now measure time differently:

# Akshay's suggestion to measure performance correctly
orthogonalized = ort_discrepancy(tf_gram_schmidt(tf_nearly_orthogonal))

with tf.Session() as sess:
    sess.run(init)

    print("how much source differs from orthogonal matrix:")
    print(ort_discrepancy(tf_nearly_orthogonal).eval())

    print("tensorflow version:")
    start = time.time()

    tf_result = sess.run(orthogonalized)

    end = time.time()

    print(tf_result)

    print("Time elapsed: %sms"%(1000*(end-start)))

    print("numpy version with tensorflow and variable re-assign to the result of numpy code:")
    start = time.time()

    tf_nearly_orthogonal = tf.Variable(np_gram_schmidt(tf_nearly_orthogonal.eval()),dtype=tf.float32)
    sess.run(tf.variables_initializer([tf_nearly_orthogonal]))

    # check that variable was updated
    print(ort_discrepancy(tf_nearly_orthogonal).eval())

    end = time.time()
    print("Time elapsed: %sms"%(1000*(end-start)))

Now TensorFlow comes out about 3.4x faster (2595 ms vs 8852 ms):

how much source differs from orthogonal matrix:
44.7176
tensorflow version:
0.018951
Time elapsed: 2594.85888481ms
numpy version with tensorflow and variable re-assign to the result of numpy code:
0.057589
Time elapsed: 8851.86600685ms

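TensorFlow appears slow because your benchmark measures both the time it takes to build the graph and the time it takes to execute it. That matters a lot here: tf_gram_schmidt runs a Python loop over all 2000 rows, and every iteration adds several new ops to the graph (a slice, expand_dims, two matmuls, a norm, a concat, and so on). A fairer comparison between TensorFlow and NumPy excludes graph construction from the benchmark. In particular, your benchmark should look like this: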

print("tensorflow version:")
# This line constructs the graph but does not execute it.
orthogonalized = ort_discrepancy(tf_gram_schmidt(tf_nearly_orthogonal))

start = time.time()
tf_result = sess.run(orthogonalized)
end = time.time()
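The same separation applies to any benchmark: do the one-time setup outside the timed region, and consider a warm-up call plus several repeats, since the first run often includes one-time costs (in TF 1.x the first sess.run of a graph typically also pays for setup work). A small framework-agnostic sketch using `time.perf_counter`; the helper name `benchmark` and its defaults are hypothetical:

```python
import statistics
import time


def benchmark(fn, repeats=5, warmup=1):
    """Call fn() `warmup` times untimed, then `repeats` (>= 1) times timed.

    Returns (last_result, median_seconds). Build any graph or data *before*
    passing fn, so that only execution is measured.
    """
    result = None
    for _ in range(warmup):
        result = fn()
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        times.append(time.perf_counter() - start)
    return result, statistics.median(times)


# Usage with plain Python work; with the TF 1.x graph from the answer already built,
# the call would be benchmark(lambda: sess.run(orthogonalized)).
total, seconds = benchmark(lambda: sum(range(10**5)))
print(total, seconds)
```

Reporting the median rather than a single run makes the number less sensitive to one-off stalls.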

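Thank you very much!! I re-measured the time and now see roughly a 3.4x speedup (see the last update). I also timed the NumPy function by itself: it takes 7 seconds, so pure TensorFlow is 7/2.6 ≈ 2.7 times faster. Not great, I must say, and I think it could be optimized further; but at least now I can orthogonalize without breaking out of the graph.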
