Performance: How can I avoid for loops in PyTorch? Is there a function that computes this efficiently?
I have the following code in my PyTorch neural network:
cos = nn.CosineSimilarity(dim=1)
d = torch.zeros(batch_sz, n, n).to(device="cuda")
for i in range(n):
    for j in range(n):
        d[:, i, j] = cos(q[:, i, :], k[:, j, :])
Both q and k have shape (batch_sz, n, m).
This code clearly slows my program down, and I would like to know whether PyTorch provides anything that could make it more efficient.
Thanks a lot!

I'm not sure how to vectorize with nn.CosineSimilarity itself, but you can use the vectorized implementation below. It computes cosine similarity in the same way as PyTorch's internal module:
import torch
import torch.nn as nn
import time
# some dummy inputs
n=20
m=30
batch_sz = 10
k = torch.rand(batch_sz, n, m)
q = torch.rand(batch_sz, n, m)
d = torch.zeros(batch_sz, n, n)
cos = nn.CosineSimilarity(dim=1)
# reference result from the original double loop (used for the allclose check below)
for i in range(n):
    for j in range(n):
        d[:, i, j] = cos(q[:, i, :], k[:, j, :])
# dot product (numerator)
out = torch.bmm(q, k.transpose(1,2))
# computing the denominator in the next 5 steps
# compute the norm and restore dimensions
q_norm = q.norm(dim=2).unsqueeze(2)
k_norm = k.norm(dim=2).unsqueeze(1)
# This repeats the norms along dim 2 for q and dim 1 for k
q_norm_expanded = q_norm.expand(batch_sz, n, n)
k_norm_expanded = k_norm.expand(batch_sz, n, n)
# we compute the product of the two norm matrices
norms = q_norm_expanded * k_norm_expanded
# cosine similarity
out = out / (norms + 1e-9)
print(torch.allclose(d, out))
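If you want a rough idea of the speedup, a simple timing sketch along these lines should work (it reuses the dummy inputs, cos, and the so far unused time import from above; the exact numbers depend entirely on your hardware and the sizes involved):

# time the original double loop
start = time.time()
for i in range(n):
    for j in range(n):
        d[:, i, j] = cos(q[:, i, :], k[:, j, :])
print("loop      :", time.time() - start)

# time the vectorized version
start = time.time()
out = torch.bmm(q, k.transpose(1, 2))
q_norm = q.norm(dim=2).unsqueeze(2)
k_norm = k.norm(dim=2).unsqueeze(1)
norms = q_norm.expand(batch_sz, n, n) * k_norm.expand(batch_sz, n, n)
out = out / (norms + 1e-9)
print("vectorized:", time.time() - start)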
Expanding and multiplying the norms is really just computing a (batched) outer product, so you can also use the following operation:
norms = torch.bmm(q_norm, k_norm)
instead of
q_norm_expanded = q_norm.expand(batch_sz, n, n)
k_norm_expanded = k_norm.expand(batch_sz, n, n)
norms = q_norm_expanded * k_norm_expanded
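A quick check (reusing q_norm, k_norm, batch_sz, and n from the snippet above) that both ways of building the norm matrix give the same (batch_sz, n, n) result:

# expand-and-multiply vs. batched outer product
norms_expand = q_norm.expand(batch_sz, n, n) * k_norm.expand(batch_sz, n, n)
norms_outer = torch.bmm(q_norm, k_norm)
print(torch.allclose(norms_expand, norms_outer))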
I just realized that you can also normalize the vectors beforehand, which gives a more concise and numerically more stable version:
# add a small eps to the norms to avoid division by zero
q_norm = q.norm(dim=2) + 1e-9
k_norm = k.norm(dim=2) + 1e-9
# normalize every length-m vector to unit norm
q = q / q_norm.unsqueeze(2)
k = k / k_norm.unsqueeze(2)
# the cosine similarities are now just the batched dot products
out = torch.bmm(q, k.transpose(1, 2))
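Alternatively, torch.nn.functional.normalize should do the same L2 normalization for you (its default eps is 1e-12 rather than the 1e-9 used above, so the results can differ very slightly). A minimal sketch, starting again from the original, unnormalized q and k and the loop result d:

import torch.nn.functional as F

q_n = F.normalize(q, p=2.0, dim=2)   # each length-m row divided by its L2 norm
k_n = F.normalize(k, p=2.0, dim=2)
out = torch.bmm(q_n, k_n.transpose(1, 2))
print(torch.allclose(d, out, atol=1e-6))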