Vectorizing a loss function with complex indexing in PyTorch

python vectorization pytorch

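I have written a loss function in PyTorch, but it is too slow, so I am trying to vectorize it.

Description

Given two meshes, the loss computes the distortion between patches (i.e. sets of vertices) defined on them. The distortion measures how much the distance between each pair of points changes.

Patches may have different sizes.
This assumes that there is a correspondence between the vertices of the two meshes M and N.
The loss accepts batches.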
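To make the definition concrete, here is a minimal sketch of the distortion of a single patch on a toy mesh. The tensors pred, true, and patch are made-up illustration values, not data from the question.

```python
import torch

# Toy data (hypothetical): 3 vertices in 2-D, one patch containing vertices 0 and 2.
pred = torch.tensor([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])
true = torch.tensor([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
patch = torch.tensor([0, 2])

# Pairwise distances between the patch vertices on each mesh.
dist_pred = torch.pdist(pred[patch])  # vertices 0 and 2 are 5 apart
dist_true = torch.pdist(true[patch])  # vertices 0 and 2 are 1 apart

# Distortion: sum of squared changes in pairwise distance, here (5 - 1) ** 2.
distortion = torch.sum((dist_pred - dist_true) ** 2)
print(distortion.item())
```

With more vertices in the patch, torch.pdist returns one entry per unordered pair, so every pair is counted once.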

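Non-vectorized code

def loss_func(y_pred, batch):
    y_true = batch['vertices']
    patches: Tuple[Tensor, ...] = batch['patches']

    loss = 0
    for i in range(y_pred.shape[0]):      # loop over the items in the batch
        patch_loss = 0
        for patch in patches:             # each patch is a tensor of vertex indices
            dist_a = torch.pdist(y_pred[i][patch])
            dist_b = torch.pdist(y_true[i][patch])
            patch_loss += torch.sum((dist_a - dist_b) ** 2)
        loss += patch_loss
    return loss / y_pred.shape[0]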

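Vectorization attempt

def loss_func(y_pred, batch):
    y_true = batch['vertices']
    patches: Tuple[Tensor, ...] = batch['patches']

    # All pairwise distances, computed once for the whole batch: (B, N, N)
    pred_dists = torch.cdist(y_pred, y_pred)
    true_dists = torch.cdist(y_true, y_true)
    diff_dists = (pred_dists - true_dists) ** 2

    loss = 0
    for patch in patches:
        # Select the rows and columns that belong to this patch
        d = diff_dists[:, patch, :][:, :, patch]
        # The matrix is symmetric, so every pair is counted twice
        loss += torch.sum(d, dim=(1, 2)) / 2
    return loss.mean()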

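Problem

It is still too slow. I am having trouble vectorizing it because the patches may have different sizes, so they cannot be stacked into a single tensor.

I feel that I could vectorize the code by using the adjacency matrix directly, without collecting the vertex indices into patches.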
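One way to act on that idea, as a sketch only: if patch p is the set of nonzero columns in row p of a binary adjacency matrix A, then the number of patches containing both vertex j and vertex k is (A^T A)[j, k], so the loop over patches can be replaced by one weighted sum. The function names loss_func_adj and loss_func_loop and the small random inputs below are illustrative assumptions, not part of the original code.

```python
import torch

def loss_func_adj(y_pred, y_true, adjacency):
    """Sketch: patch distortion loss computed from a binary adjacency matrix.

    Assumes patch p is the set of nonzero columns in row p of `adjacency`.
    """
    a = (adjacency != 0).to(y_pred.dtype)            # (N, N), entries 0 or 1
    # weights[j, k] = number of patches containing both vertex j and vertex k
    weights = a.T @ a                                 # (N, N)
    diff = (torch.cdist(y_pred, y_pred) - torch.cdist(y_true, y_true)) ** 2  # (B, N, N)
    # Each unordered pair appears twice in the full matrix, hence the division by 2.
    return (diff * weights).sum(dim=(1, 2)).mean() / 2


def loss_func_loop(y_pred, y_true, adjacency):
    """Reference loop over batch items and patches, used only to check the sketch."""
    loss = 0.0
    for i in range(y_pred.shape[0]):
        for row in adjacency:
            patch = row.nonzero().squeeze(1)          # 1-D vertex indices of this patch
            loss = loss + torch.sum(
                (torch.pdist(y_pred[i][patch]) - torch.pdist(y_true[i][patch])) ** 2
            )
    return loss / y_pred.shape[0]


# Small random inputs (hypothetical sizes, CPU) to compare the two versions.
torch.manual_seed(0)
y_pred = torch.rand(2, 20, 2, dtype=torch.double)
y_true = torch.rand(2, 20, 2, dtype=torch.double)
adjacency = (torch.rand(20, 20) > 0.8).to(torch.double)

print(loss_func_adj(y_pred, y_true, adjacency).item())
print(loss_func_loop(y_pred, y_true, adjacency).item())
```

The two printed values should agree up to floating-point error. The sketch relies on the adjacency matrix being strictly 0/1 and on the diagonal of the distance matrices being zero, which is worth checking on real data.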

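The patches are computed as follows:

def get_patch(vertex_index, vertex2vertex):
    # squeeze(1) turns the (k, 1) output of nonzero() into a 1-D index tensor
    return vertex2vertex[vertex_index, :].nonzero().squeeze(1)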
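A detail worth checking here is the shape of the index tensor: nonzero() on a 1-D row returns a tensor of shape (k, 1), while indexing the vertex dimension needs a 1-D tensor of shape (k,). A minimal sketch on a made-up 4-vertex adjacency matrix:

```python
import torch

# Hypothetical 4-vertex adjacency matrix, for illustration only.
adjacency = torch.tensor([
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
])

raw = adjacency[0, :].nonzero()   # shape (2, 1): one column per tensor dimension
patch = raw.squeeze(1)            # shape (2,): plain vertex indices

print(raw.shape, patch.tolist())
```

Here the patch of vertex 0 contains vertices 1 and 3.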

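Working example

I have prepared a self-contained example with both functions plus some benchmarks and experiments. As a reference time, the benchmark also includes the same loss function without patches (i.e. the distortion between all vertices).

from typing import Tuple
import time

import torch
from torch import Tensor

device = 'cuda'
precision = torch.double

batch_size = 8
num_vertices = 400
dim = 2

y_pred = torch.rand(batch_size, num_vertices, dim, device=device, dtype=precision)
y_true = torch.rand(batch_size, num_vertices, dim, device=device, dtype=precision)
adjacency = (torch.rand(num_vertices, num_vertices, device=device, dtype=precision) > 0.95).to(precision)


def get_patch(vertex_index, vertex2vertex):
    # squeeze(1) turns the (k, 1) output of nonzero() into a 1-D index tensor
    return vertex2vertex[vertex_index, :].nonzero().squeeze(1)


patches = tuple(get_patch(i, adjacency) for i in range(num_vertices))
batch = {
    'vertices': y_true,
    'patches': patches,
}


def loss_func(y_pred, batch):
    y_true = batch['vertices']
    patches: Tuple[Tensor, ...] = batch['patches']

    loss = 0
    for i in range(y_pred.shape[0]):
        patch_loss = 0
        for patch in patches:
            dist_a = torch.pdist(y_pred[i][patch])
            dist_b = torch.pdist(y_true[i][patch])
            patch_loss += torch.sum((dist_a - dist_b) ** 2)
        loss += patch_loss
    return loss / y_pred.shape[0]


def loss_func_vec(y_pred, batch):
    y_true = batch['vertices']
    patches: Tuple[Tensor, ...] = batch['patches']

    pred_dists = torch.cdist(y_pred, y_pred)
    true_dists = torch.cdist(y_true, y_true)
    diff_dists = (pred_dists - true_dists) ** 2

    loss = 0
    for patch in patches:
        d = diff_dists[:, patch, :][:, :, patch]
        loss += torch.sum(d, dim=(1, 2)) / 2
    return loss.mean()


def reference_no_patches(y_pred, batch):
    # Distortion between all vertices, ignoring the patches
    y_true = batch['vertices']
    da = torch.cdist(y_pred, y_pred)
    db = torch.cdist(y_true, y_true)
    return torch.sum((da - db) ** 2) / y_true.shape[0] / 2


print('LOSS FUNCTION (ITERATION): TIME (LOSS)')

print()
for i in range(5):
    start_time = time.time()
    res = loss_func(y_pred, batch)
    torch.cuda.synchronize()  # wait for the GPU before reading the clock
    print(f'Standard {i}:\t{time.time() - start_time:.4f}s ({res:.5f})')

print()
for i in range(5):
    start_time = time.time()
    res = loss_func_vec(y_pred, batch)
    torch.cuda.synchronize()
    print(f'Vectorized {i}:\t{time.time() - start_time:.4f}s ({res:.5f})')

print()
for i in range(5):
    start_time = time.time()
    res = reference_no_patches(y_pred, batch)
    torch.cuda.synchronize()
    print(f'Reference without patches {i}:\t{time.time() - start_time:.4f}s ({res:.5f})')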
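Because CUDA kernels run asynchronously, timings like the ones above are easier to trust when the GPU is synchronized both before the clock starts and before it stops. The helper below is a sketch of that pattern; the name timed and the stand-in workload are assumptions for illustration, and it falls back to plain CPU timing when CUDA is unavailable.

```python
import time
import torch

def timed(fn, *args, repeats=5):
    """Return (last_result, list_of_elapsed_seconds) for `fn(*args)`."""
    use_cuda = torch.cuda.is_available()
    times = []
    result = None
    for _ in range(repeats):
        if use_cuda:
            torch.cuda.synchronize()   # finish pending kernels before starting the clock
        start = time.perf_counter()
        result = fn(*args)
        if use_cuda:
            torch.cuda.synchronize()   # wait for the kernels launched by fn
        times.append(time.perf_counter() - start)
    return result, times


# Usage with a stand-in workload (hypothetical; substitute any loss function).
x = torch.rand(8, 400, 2)
res, times = timed(lambda t: torch.cdist(t, t).sum(), x)
print(f'{min(times):.4f}s ({res:.5f})')
```

Reporting the minimum or median over several runs also reduces the effect of one-off warm-up costs in the first iteration.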

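Context

The matrices involved are small. Each mesh has about 500 vertices, so the adjacency matrix is [500, 500].

The patches may have different sizes, but even so, why not pad the patches tuple?

Each patch contains the indices of the vertices included in it, so which value should I use to pad them? And once it is padded, how would you use that matrix? I feel there should be a way to use the adjacency matrix, which in some way already contains all the patch information.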
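On the padding question, one possible approach (a sketch, not necessarily what the commenter had in mind) is to pad every patch with any valid index, such as 0, and keep a boolean mask that marks the real entries; padded pairs are then zeroed out before summing. The function name loss_func_padded and the small random inputs are assumptions for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def loss_func_padded(y_pred, y_true, patches):
    """Sketch: pad patches to a common length and mask out the padding."""
    lengths = torch.tensor([len(p) for p in patches], device=y_pred.device)
    # Pad with index 0; the value is irrelevant because padded entries are masked.
    idx = pad_sequence(list(patches), batch_first=True, padding_value=0)   # (P, K)
    valid = torch.arange(idx.shape[1], device=idx.device)[None, :] < lengths[:, None]
    pair_mask = valid[:, :, None] & valid[:, None, :]                       # (P, K, K)

    b = y_pred.shape[0]
    p, k = idx.shape
    pred = y_pred[:, idx].reshape(b * p, k, -1)   # gathered patch vertices: (B*P, K, D)
    true = y_true[:, idx].reshape(b * p, k, -1)
    diff = (torch.cdist(pred, pred) - torch.cdist(true, true)) ** 2         # (B*P, K, K)
    diff = diff.reshape(b, p, k, k) * pair_mask   # drop pairs that involve padding
    # Each unordered pair is counted twice in the symmetric matrix.
    return diff.sum(dim=(1, 2, 3)).mean() / 2


# Small random inputs (hypothetical sizes, CPU).
torch.manual_seed(0)
y_pred = torch.rand(2, 20, 2, dtype=torch.double)
y_true = torch.rand(2, 20, 2, dtype=torch.double)
adjacency = torch.rand(20, 20) > 0.8
patches = tuple(row.nonzero().squeeze(1) for row in adjacency)

print(loss_func_padded(y_pred, y_true, patches).item())
```

The trade-off is memory: the intermediate tensor has shape (B, P, K, K), where K is the size of the largest patch, so this is likely to pay off only when patches are small relative to the mesh.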

LOSS FUNCTION (ITERATION): TIME (LOSS)

Standard 0: 0.3311s (9744.87875)
Standard 1: 0.2853s (9744.87875)
Standard 2: 0.2929s (9744.87875)
Standard 3: 0.2972s (9744.87875)
Standard 4: 0.2714s (9744.87875)

Vectorized 0:   0.0254s (9744.87875)
Vectorized 1:   0.0251s (9744.87875)
Vectorized 2:   0.0262s (9744.87875)
Vectorized 3:   0.0242s (9744.87875)
Vectorized 4:   0.0241s (9744.87875)

Reference without patches 0:    0.0063s (9695.41083)
Reference without patches 1:    0.0062s (9695.41083)
Reference without patches 2:    0.0062s (9695.41083)
Reference without patches 3:    0.0066s (9695.41083)
Reference without patches 4:    0.0067s (9695.41083)

def loss_func_vec_mem(y_pred, batch):
    y_true = batch['vertices']
    patches: Tuple[Tensor, ...] = batch['patches']

    loss = 0
    for patch in patches:
        # Gather only the vertices of this patch: (B, K, D)
        pred = y_pred[:, patch, :]
        true = y_true[:, patch, :]

        # Pairwise distances within the patch only: (B, K, K)
        pred_dists = torch.cdist(pred, pred)
        true_dists = torch.cdist(true, true)

        diff_dists = (pred_dists - true_dists) ** 2

        loss += torch.sum(diff_dists, dim=(1, 2))
    # Every pair is counted twice in the symmetric matrix, hence the division by 2
    return loss.mean() / 2

LOSS FUNCTION (ITERATION): TIME (LOSS)

Vec mem 0:  0.0484s (14685.45521)
Vec mem 1:  0.0489s (14685.45521)
Vec mem 2:  0.0471s (14685.45521)
Vec mem 3:  0.0471s (14685.45521)
Vec mem 4:  0.0459s (14685.45521)

Vectorized 0:   0.0271s (29370.91043)
Vectorized 1:   0.0262s (29370.91043)
Vectorized 2:   0.0320s (29370.91043)
Vectorized 3:   0.0266s (29370.91043)
Vectorized 4:   0.0263s (29370.91043)

Standard 0: 0.1680s (14685.45521)
Standard 1: 0.1614s (14685.45521)
Standard 2: 0.1639s (14685.45521)
Standard 3: 0.1606s (14685.45521)
Standard 4: 0.1694s (14685.45521)

Reference without patches 0:    0.0049s (12874.01275)
Reference without patches 1:    0.0105s (12874.01275)
Reference without patches 2:    0.0103s (12874.01275)
Reference without patches 3:    0.0092s (12874.01275)
Reference without patches 4:    0.0100s (12874.01275)