Python: column/row slicing a torch sparse tensor

python pytorch

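I have a PyTorch sparse tensor that I need to slice by row and column using [idx][:, idx], where idx is a list of indices. On an ordinary float tensor this slicing gives exactly the result I want. Is it possible to apply the same slicing to a sparse tensor? Example: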

import numpy as np
import torch
from torch import autograd

# constructing sparse matrix
i = np.array([[0,1,2,2],[0,1,2,1]])
v = np.ones(4)
i = torch.from_numpy(i.astype("int64"))
v = torch.from_numpy(v.astype("float32"))
test1 = torch.sparse.FloatTensor(i, v)

#constructing float tensor
test2 = np.array([[1,0,0],[0,1,0],[0,1,1]])
test2 = autograd.Variable(torch.cuda.FloatTensor(test2), requires_grad=False)

#slicing
idx = [1,2]
print(test2[idx][:,idx])

Output:

Variable containing:
 1  0
 1  1
[torch.cuda.FloatTensor of size 2x2 (GPU 0)]

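I have a 250,000 x 250,000 adjacency matrix from which I need to slice n rows and n columns, simply by sampling n random indices. Because the dataset is so large, converting it to a more convenient data type is not realistic.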

Can I get the same slicing result on test1? Is it possible at all? If not, is there a workaround?

Right now I am running my model with the following workaround:

idx = sorted(random.sample(range(0, np.shape(test1)[0]), 9000))
test1 = test1AsCsr[idx][:,idx].todense().astype("int32")
test1 = autograd.Variable(torch.cuda.FloatTensor(test1), requires_grad=False)

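where test1AsCsr is test1 converted to a SciPy CSR matrix. This workaround works, but it is very slow and leaves my GPU utilization very low, because it constantly has to read from and write to CPU memory.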

Edit: it is fine if the result is a non-sparse tensor.

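Possible answer for 2-dimensional sparse indices

Find below an answer that uses several PyTorch methods (torch.eq(), torch.unique(), torch.sort(), etc.) to output a compact, sliced tensor of shape (len(idx), len(idx)).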

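I tested several edge cases (unordered idx, v containing 0s, i containing multiple identical index pairs, etc.), though I may have forgotten some. Performance should also be checked.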

import torch
import numpy as np

def in1D(x, labels):
    """
    Sub-optimal equivalent to numpy.in1d().
    Hopefully this feature will be properly covered soon
    c.f. https://github.com/pytorch/pytorch/issues/3025
    Snippet by Aron Barreira Bordin
    Args:
        x (Tensor):             Tensor to search values in
        labels (Tensor/list):   1D array of values to search for

    Returns:
        Tensor: Boolean tensor y of same shape as x, with y[ind] = True if x[ind] in labels

    Example:
        >>> in1D(torch.FloatTensor([1, 2, 0, 3]), [2, 3])
        FloatTensor([False, True, False, True])
    """
    mapping = torch.zeros(x.size()).byte()
    for label in labels:
        mapping = mapping | x.eq(label)
    return mapping


def compact1D(x):
    """
    "Compact" the values of a 1D uint tensor, so that all values are in [0, len(unique(x)) - 1].
    Args:
        x (Tensor): uint Tensor

    Returns:
        Tensor: uint Tensor of same shape as x

    Example:
        >>> compact1D(torch.ByteTensor([5, 8, 7, 3, 8, 42]))
        ByteTensor([1, 3, 2, 0, 3, 4])
    """
    x_sorted, x_sorted_ind = torch.sort(x, descending=True)
    x_sorted_unique, x_sorted_unique_ind = torch.unique(x_sorted, return_inverse=True)
    x[x_sorted_ind] = x_sorted_unique_ind
    return x

# Input sparse tensor:
i = torch.from_numpy(np.array([[0,1,4,3,2,1],[0,1,3,1,4,1]]).astype("int64"))
v = torch.from_numpy(np.arange(1, 7).astype("float32"))
test1 = torch.sparse.FloatTensor(i, v)
print(test1.to_dense())
# tensor([[ 1.,  0.,  0.,  0.,  0.],
#         [ 0.,  8.,  0.,  0.,  0.],
#         [ 0.,  0.,  0.,  0.,  5.],
#         [ 0.,  4.,  0.,  0.,  0.],
#         [ 0.,  0.,  0.,  3.,  0.]])

# note: test1[1, 1] = v[1] + v[5] = 2 + 6 = 8
#       since both i[:, 1] and i[:, 5] are [1, 1]

# Input slicing indices:
idx = [4,1,3]

# Getting the elements in `i` which correspond to `idx`:
v_idx = in1D(i, idx).byte()
v_idx = v_idx.sum(dim=0).squeeze() == i.size(0) # or `v_idx.all(dim=0)` on newer PyTorch versions
v_idx = v_idx.nonzero().squeeze()

# Slicing `v` and `i` accordingly:
v_sliced = v[v_idx]
i_sliced = i.index_select(dim=1, index=v_idx)

# Building sparse result tensor:
i_sliced[0] = compact1D(i_sliced[0])
i_sliced[1] = compact1D(i_sliced[1])

# To make sure to have a square dense representation:
size_sliced = torch.Size([len(idx), len(idx)])
res = torch.sparse.FloatTensor(i_sliced, v_sliced, size_sliced)

print(res)
# torch.sparse.FloatTensor of size (3,3) with indices:
# tensor([[ 0,  2,  1,  0],
#         [ 0,  1,  0,  0]])
# and values:
# tensor([ 2.,  3.,  4.,  6.])

print(res.to_dense())
# tensor([[ 8.,  0.,  0.],
#         [ 4.,  0.,  0.],
#         [ 0.,  3.,  0.]])
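
One thing worth noting about the approach above: compact1D renumbers indices by sorted value, so the printed result corresponds to the slice for idx in sorted order ([1, 3, 4]) rather than [4, 1, 3], and it relies on every selected index actually appearing among the kept entries. As a possible alternative, here is a minimal sketch that remaps indices through a lookup table so that the order of idx is preserved. It assumes a recent PyTorch with torch.sparse_coo_tensor, a square 2-D sparse COO tensor, and an idx without duplicates; the names slice_sparse_square and lookup are made up for illustration and are not part of PyTorch.

```python
import torch


def slice_sparse_square(sp, idx):
    """Hypothetical helper: return sp[idx][:, idx] as a sparse COO tensor.

    Assumes `sp` is a square 2-D sparse COO tensor and `idx` has no duplicates.
    """
    sp = sp.coalesce()  # merge duplicate (row, col) entries by summing them
    indices, values = sp.indices(), sp.values()
    n, k = sp.size(0), len(idx)
    idx_t = torch.as_tensor(idx, dtype=torch.long, device=indices.device)

    # lookup[old_index] = position of old_index in idx, or -1 if not selected
    lookup = torch.full((n,), -1, dtype=torch.long, device=indices.device)
    lookup[idx_t] = torch.arange(k, device=indices.device)

    new_rows = lookup[indices[0]]
    new_cols = lookup[indices[1]]
    keep = (new_rows >= 0) & (new_cols >= 0)  # entries whose row AND column are selected

    new_indices = torch.stack([new_rows[keep], new_cols[keep]])
    return torch.sparse_coo_tensor(new_indices, values[keep], (k, k))


# Same input as in the answer above
i = torch.tensor([[0, 1, 4, 3, 2, 1], [0, 1, 3, 1, 4, 1]])
v = torch.arange(1, 7, dtype=torch.float32)
test1 = torch.sparse_coo_tensor(i, v, (5, 5))
idx = [4, 1, 3]

res = slice_sparse_square(test1, idx)
dense_ref = test1.to_dense()[idx][:, idx]  # reference result via dense slicing
print(res.to_dense())
```

The lookup table has one entry per row of the original matrix, so for a 250,000 x 250,000 matrix it stays small, and every step is a vectorized tensor operation that can run on the same device as the sparse tensor. Whether this is fast enough for the use case in the question would need to be measured.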

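Previous answer for 1-dimensional sparse indices

Below is a solution that is probably sub-optimal and does not cover all edge cases. It follows the intuitions shared in a related issue; hopefully this feature will be properly covered soon: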

# Constructing a sparse tensor a bit more complicated for the sake of demo:
i = torch.LongTensor([[0, 1, 5, 2]])
v = torch.FloatTensor([[1, 3, 0], [5, 7, 0], [9, 9, 9], [1,2,3]])
test1 = torch.sparse.FloatTensor(i, v)

# note: if you directly have sparse `test1`, you can get `i` and `v`:
# i, v = test1._indices(), test1._values()

# Getting the slicing indices:
idx = [1,2]

# Preparing to slice `v` according to `idx`.
# For that, we gather the list of indices `v_idx` such that i[v_idx[k]] == idx[k]:
i_squeeze = i.squeeze()
v_idx = [(i_squeeze == j).nonzero() for j in idx] # <- doesn't seem optimal...
v_idx = torch.cat(v_idx, dim=1)

# Slicing `v` accordingly:
v_sliced = v[v_idx.squeeze()][:,idx]

# Now defining your resulting sparse tensor.
# I'm not sure what kind of indexing you want, so here are 2 possibilities:
# 1) "Dense" indexing:
test1x = torch.sparse.FloatTensor(torch.arange(v_idx.size(1)).long().unsqueeze(0), v_sliced)
print(test1x)
# torch.sparse.FloatTensor of size (2,2) with indices:
#
#  0  1
# [torch.LongTensor of size (1,2)]
# and values:
#
#  7  0
#  2  3
# [torch.FloatTensor of size (2,2)]

# 2) "Sparse" indexing using the original `idx`:
test1x = torch.sparse.FloatTensor(torch.LongTensor(idx).unsqueeze(0), v_sliced)
# note: this indexing would fail if elements of `idx` were not in `i`.
print(test1x)
# torch.sparse.FloatTensor of size (3,2) with indices:
#
#  1  2
# [torch.LongTensor of size (1,2)]
# and values:
#
#  7  0
#  2  3
# [torch.FloatTensor of size (2,2)]
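
The per-index search flagged above as "doesn't seem optimal" can probably be replaced by a single membership test. The following is a minimal sketch, assuming a PyTorch version that provides torch.isin (1.10 or later); the variable names mask and kept are illustrative only. Note that it returns the selected rows in storage order rather than in the order of idx, which happens to coincide for this example.

```python
import torch

# Same hybrid sparse data as above: 1 sparse dimension, 1 dense dimension
i = torch.tensor([[0, 1, 5, 2]])
v = torch.tensor([[1., 3., 0.], [5., 7., 0.], [9., 9., 9.], [1., 2., 3.]])
idx = torch.tensor([1, 2])

# True where a stored sparse index is one of the requested indices
mask = torch.isin(i[0], idx)

kept = i[0][mask]            # original indices that survive the slice
v_sliced = v[mask][:, idx]   # keep matching rows, then select the dense columns

# "Sparse" indexing using the original indices; the size is inferred
test1x = torch.sparse_coo_tensor(kept.unsqueeze(0), v_sliced)
print(test1x.to_dense())
```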

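Other than manually searching i for the values in idx, I'm not sure there is currently a simple way to do this.

This issue has a workaround.

I desperately need to solve this, so any solution, even a non-trivial one, is very welcome. How would you construct the new sliced matrix from the i vector?

By "not easy" I meant "not optimal"... I'm not sure how the suggested workaround scales to larger tensors, if it works the way you want at all.

I tried a solution using test1._indices() and np.where, then constructing a new tensor from the result, but it gets slow because the np.where search is linear and does not run on the GPU.

scipy.sparse.csr uses matrix multiplication to select multiple rows or columns. It constructs an "extractor" sparse matrix. I demonstrate it here:

This might be what I want, but I'm running into some strange problems with PyTorch not letting me write to Variables, which makes testing difficult:

What is i.squeeze() used for?

I updated the question; the indices I gave are two-dimensional.

OK, I gave it some thought (it's a very interesting problem!) and came up with a possible solution. Let me know what you think.

I think it works, it's just too slow; it may not be possible at a reasonable speed. I tried a dictionary-based approach myself, to take advantage of: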