Python: How to append (sparse) 2D arrays along both axes?

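I need to read some 2D matrices from CSV flat files somewhere on the file system, then stitch them together going across and down. The CSV files contain sparse data. Suppose I have just the following 4 files: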

file_001.csv            file_002.csv
-----------             ----------
11,0,0,0,0,11           22,0,0,0,0,22
0,0,0,0,0,0             0,0,0,0,0,0
0,0,0,0,0,0             0,0,0,0,0,0
11,0,0,0,0,11           22,0,0,0,0,22

file_003.csv            file_004.csv
-----------             ----------
33,0,0,0,0,33           44,0,0,0,0,44
0,0,0,0,0,0             0,0,0,0,0,0
0,0,0,0,0,0             0,0,0,0,0,0
33,0,0,0,0,33           44,0,0,0,0,44
What I want to end up with is:

11,0,0,0,0,11,22,0,0,0,0,22
0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0
11,0,0,0,0,11,22,0,0,0,0,22
33,0,0,0,0,33,44,0,0,0,0,44
0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0
33,0,0,0,0,33,44,0,0,0,0,44
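The CSV files shown above contain values only at their 4 corners, but that will not necessarily be the case in real life. I did it this way here purely for convenience, so that the individual files can be tracked in the final result.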

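In real life, however, the data will be sparse. In addition, the array in each CSV will be 2000x2000, and I will be stitching 20 arrays across and 23 down, so the final (large, sparse) array will be 46000x40000.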
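As a quick sanity check on those dimensions, here is a minimal sketch of the arithmetic. The 8 bytes per element is an assumption (it matches int64 or float64, which is what pandas typically yields for numeric CSV data); it only illustrates why a dense array of this size is impractical.

```python
# Tile and grid sizes taken from the question.
tile_rows, tile_cols = 2000, 2000
tiles_across, tiles_down = 20, 23

# Stitching down adds rows, stitching across adds columns.
total_rows = tiles_down * tile_rows
total_cols = tiles_across * tile_cols

# Assumption: 8 bytes per element (int64 / float64) if stored densely.
dense_bytes = total_rows * total_cols * 8

print(total_rows, total_cols)      # final shape
print(dense_bytes / 1e9, "GB")     # dense storage estimate
```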

What I have come up with is below. Is there a better (more efficient and/or faster) way? Any suggestions for improvement would be appreciated.

import pandas as pd
import numpy as np
import os
from scipy.sparse import coo_matrix, hstack

def mosaic(csv_root):
    out = None
    tiles_across = 2
    tiles_down = 2
    tiles = []
    step = 0
    for j in range(tiles_down):
        # loop over the rows and, for each row, stitch all csvs; then keep them in a list
        # file names are 1-based (file_001.csv, ...), hence the + 1
        csvs = [f"file_{i + j * tiles_across + 1:03d}.csv" for i in range(tiles_across)]
        filenames = [os.path.join(csv_root, csv) for csv in csvs]
        df = pd.concat([pd.read_csv(f, index_col=False, header=None) for f in filenames], ignore_index=True, axis=1)
        coo = coo_matrix(df.values)
        coo.row += step
        tiles.append(coo)
        step = coo.shape[0] + step

    # concatenate now the elements of the list
    [M, N] = tiles[0].shape
    M = M * len(tiles)  # adjust the coordinates
    _row = np.concatenate([x.row for x in tiles]).ravel().tolist()
    _col = np.concatenate([x.col for x in tiles]).ravel().tolist()
    _data = np.concatenate([x.data for x in tiles]).ravel().tolist()
    out = coo_matrix((_data, (_row, _col)), shape=(M, N))
    print(out)
    return out


if __name__ == "__main__":
    my_dir = os.path.join('my', 'path', 'to', 'csv', 'root')

    mosaic(my_dir)

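Quick suggestion: collect all the coo matrices in a list, then use sparse.bmat to join them into a single sparse matrix. Look at its code; sparse.hstack (and vstack) also use bmat.

Ah yes, that sounds like a good idea. I hadn't heard of sparse.bmat before.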
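
The sparse.bmat suggestion could look roughly like the sketch below. It is not the asker's final code: the helper names stitch_tiles and corner_tile are made up for illustration, and the tiles are built in memory to mimic the four example files rather than read from disk. In real use each tile would presumably come from pd.read_csv(...).values, as in the question.

```python
import numpy as np
from scipy import sparse


def stitch_tiles(tile_grid):
    """Join a 2D grid (list of rows) of tiles into one sparse COO matrix.

    tile_grid[j][i] is the tile in grid row j, grid column i. All tiles in
    a grid row must share a height and all tiles in a grid column a width.
    """
    # Convert every tile to COO so only the non-zero entries are kept.
    blocks = [[sparse.coo_matrix(tile) for tile in row] for row in tile_grid]
    # bmat lays the blocks out exactly as they appear in the nested list.
    return sparse.bmat(blocks, format="coo")


def corner_tile(value, shape=(4, 6)):
    """Hypothetical stand-in for one CSV file: `value` at the four corners."""
    tile = np.zeros(shape, dtype=np.int64)
    tile[0, 0] = tile[0, -1] = tile[-1, 0] = tile[-1, -1] = value
    return tile


# 2 x 2 grid mirroring file_001.csv ... file_004.csv from the question.
grid = [
    [corner_tile(11), corner_tile(22)],
    [corner_tile(33), corner_tile(44)],
]

mosaic = stitch_tiles(grid)
print(mosaic.shape)
print(mosaic.toarray())
```

Compared with shifting the row indices by hand, this leaves the offset bookkeeping to bmat, so the same function handles stitching across and down in a single call.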