Python: how do I append (sparse) 2D arrays along both axes?
I need to read some 2D matrices from CSV flat files somewhere on the file system and then stitch them together, both across and down. The CSV files contain sparse data. Suppose I have only the following 4 files:
file_001.csv     file_002.csv
-------------    -------------
11,0,0,0,0,11    22,0,0,0,0,22
0,0,0,0,0,0      0,0,0,0,0,0
0,0,0,0,0,0      0,0,0,0,0,0
11,0,0,0,0,11    22,0,0,0,0,22

file_003.csv     file_004.csv
-------------    -------------
33,0,0,0,0,33    44,0,0,0,0,44
0,0,0,0,0,0      0,0,0,0,0,0
0,0,0,0,0,0      0,0,0,0,0,0
33,0,0,0,0,33    44,0,0,0,0,44
What I want to end up with is:
11,0,0,0,0,11,22,0,0,0,0,22
0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0
11,0,0,0,0,11,22,0,0,0,0,22
33,0,0,0,0,33,44,0,0,0,0,44
0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0
33,0,0,0,0,33,44,0,0,0,0,44
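The CSV files shown above contain values only in their 4 corners, but that will not necessarily be the case in real life. I did it this way here only for convenience, so that the individual files are easy to track in the final result.
In real life the data will still be sparse, though. In addition, the array in each CSV will be 2000 x 2000, and I will be stitching 20 arrays across and 23 arrays down, so the final (large, sparse) array will be 46000 x 40000.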
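To put those dimensions in perspective, here is a rough back-of-the-envelope sketch of the memory involved. The 1% density is purely a hypothetical figure for illustration (the question does not state one), and the per-entry cost assumes 8-byte values with 4-byte row and column indices in COO format.

```python
# Rough size arithmetic for the full mosaic described above.
tile_rows, tile_cols = 2000, 2000
tiles_across, tiles_down = 20, 23

total_rows = tiles_down * tile_rows      # 23 * 2000
total_cols = tiles_across * tile_cols    # 20 * 2000
total_cells = total_rows * total_cols

# A dense float64 array needs 8 bytes for every cell, zero or not.
dense_bytes = total_cells * 8

# ASSUMPTION: 1% of the cells are non-zero (hypothetical, not from the question).
assumed_density_percent = 1
nnz = total_cells * assumed_density_percent // 100

# COO stores one value (8 bytes) plus a row and a column index
# (assumed 4 bytes each) per non-zero entry.
coo_bytes = nnz * (8 + 4 + 4)

print(f"shape: {total_rows} x {total_cols}")
print(f"dense: {dense_bytes / 1e9:.2f} GB")
print(f"COO at {assumed_density_percent}% density: {coo_bytes / 1e6:.1f} MB")
```

Under these assumptions the dense array would be far too large to build casually, which is why it matters that no step materialises the whole mosaic densely.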
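Below is what I have come up with. Is there a better (more efficient and/or faster) way to do this? Any suggestions for improvement would be much appreciated.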
import pandas as pd
import numpy as np
import os
from scipy.sparse import coo_matrix, hstack


def mosaic(csv_root):
    out = None
    tiles_across = 2
    tiles_down = 2
    tiles = []
    step = 0  # row offset of the current row of tiles
    for j in range(tiles_down):
        # Loop over the tile rows and, for each row, stitch all CSVs side by side.
        # Then keep the results in a list.
        # The sample files are numbered from 001, hence the +1.
        csvs = [f"file_{int(i + j * tiles_across) + 1:03d}.csv" for i in range(tiles_across)]
        filenames = [os.path.join(csv_root, csv) for csv in csvs]
        df = pd.concat([pd.read_csv(f, index_col=False, header=None) for f in filenames], ignore_index=True, axis=1)
        coo = coo_matrix(df.values)
        coo.row += step  # shift this row of tiles down to its place in the mosaic
        tiles.append(coo)
        step = coo.shape[0] + step
    # Now concatenate the elements of the list
    [M, N] = tiles[0].shape
    M = M * len(tiles)  # adjust the total number of rows
    _row = np.concatenate([x.row for x in tiles]).ravel().tolist()
    _col = np.concatenate([x.col for x in tiles]).ravel().tolist()
    _data = np.concatenate([x.data for x in tiles]).ravel().tolist()
    out = coo_matrix((_data, (_row, _col)), shape=(M, N))
    print(out)
    return out


if __name__ == "__main__":
    my_dir = os.path.join('my', 'path', 'to', 'csv', 'root')
    mosaic(my_dir)
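A quick suggestion: collect all the coo matrices in a list, then use sparse.bmat to join them into one sparse matrix. Take a look at its code; sparse.hstack (and vstack) use bmat as well.

Ah yes, that sounds like a good idea. I had not heard of sparse.bmat before.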
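As a minimal sketch of that suggestion: the snippet below arranges the tiles in a nested list that mirrors their position in the mosaic and hands it to scipy.sparse.bmat. The names stitch_tiles and corner_tile are hypothetical helpers made up for illustration, and the tiles are built in memory to mimic the four sample files rather than read from disk.

```python
import numpy as np
from scipy import sparse


def stitch_tiles(tile_grid):
    """Join a 2-D grid (a list of rows) of tiles into one sparse matrix.

    Each tile is converted to COO on its own, so no dense intermediate
    ever spans more than a single tile.
    """
    blocks = [[sparse.coo_matrix(tile) for tile in row] for row in tile_grid]
    return sparse.bmat(blocks, format="coo")


def corner_tile(value, shape=(4, 6)):
    """Hypothetical stand-in for one CSV file: `value` in the four corners."""
    tile = np.zeros(shape, dtype=np.int64)
    tile[[0, 0, -1, -1], [0, -1, 0, -1]] = value
    return tile


# Same layout as the sample files: 001 and 002 on top, 003 and 004 below.
grid = [
    [corner_tile(11), corner_tile(22)],
    [corner_tile(33), corner_tile(44)],
]

stitched = stitch_tiles(grid)
print(stitched.shape)
print(stitched.toarray())
```

With real files, each corner_tile(...) call could be replaced by something like pd.read_csv(path, header=None).values for the matching file. Because every tile becomes sparse before it is joined, this avoids building a dense 2000 x 40000 frame for each row of tiles and the manual row-offset bookkeeping.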