Sparse data generator with Keras/TensorFlow

I have implemented a network in C++ which I am trying to train on the GPU using Python. The problem I face is that the input is very large (and sparse): there are around 50,000 input neurons, of which usually only about 30 are activated.


My model looks like this:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 24576)        0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            (None, 24576)        0                                            
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 256)          6291712     input_1[0][0]                    
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 256)          6291712     input_2[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 256)          0           dense_1[0][0]                    
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 256)          0           dense_2[0][0]                    
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 512)          0           leaky_re_lu_1[0][0]              
                                                                 leaky_re_lu_2[0][0]              
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 32)           16416       concatenate_1[0][0]              
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 32)           0           dense_3[0][0]                    
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 32)           1056        leaky_re_lu_3[0][0]              
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 32)           0           dense_4[0][0]                    
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 1)            33          leaky_re_lu_4[0][0]              
==================================================================================================
Total params: 12,600,929
Trainable params: 12,600,929
Non-trainable params: 0
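(For reference, here is a minimal sketch of how this architecture could be rebuilt with the Keras functional API. It is reconstructed from the shapes and parameter counts in the summary above; details such as initializers or the LeakyReLU slope are assumptions.)

from tensorflow import keras
from tensorflow.keras import layers

inp1 = keras.Input(shape=(24576,))
inp2 = keras.Input(shape=(24576,))

x1 = layers.LeakyReLU()(layers.Dense(256)(inp1))   # dense_1 / leaky_re_lu_1
x2 = layers.LeakyReLU()(layers.Dense(256)(inp2))   # dense_2 / leaky_re_lu_2

x = layers.Concatenate()([x1, x2])                 # concatenate_1
x = layers.LeakyReLU()(layers.Dense(32)(x))        # dense_3 / leaky_re_lu_3
x = layers.LeakyReLU()(layers.Dense(32)(x))        # dense_4 / leaky_re_lu_4
out = layers.Dense(1)(x)                           # dense_5

model = keras.Model([inp1, inp2], out)             # ~12.6M trainable parameters, as above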
I also have around 300 million input/output values that I am trying to feed to my network. Needless to say, this is far too much data to fit onto my GPU at once.

For speed, I generate sparse matrices that each represent around 100,000 inputs and save them in my memory (around 50 GB in total). I can load them without losing much speed, like this:

# loads both the inputs and the output for the given chunk (100000 inputs/outputs) from the memory
trainX1,trainX2,trainY = readNumpyChunkAndCreateInput(chunk)
and I use this to train my network like this:

for chunk in chunks:
        trainX1,trainX2,trainY = readNumpyChunkAndCreateInput(chunk)

        _res = model.fit([trainX1,trainX2], trainY, epochs=1,steps_per_epoch=1,verbose=0)
        loss = list(_res.history.values())[0]
        totalLoss += loss[0]
Obviously this is not optimal in any way. I know there is something called data generators in Keras/TensorFlow, but sadly I don't know how to use them for my specific case, since all the tutorials deal with dense inputs.
I would be happy if anyone could help me out!

Greetings, Finn

EDIT 1: This is how I load the data:

import os
import sys
import numpy as np
import tensorflow as tf

# The enclosing definition is inferred from the call readNumpyChunkAndCreateInput(chunk)
# in the training loop above; only the body below is from the original post.
def readNumpyChunkAndCreateInput(name):
    filePath = os.path.abspath(os.path.dirname(sys.argv[0]))
    path = filePath + "\\data\\" + name + "\\"

    indices1 = np.load(path + 'indices1.npy')
    indices2 = np.load(path + 'indices2.npy')
    outputs = np.load(path + 'outputs.npy')

    meta = open(path + 'meta.txt', "r")
    metaInf = meta.readlines()[0].split(" ")
    meta.close()

    entry1Count = int(metaInf[0])
    entry2Count = int(metaInf[1])
    lineCount = int(metaInf[2])

    values1 = tf.ones(entry1Count)
    values2 = tf.ones(entry2Count)

    shape = (lineCount, 6 * 64 * 64)

    trainX1 = tf.SparseTensor(
        indices=indices1,
        values=values1,
        dense_shape=shape
    )

    trainX2 = tf.SparseTensor(
        indices=indices2,
        values=values2,
        dense_shape=shape
    )

    return trainX1, trainX2, outputs
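(As a side note on the format: tf.SparseTensor expects indices as [row, column] pairs into the dense matrix, so indices1.npy / indices2.npy presumably hold one such pair per active input neuron. A toy example with made-up values:)

import tensorflow as tf

# Hypothetical chunk with lineCount = 3 samples and one active input per sample.
toy = tf.SparseTensor(
    indices=[[0, 5], [1, 100], [2, 24575]],   # [sample row, input column]
    values=tf.ones(3),
    dense_shape=(3, 6 * 64 * 64)
)
print(tf.sparse.to_dense(toy).shape)  # (3, 24576)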

I have written a small generator function which you can adjust to your use case:

import os
import numpy as np

def gen():
    paths = os.listdir('temp_data') # path of the directory holding the saved chunk files
    for path in paths:
        file_path = os.path.join('temp_data', path)
        # Placeholder loads: in practice x, y and z would come from different
        # files/arrays inside the chunk, not three loads of the same file.
        x = np.load(file_path)
        y = np.load(file_path)
        z = np.load(file_path)
        # Your logic
        #
        #
        #

        yield (x, y, z) # Three tensors/numpy arrays. In your case trainX1, trainX2, outputs.
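For the asker's setup specifically, the generator could simply walk the existing chunks and reuse the loader from the question. A hedged sketch: chunks and readNumpyChunkAndCreateInput are the names from the question, chunk_gen is hypothetical.

def chunk_gen():
    # Yields one pre-built 100k-sample chunk at a time instead of holding everything in memory.
    for chunk in chunks:
        trainX1, trainX2, trainY = readNumpyChunkAndCreateInput(chunk)
        yield trainX1, trainX2, trainY

Note that this yields tf.SparseTensor objects, which matters for how the dataset is declared below.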
Code to use the generator with tf.data.Dataset:

dataset = tf.data.Dataset.from_generator(gen, (tf.float32, tf.float32,tf.float32))
dataset = dataset.prefetch(2)
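The plain dtype tuple above describes dense arrays. If the generator yields tf.SparseTensor objects instead (as the chunk_gen sketch above would, since the question's loader builds SparseTensors), TF 2.4+ lets you describe that with output_signature. A hedged sketch, assuming two sparse inputs of width 6 * 64 * 64 and a dense float target:

dataset = tf.data.Dataset.from_generator(
    chunk_gen,
    output_signature=(
        tf.SparseTensorSpec(shape=(None, 6 * 64 * 64), dtype=tf.float32),
        tf.SparseTensorSpec(shape=(None, 6 * 64 * 64), dtype=tf.float32),
        tf.TensorSpec(shape=(None,), dtype=tf.float32),  # dtype of outputs.npy is an assumption
    ),
)
dataset = dataset.prefetch(2)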
Prefetching lets the next batch be prepared in advance so there is no stall between steps. You can pass this dataset to the fit command, or use it in a custom training loop like this:

epochs = 100
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x1_batch_train, x2_batch_train, y_batch_train) in enumerate(dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model([x1_batch_train,x2_batch_train], training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %d: %.4f"
                % (step, float(loss_value))
            )
            print("Seen so far: %s samples" % ((step + 1) * 64))

What format is the data stored in on disk?
It is a sparse numpy array (in fact there are two, one for each input).
Is it stored as a single 50 GB .npy file or as multiple files?
I split the 300M dataset into smaller chunks of 100k samples each; one chunk is roughly 20 MB on my drive. I have just done some more research: would "batch training" work in this case?