How to correctly set up a dataset for training a Keras model (Python, TensorFlow, Keras)


I am trying to create a dataset for audio recognition with a simple Keras Sequential model.

This is the function I use to create the model:

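import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def dnn_model(input_shape, output_shape):
    model = keras.Sequential()
    model.add(keras.Input(input_shape))
    model.add(layers.Flatten())
    model.add(layers.Dense(512, activation="relu"))
    model.add(layers.Dense(output_shape, activation="softmax"))
    # The last layer already applies softmax, so the loss must not treat
    # its output as logits (from_logits=True would apply softmax twice).
    model.compile(optimizer='adam',
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
                  metrics=['acc'])

    model.summary()

    return model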

I am using this generator function to produce the training data:

def generator(x_dirs, y_dirs, hmm, sampling_rate, parameters):
    window_size_samples = tools.sec_to_samples(parameters['window_size'], sampling_rate)    
    window_size_samples = 2**tools.next_pow2(window_size_samples) 
    hop_size_samples = tools.sec_to_samples(parameters['hop_size'],sampling_rate)

    for i in range(len(x_dirs)):
        features  = fe.compute_features_with_context(x_dirs[i],**parameters)
        praat = tools.praat_file_to_target( y_dirs[i],
                                            sampling_rate,
                                            window_size_samples,
                                            hop_size_samples,
                                            hmm)
        yield features,praat

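The variables x_dirs and y_dirs contain lists of paths to the audio files and the label files. I have 8623 files in total to train my model. This is how I train the model: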

def train_model(model, model_dir, x_dirs, y_dirs, hmm, sampling_rate, parameters, steps_per_epoch=10,epochs=10):

    model.fit((generator(x_dirs, y_dirs, hmm, sampling_rate, parameters)),
                            epochs=epochs,
                            batch_size=steps_per_epoch)
    return model

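My problem is that if I pass all 8623 files, fit trains the model on all 8623 files in the first epoch and then, after that first epoch, complains that it needs steps_per_epoch * epochs batches to train the model.

I also tested this with only 10 of the 8623 files, using a sliced list, but then TensorFlow complains that it needs 100 batches.

So how do I get my generator to output the data in the way that works best? I always thought steps_per_epoch only limits the amount of data received per epoch.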

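The fit function will exhaust your generator. That is, once all 8623 batches have been yielded, the generator cannot yield any more batches.

You want to solve the problem like this: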
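
The effect can be reproduced in plain Python, without Keras. The sketch below uses a hypothetical `file_batches` generator as a stand-in for the real one (one yielded item per file) and the values from the question, `steps_per_epoch=10` and `epochs=10`, to show how many batches are requested and how many a 10-file generator can deliver:

```python
def file_batches(n_files):
    """Stand-in for the real generator: one (features, target) pair per file."""
    for i in range(n_files):
        yield f"features_{i}", f"target_{i}"

steps_per_epoch = 10
epochs = 10
batches_needed = steps_per_epoch * epochs  # total batches fit will ask for

gen = file_batches(10)        # only 10 files, as in the test with the sliced list
drawn = sum(1 for _ in gen)   # consume the generator once
leftover = list(gen)          # a second pass yields nothing: it is exhausted

print(batches_needed, drawn, leftover)  # 100 10 []
```

A generator that stops after one pass over the file list can therefore supply only 10 of the 100 batches requested.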

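def generator(x_dirs, y_dirs, hmm, sampling_rate, parameters, epochs=1):
    for epoch in range(epochs):  # or: while True:
        window_size_samples = tools.sec_to_samples(parameters['window_size'], sampling_rate)
        window_size_samples = 2**tools.next_pow2(window_size_samples)
        hop_size_samples = tools.sec_to_samples(parameters['hop_size'], sampling_rate)
        for i in range(len(x_dirs)):
            features = fe.compute_features_with_context(x_dirs[i], **parameters)
            praat = tools.praat_file_to_target(y_dirs[i],
                                               sampling_rate,
                                               window_size_samples,
                                               hop_size_samples,
                                               hmm)
            yield features, praat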
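
The looping idea can be checked on its own. This is a minimal sketch, not the original code: `looping_batches` and `fake_load` are hypothetical names, and `itertools.islice` only imitates the way a fixed number of batches is drawn per epoch.

```python
from itertools import islice

def looping_batches(paths, load):
    """Yield load(path) for every path, restarting from the first path forever."""
    if not paths:
        return  # nothing to yield; avoids spinning forever on an empty list
    while True:
        for path in paths:
            yield load(path)

# Hypothetical loader; the real code would compute features and targets here.
def fake_load(path):
    return path.upper()

gen = looping_batches(["a.wav", "b.wav", "c.wav"], fake_load)

# Draw a fixed number of batches per "epoch" from the same generator.
steps_per_epoch = 2
epochs_seen = [list(islice(gen, steps_per_epoch)) for _ in range(3)]
print(epochs_seen)
# [['A.WAV', 'B.WAV'], ['C.WAV', 'A.WAV'], ['B.WAV', 'C.WAV']]
```

Because the generator never stops, each epoch simply continues where the previous one left off, wrapping around to the first file when the list runs out.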

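I just tested it with a list of 10 files and with `while True:`, since I cannot change the parameters that the method takes. Does fit then simply run past the 10 datasets and, I guess, go on forever?

That is where your steps_per_epoch comes in: the fit method stops each epoch after it has gone through steps_per_epoch steps.

Oh my god. I think I made a typo. I had batch_size=steps_per_epoch.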
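
For reference, a corrected training call might look like the sketch below. `train_model_fixed` is a hypothetical name, and the generator passed in is assumed to loop indefinitely. The only Keras behaviour relied on is that `model.fit` accepts a generator together with `steps_per_epoch` and `epochs`.

```python
def train_model_fixed(model, batch_generator, steps_per_epoch=10, epochs=10):
    """Train on a generator that already yields whole batches.

    batch_size is left out on purpose: the Keras documentation says not to
    specify it for generator input, since the generator produces the batches.
    """
    model.fit(batch_generator,
              steps_per_epoch=steps_per_epoch,  # batches drawn per epoch
              epochs=epochs)
    return model
```

With `steps_per_epoch=10` and `epochs=10`, this draws 100 batches in total, so a generator that ends after 10 files would still run out; it has to be combined with a looping generator or a larger file list.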