Keras fit vs. fit_generator: extra samples


keras


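I have two tensors, one of training data and one of validation data. At first I ran a NN using the fit() function. For my purposes I want to move to fit_generator(). I built a generator, and I noticed that the number of samples is not a multiple of the batch size.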

To work around this, I did the following:

import numpy as np

indices = np.arange(len(dataset))  # indices 0 .. len(dataset)-1
num_of_steps = int(np.ceil(len(dataset) / batch_size))  # number of steps per epoch
extra = num_of_steps * batch_size - len(dataset)  # samples needed to reach the next multiple of batch_size
additional = np.random.randint(len(dataset), size=extra)  # fill the gap with randomly chosen samples
indices = np.append(indices, additional)
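The padding steps above can be wrapped into a small function to check that the result really is a multiple of the batch size. This is a minimal sketch: `padded_indices` is a hypothetical name, and it uses NumPy's `Generator` API (`rng.integers`) in place of `np.random.randint`. Note that with this approach the `extra` padded samples are seen twice in that epoch.

```python
import numpy as np


def padded_indices(n_samples, batch_size, rng=None):
    """Return indices 0..n_samples-1 plus random repeats so that the
    total length is a multiple of batch_size (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    num_of_steps = int(np.ceil(n_samples / batch_size))  # steps per epoch
    extra = num_of_steps * batch_size - n_samples          # samples missing from the last batch
    additional = rng.integers(0, n_samples, size=extra)    # random existing samples fill the gap
    return np.append(np.arange(n_samples), additional), num_of_steps


idx, steps = padded_indices(10, 4, np.random.default_rng(0))
print(len(idx), steps)  # 12 3
```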

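After shuffling the indices at the start of each epoch, I simply iterate over them in batches, stepping through and gathering the matching data and labels.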
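What the paragraph above describes (shuffle once per epoch, then walk through the index array in batch-sized steps) can be sketched as follows, assuming the index array has already been padded. `iterate_epoch` is a hypothetical name; the important detail is that the slice start advances by a full `batch_size`, so the windows never overlap.

```python
import numpy as np


def iterate_epoch(indices, batch_size, rng):
    """Shuffle a (padded) index array and yield consecutive, non-overlapping batches."""
    shuffled = rng.permutation(indices)  # new random order for this epoch (copy)
    for start in range(0, len(shuffled), batch_size):
        # advance by a full batch_size each step
        yield shuffled[start:start + batch_size]


rng = np.random.default_rng(0)
batches = list(iterate_epoch(np.arange(12), 4, rng))
print([len(b) for b in batches])  # [4, 4, 4]
```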

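I observed that the model's performance degraded. When training with fit(), I get a training accuracy of 0.99 and a validation accuracy of 0.93; with fit_generator(), I get 0.95 and 0.9 respectively. Note that this is consistent across runs, not a single experiment.

I suspect it may be because fit() handles the extra samples differently. Is my implementation reasonable? How does fit() handle a dataset whose size is not a multiple of batch_size?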
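As far as I know, fit() on in-memory NumPy arrays does not pad anything: it runs ceil(n / batch_size) steps per epoch and the final batch simply holds the remainder. The sketch below only computes the batch sizes that this kind of split produces; `batch_sizes` is a hypothetical helper, not a Keras function.

```python
import math


def batch_sizes(n_samples, batch_size):
    """Sizes of the batches obtained by splitting n_samples into chunks of
    batch_size without padding; the last chunk holds the remainder."""
    steps = math.ceil(n_samples / batch_size)
    return [min(batch_size, n_samples - i * batch_size) for i in range(steps)]


print(batch_sizes(10, 4))  # [4, 4, 2]
```

Each sample then appears exactly once per epoch, whereas the padding approach shows the `extra` samples twice.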
Edit: here is the full generator code:
    def generator(self,batch_size,train):
        """
        Generates batches of samples
        :return: 
        """
        while 1:
            nb_of_steps=0
            if(train):    
                nb_of_steps = self._num_of_steps_train
                indices = np.arange(len(self._x_train))
                additional = np.random.randint(len(self._x_train), size=self._num_of_steps_train*batch_size-len(self._x_train))
            else:
                nb_of_steps = self._num_of_steps_test
                indices = np.arange(len(self._x_test))
                additional = np.random.randint(len(self._x_test), size=self._num_of_steps_test*batch_size-len(self._x_test))

            indices = np.append(indices,additional)
            np.random.shuffle(indices)
#            print(indices.shape)
#            print(nb_of_steps)

            for i in range(nb_of_steps):
                batch_indices=indices[i:i+batch_size]
                if(train):
                    feat = self._x_train[batch_indices]
                    label = self._y_train[batch_indices]
                else:
                    feat = self._x_test[batch_indices]
                    label = self._y_test[batch_indices]
                feat = np.expand_dims(feat,axis=1)
#                print(feat.shape)
#                print(label.shape)

                yield feat, label      

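It looks like you can simplify your generator significantly!
The number of steps and the index array can be set up outside the loop, since they don't actually change. Also, batch_indices does not seem to go through the entire dataset. Finally, if your data fits in memory, you may not need a generator at all, but that is your call.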
def generator(self, batch_size, train):
    # the step count and index array do not change between epochs, so set them up once
    if train:
        nb_of_steps = self._num_of_steps_train
        indices = np.arange(len(self._x_train))  # len of entire dataset
    else:
        nb_of_steps = self._num_of_steps_test
        indices = np.arange(len(self._x_test))
    while True:
        np.random.shuffle(indices)  # new order every epoch
        for i in range(nb_of_steps):
            # consecutive, non-overlapping windows; the last one may be shorter
            start_idx = i * batch_size
            end_idx = min(i * batch_size + batch_size, len(indices))
            batch_indices = indices[start_idx:end_idx]
            if train:
                feat = self._x_train[batch_indices]
                label = self._y_train[batch_indices]
            else:
                feat = self._x_test[batch_indices]
                label = self._y_test[batch_indices]
            feat = np.expand_dims(feat, axis=1)
            yield feat, label
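To check the generator above outside the class, here is a module-level variant with the same logic. `batch_generator` and its parameters are hypothetical, and it assumes the step count equals ceil(len(x) / batch_size), which is also the value to pass as `steps_per_epoch` to fit_generator (or to fit in TF 2.1+, where fit_generator is deprecated).

```python
import math
import numpy as np


def batch_generator(x, y, batch_size, shuffle=True, rng=None):
    """Endless generator of (features, labels) batches that covers every
    sample exactly once per epoch; the last batch may be smaller."""
    rng = np.random.default_rng() if rng is None else rng
    indices = np.arange(len(x))
    steps = math.ceil(len(x) / batch_size)  # same value as steps_per_epoch
    while True:
        if shuffle:
            rng.shuffle(indices)  # reshuffle in place at the start of every epoch
        for i in range(steps):
            # NumPy slicing clamps at the end, so no explicit min() is needed
            batch_indices = indices[i * batch_size:(i + 1) * batch_size]
            feat = np.expand_dims(x[batch_indices], axis=1)  # same extra axis as above
            yield feat, y[batch_indices]


x = np.random.rand(10, 3)
y = np.arange(10)
gen = batch_generator(x, y, batch_size=4, rng=np.random.default_rng(0))
seen = [next(gen) for _ in range(3)]  # one epoch = ceil(10 / 4) = 3 steps
print([f.shape for f, _ in seen])  # [(4, 1, 3), (4, 1, 3), (2, 1, 3)]
```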

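For a more robust generator, consider subclassing keras.utils.Sequence to create a class for your dataset. It adds a few extra lines of code, but it is guaranteed to work with Keras.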
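A minimal sketch of such a class, assuming in-memory arrays `x` and `y`; `ArraySequence` is a hypothetical name. It implements `__len__` (batches per epoch), `__getitem__` (one batch) and `on_epoch_end` (reshuffle), which is the interface a Keras `Sequence` expects. The `ImportError` fallback to `object` is only there so the sketch runs without Keras installed; with Keras available it is a real `Sequence` and the instance can be passed to fit (or fit_generator in older versions).

```python
import math
import numpy as np

try:
    from keras.utils import Sequence
except ImportError:
    # Fallback so the sketch still runs without Keras; not needed in real use.
    Sequence = object


class ArraySequence(Sequence):
    """Hypothetical Sequence that serves shuffled batches from arrays x, y."""

    def __init__(self, x, y, batch_size, shuffle=True):
        super().__init__()
        self.x, self.y = x, y
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.indices = np.arange(len(x))
        self.on_epoch_end()  # initial shuffle

    def __len__(self):
        # number of batches per epoch; the last one may be smaller
        return math.ceil(len(self.x) / self.batch_size)

    def __getitem__(self, idx):
        batch = self.indices[idx * self.batch_size:(idx + 1) * self.batch_size]
        return np.expand_dims(self.x[batch], axis=1), self.y[batch]

    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.indices)


seq = ArraySequence(np.random.rand(10, 3), np.arange(10), batch_size=4)
print(len(seq), seq[2][0].shape)  # 3 (2, 1, 3)
```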
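When you say degraded, do you mean speed? That would make sense, since reading from and writing to disk is the slowest operation you can do. Could you also share your generator code? That would make it easier to understand what the problem is.

@Nick, please see my edit. I'll share the generator code in a few seconds.

This implementation really helps in terms of computation time, thanks. However, I was asking about the network's performance as a classifier, not about computation time. Can you help with that too?

I think the classifier performance issue is because the generator was not going through the entire dataset (so your fit_generator was trained on a subset of the data). Unfortunately, it is hard to test the theory without the actual dataset. The different start/end indices should fix this, but if not, let me know:

start_idx = i*batch_size
end_idx = min(i*batch_size+batch_size, len(indices))
batch_indices = indices[start_idx:end_idx]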

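This did improve performance. So if I'm reading your implementation correctly, you simply give a shorter last batch?

Yes, a shorter last batch, but the batch indices now also advance properly; previously the loop did not go through the entire dataset. Specifically, when looping over the steps, with i=0 you get batch_indices = indices[0:batch_size], while with i=1 you get batch_indices = indices[1:1+batch_size]. By changing the start and end indices, batch_indices moves forward by a full batch on each iteration, so the elements don't overlap.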
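The overlap described above is easy to see on a toy index array. This sketch compares the original slicing with the corrected one; with the original slicing, an epoch touches only nb_of_steps + batch_size - 1 distinct positions of the shuffled array instead of all of them.

```python
import numpy as np

indices = np.arange(12)  # stand-in for a shuffled, padded index array
batch_size, nb_of_steps = 4, 3

# original slicing: the window start moves by 1 each step
old = [indices[i:i + batch_size] for i in range(nb_of_steps)]
# corrected slicing: the window start moves by batch_size each step
new = [indices[i * batch_size:(i + 1) * batch_size] for i in range(nb_of_steps)]

print([b.tolist() for b in old])  # [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5]]
print([b.tolist() for b in new])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(len(np.unique(np.concatenate(old))), len(np.unique(np.concatenate(new))))  # 6 12
```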