Keras multi-label image classification: am I passing the data correctly? Is my preprocessing correct?
I am stuck on a Keras multi-label problem. I was advised to use a custom data generator to create mini-batches and avoid memory problems. The CSV file I use contains the IDs, the filenames and their corresponding labels (21 in total), and looks like this:
Filename label1 label2 label3 label4 ... ID
abc1.jpg 1 0 0 1 ... id-1
def2.jpg 1 0 0 1 ... id-2
ghi3.jpg 1 0 0 1 ... id-3
...
I put the IDs and labels into dictionaries, which have the following contents:
partition: {'train': ['id-1','id-2','id-3',...], 'validation': ['id-7','id-14','id-21',...]}
labels: {'id-0': [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
'id-1': [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
'id-2': [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
...}
I also have a folder in which each image is saved as an .npy file. These files are loaded by the custom data generator below:
import numpy as np
import keras
from keras.layers import *
from keras.models import Sequential

class DataGenerator(keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, list_IDs, labels, batch_size=32, dim=(224,224), n_channels=3,
                 n_classes=21, shuffle=True):
        'Initialization'
        self.dim = dim
        self.batch_size = batch_size
        self.labels = labels
        self.list_IDs = list_IDs
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.floor(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        # Find list of IDs
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        # Generate data
        X, y = self.__data_generation(list_IDs_temp)
        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

    def __data_generation(self, list_IDs_temp):
        'Generates data containing batch_size samples' # X : (n_samples, *dim, n_channels)
        # Initialization
        X = np.empty((self.batch_size, *self.dim, self.n_channels))
        y = np.empty((self.batch_size), dtype=int)
        # Generate data
        for i, ID in enumerate(list_IDs_temp):
            # Store sample
            X[i,] = np.load('Folder with npy files/' + ID + '.npy')
            # Store class
            y[i] = self.labels[ID]
        return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
Up to this point my notebook gives me no errors, but when I run the following:
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# Train model on dataset
model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    epochs=5,
                    use_multiprocessing=True,
                    workers=2)
I get the following error message:
Exception in thread Thread-7:
Traceback (most recent call last):
  File "c:\users\sebas\appdata\local\programs\python\python36\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "c:\users\sebas\appdata\local\programs\python\python36\lib\multiprocessing\reduce.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
BrokenPipeError: [Errno 32] Broken pipe
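It feels like I am passing or using the data incorrectly somehow!?
If anyone has an idea or a hint on how to pass the data better or how to solve this problem, I would be very grateful. Even a different approach would be great. Thanks in advance for your help.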
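use_multiprocessing=True is not supported on Windows. Remove that argument and the workers argument.
Thanks, that at least got rid of the broken pipe error. Now I get a different error: ValueError: setting an array element with a sequence. I guess I am not passing my data correctly!?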
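A likely cause of that ValueError, judging only from the generator code in the question: y is allocated as a 1-D integer array with one scalar slot per sample, but self.labels[ID] is a list of 21 values, so y[i] = self.labels[ID] tries to store a sequence in a single element. The NumPy-only sketch below reproduces this and shows one possible fix, a 2-D label array of shape (batch, n_classes). The helper name make_label_batch and the toy labels dictionary are made up for illustration and are not part of the original code.

```python
import numpy as np

N_CLASSES = 21  # number of label columns, as stated in the question


def make_label_batch(batch_ids, labels, n_classes=N_CLASSES):
    """Stack one multi-hot label vector per ID into a (batch, n_classes) array.

    Hypothetical helper: it stands in for the label handling inside
    __data_generation and is not taken from the original code.
    """
    y = np.empty((len(batch_ids), n_classes), dtype=np.float32)
    for i, sample_id in enumerate(batch_ids):
        y[i] = labels[sample_id]  # assigns a whole row of 21 values
    return y


# Toy labels dictionary with the same structure as the one in the question.
labels = {
    'id-1': [1, 0, 0, 1] + [0] * 17,
    'id-2': [0, 1, 0, 0] + [0] * 16 + [1],
}

# Reproduce the failure: a 1-D int array cannot hold a list in one element.
y_1d = np.empty((2,), dtype=int)
try:
    y_1d[0] = labels['id-1']
    failed = False
except ValueError as err:
    failed = True
    print('ValueError:', err)

# The 2-D version accepts the multi-hot vectors as they are.
y = make_label_batch(['id-1', 'id-2'], labels)
print(y.shape)
```

With labels that are already multi-hot vectors, keras.utils.to_categorical should not be needed, since it converts integer class indices into one-hot rows; the generator could return X, y directly. For multi-label targets a common setup, assuming the model's last layer has 21 units, is a sigmoid activation on that layer together with loss='binary_crossentropy' instead of 'categorical_crossentropy'. Whether that matches the intended model is an assumption, because the model definition is not shown in the question.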