K-fold cross-validation in Keras with binary classes and a single output (Python)
I am using a convolutional neural network to classify cats and dogs, with a single output for the two classes. I need to use k-fold cross-validation to find out which group, i.e. which set of pet breeds, gives the best validation accuracy. The closest answer to my problem is this question: , but it clearly does not use my original network model and does not apply to groups of different pet breeds.
In Group 1, Group 2, and Group 3 I have two folders named "Pets", and inside each "Pets" folder I have two folders for my classes, cats and dogs.
For example:
Group 1/
    Pets 1/
        cats/
            breeds_1_cats001.jpeg
            breeds_1_cats002.jpeg
        dogs/
            breeds_1_dogs001.jpeg
            breeds_1_dogs002.jpeg
    Pets 2/
        cats/
            breeds_2_cats001.jpeg
            breeds_2_cats002.jpeg
        dogs/
            breeds_2_dogs001.jpeg
            breeds_2_dogs002.jpeg
Group 2/
    Pets 1/
        cats/
            breeds_3_cats001.jpeg
            breeds_3_cats002.jpeg
        dogs/
            breeds_3_dogs001.jpeg
            breeds_3_dogs002.jpeg
    Pets 2/
        cats/
            breeds_4_cats001.jpeg
            breeds_4_cats002.jpeg
        dogs/
            breeds_4_dogs001.jpeg
            breeds_4_dogs002.jpeg
Group 3/
    Pets 1/
        cats/
            breeds_5_cats001.jpeg
            breeds_5_cats002.jpeg
        dogs/
            breeds_5_dogs001.jpeg
            breeds_5_dogs002.jpeg
    Pets 2/
        cats/
            breeds_6_cats001.jpeg
            breeds_6_cats002.jpeg
        dogs/
            breeds_6_dogs001.jpeg
            breeds_6_dogs002.jpeg
What I want to do is use k-fold with the groups as the fold indices.
For example: use Group 1 and Group 2 for training and Group 3 for validation.
Then Group 1 and Group 3 for training with Group 2 for validation, and finally Group 2 and Group 3 for training with Group 1 for validation.
I made a dummy example to illustrate my goal.
My problem is that I don't know how to use k-fold for several given groups inside a nested folder of binary classes, where I train and test the binary output with a data generator.
I need to apply k-fold to my convolutional neural network without modifying my data augmentation or breaking my layers, in order to find the best validation accuracy and save those weights. My network is as follows:
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
import numpy as np
from keras.preprocessing import image

img_width, img_height = 128, 160

train_data_dir = '../input/pets/pets train'
validation_data_dir = '../input/pets/pets testing'
nb_train_samples = 4850
nb_validation_samples = 3000
epochs = 100
batch_size = 16

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

train_datagen = ImageDataGenerator(
    zoom_range=0.2,
    rotation_range=40,
    horizontal_flip=True,
)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.25))
model.add(Dense(64))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.summary()

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['mse', 'accuracy'])

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)

model.save_weights('pets-weights.npy')
You won't be able to use ImageDataGenerator here, because according to the documentation an array of shape (n_samples, n_features) is expected.
What you can do instead is load your images into memory and create a custom CV splitter. I have a folder structured like this:
group1/
    cats/
        breeds_5_cats001.jpeg
        breeds_5_cats002.jpeg
    dogs/
        breeds_4_dogs001.jpeg
        breeds_4_dogs002.jpeg
group2/
    cats/
        breeds_5_cats001.jpeg
        breeds_5_cats002.jpeg
    dogs/
        breeds_4_dogs001.jpeg
        breeds_4_dogs002.jpeg
group3/
    cats/
        breeds_5_cats001.jpeg
        breeds_5_cats002.jpeg
    dogs/
        breeds_4_dogs001.jpeg
        breeds_4_dogs002.jpeg
I first glob the filenames and group them. You will need to change the glob pattern slightly, because my directory structure is a bit different. All it needs to do is collect all the pictures, regardless of order.
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
import os
from glob2 import glob
from itertools import groupby
from itertools import accumulate
import cv2
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
tf.config.experimental.list_physical_devices('GPU')
os.chdir('c:/users/nicol/documents/datasets/catsanddogs')
filenames = glob('*/*/*.jpg')
groups = [list(v) for k, v in groupby(sorted(filenames), key=lambda x: x.split(os.sep)[0])]
lengths = [0] + list(accumulate(map(len, groups)))
groups = [i for s in groups for i in s]
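To see what the `groupby`/`accumulate` lines above compute, here is a small sketch with made-up filenames (the paths are purely illustrative, not real dataset files):

```python
import os
from itertools import groupby, accumulate

# Hypothetical filenames in "group/class/file" form.
filenames = [
    os.path.join('group1', 'cats', 'a.jpg'),
    os.path.join('group1', 'dogs', 'b.jpg'),
    os.path.join('group2', 'cats', 'c.jpg'),
    os.path.join('group3', 'dogs', 'd.jpg'),
]

# Group filenames by their top-level directory (the fold group).
grouped = [list(v) for k, v in groupby(sorted(filenames), key=lambda x: x.split(os.sep)[0])]

# Cumulative group sizes give the index boundaries of each fold.
lengths = [0] + list(accumulate(map(len, grouped)))

# Flatten back into one ordered list, aligned with `lengths`.
flat = [name for group in grouped for name in group]

print(lengths)  # [0, 2, 3, 4]
```

So `lengths[i-1]:lengths[i]` indexes exactly the files of group i in the flattened list.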
Then I load all the pictures into one array and create an array of 0s and 1s for the classes. You will need to customize this for your directory structure.

images = list()
for image in groups:  # iterate `groups`, not `filenames`, so X stays aligned with y and `lengths`
    array = cv2.imread(image)/255
    resized = cv2.resize(array, (32, 32))
    images.append(resized)

X = np.array(images).astype(np.float32)
y = np.array(list(map(lambda x: x.split(os.sep)[1] == 'cats', groups))).astype(int)
Then I built a KerasClassifier:
def build_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.25))
    model.add(Dense(64))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.summary()
    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['mse', 'accuracy'])
    return model

keras_clf = KerasClassifier(build_fn=build_model, epochs=1, batch_size=16, verbose=0)
Then I made a custom CV splitter, as described below.
Output:
[0.648 0.666 0.73 ]
Full code:
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
import os
from glob2 import glob
from itertools import groupby
from itertools import accumulate
import cv2
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
tf.config.experimental.list_physical_devices('GPU')

os.chdir('c:/users/nicol/documents/datasets/catsanddogs')

filenames = glob('*/*/*.jpg')

groups = [list(v) for k, v in groupby(sorted(filenames), key=lambda x: x.split(os.sep)[0])]
lengths = [0] + list(accumulate(map(len, groups)))
groups = [i for s in groups for i in s]

images = list()
for image in groups:  # iterate `groups`, not `filenames`, so X stays aligned with y and `lengths`
    array = cv2.imread(image)/255
    resized = cv2.resize(array, (32, 32))
    images.append(resized)

X = np.array(images).astype(np.float32)
y = np.array(list(map(lambda x: x.split(os.sep)[1] == 'cats', groups))).astype(int)

def build_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.25))
    model.add(Dense(64))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.summary()
    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['mse', 'accuracy'])
    return model

keras_clf = KerasClassifier(build_fn=build_model, epochs=1, batch_size=16, verbose=0)

def three_fold_cv():
    i = 1
    while i <= 3:
        min_length = lengths[i - 1]
        max_length = lengths[i]
        idx = np.arange(min_length, max_length, dtype=int)
        yield idx, idx
        i += 1

tfc = three_fold_cv()

accuracies = cross_val_score(estimator=keras_clf, scoring="accuracy", X=X, y=y, cv=tfc)
print(accuracies)
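Note that `three_fold_cv` yields `idx, idx`, so each fold trains and scores on the same group. If you want the scheme from the question (train on two groups, validate on the held-out third), the generator can be adapted; a minimal sketch, using dummy group boundaries in place of the real `lengths` list:

```python
import numpy as np

# Dummy cumulative group sizes; in the real script this is the `lengths` list.
lengths = [0, 4, 8, 12]
n_samples = lengths[-1]

def held_out_group_cv():
    # For each fold, validate on one group and train on the remaining two.
    for i in range(1, len(lengths)):
        test_idx = np.arange(lengths[i - 1], lengths[i], dtype=int)
        train_idx = np.setdiff1d(np.arange(n_samples), test_idx)
        yield train_idx, test_idx

for train_idx, test_idx in held_out_group_cv():
    print(train_idx.tolist(), test_idx.tolist())
```

This generator can be passed as `cv=held_out_group_cv()` to `cross_val_score` in place of `tfc`.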
Here is a copy/paste-able, reproducible example using the MNIST dataset:
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
from itertools import accumulate
import tensorflow as tf

# Here's your dataset.
(xtrain, ytrain), (_, _) = tf.keras.datasets.mnist.load_data()

# You have three groups, as you wanted. They are 20,000 each.
x_group1, y_group1 = xtrain[:20_000], ytrain[:20_000]
x_group2, y_group2 = xtrain[20_000:40_000], ytrain[20_000:40_000]
x_group3, y_group3 = xtrain[40_000:60_000], ytrain[40_000:60_000]

# You need the accumulated lengths of the datasets: [0, 20000, 40000, 60000]
lengths = [0] + list(accumulate(map(len, [y_group1, y_group2, y_group3])))

# Now you need all three in a single dataset.
X = np.concatenate([x_group1, x_group2, x_group3], axis=0)[..., np.newaxis]
y = np.concatenate([y_group1, y_group2, y_group3], axis=0)

# KerasClassifier needs a model building function.
def build_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.25))
    model.add(Dense(64))
    model.add(Dropout(0.5))
    model.add(Dense(10))              # MNIST has 10 classes, so 10 outputs
    model.add(Activation('softmax'))
    model.summary()
    model.compile(loss='sparse_categorical_crossentropy',  # integer labels 0-9
                  optimizer='rmsprop',
                  metrics=['accuracy'])
    return model

# Creating the KerasClassifier.
keras_clf = KerasClassifier(build_fn=build_model, epochs=1, batch_size=16, verbose=0)

# Creating the custom cross-validation splitter. Splits are based on `lengths`.
def three_fold_cv():
    i = 1
    while i <= 3:
        min_length = lengths[i - 1]
        max_length = lengths[i]
        idx = np.arange(min_length, max_length, dtype=int)
        yield idx, idx
        i += 1

accuracies = cross_val_score(estimator=keras_clf, scoring="accuracy", X=X, y=y, cv=three_fold_cv())
print(accuracies)
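If loading everything into memory is acceptable, scikit-learn's built-in `LeaveOneGroupOut` can replace the hand-written generator entirely; a sketch with a plain scikit-learn estimator and synthetic data (all values here are illustrative, not from the dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((60, 4))            # synthetic features
y = np.tile([0, 1], 30)            # synthetic binary labels
groups = np.repeat([1, 2, 3], 20)  # one group label per sample

# One fold per group: train on two groups, score on the held-out one.
scores = cross_val_score(LogisticRegression(), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print(len(scores))  # 3
```

The same `groups` array could be built from the top-level directory of each filename, so the splits line up with Group 1/2/3.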
我还可以举一个玩具数据集的例子,如果文件夹太混乱,如果不会太麻烦,我会很感激。那么,在将狗和猫加载到内存之前,我是否应该将它们放在同一个组中,并分配标签?比如:“dataset/group_1/cat1.jpg,dog1.jpg”然后为每个组分配y_1=[0,1]?我不清楚在为cv函数分配索引时,应该如何为每个组加载类分隔是的,它们不需要在同一组中。您只需要3组文件名,全部混合。