K-fold cross-validation in Keras with binary classes and a single output

Tags: python, python-3.x, tensorflow, machine-learning, keras

I am using a convolutional neural network to classify cats and dogs, with a single output for the two classes. I need to use k-fold cross-validation to find out which set (group) of pet breeds gives the best validation accuracy. The closest answer I found to my question is this one:, but it clearly does not use the original network model, and it does not work for groups of different pet breeds.

Inside Group 1, Group 2 and Group 3 I have two folders named "Pets", and inside each "Pets" folder I have two folders for my classes, cats and dogs. For example:

Group 1/
    Pets 1/
        cats/
            breeds_1_cats001.jpeg
            breeds_1_cats002.jpeg
        dogs/
            breeds_1_dogs001.jpeg
            breeds_1_dogs002.jpeg
    Pets 2/
        cats/
            breeds_2_cats001.jpeg
            breeds_2_cats002.jpeg
        dogs/
            breeds_2_dogs001.jpeg
            breeds_2_dogs002.jpeg
Group 2/
    Pets 1/
        cats/
            breeds_3_cats001.jpeg
            breeds_3_cats002.jpeg
        dogs/
            breeds_3_dogs001.jpeg
            breeds_3_dogs002.jpeg
    Pets 2/
        cats/
            breeds_4_cats001.jpeg
            breeds_4_cats002.jpeg
        dogs/
            breeds_4_dogs001.jpeg
            breeds_4_dogs002.jpeg
Group 3/
    Pets 1/
        cats/
            breeds_5_cats001.jpeg
            breeds_5_cats002.jpeg
        dogs/
            breeds_5_dogs001.jpeg
            breeds_5_dogs002.jpeg
    Pets 2/
        cats/
            breeds_6_cats001.jpeg
            breeds_6_cats002.jpeg
        dogs/
            breeds_6_dogs001.jpeg
            breeds_6_dogs002.jpeg
                  
What I want to do is use k-fold with the groups as the fold indices.

For example: use Group 1 and Group 2 for training and Group 3 for validation; then Group 1 and Group 3 for training and Group 2 for validation; and finally Group 2 and Group 3 for training and Group 1 for validation.
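This scheme is leave-one-group-out cross-validation. A minimal sketch with hypothetical group labels (the `groups` array below is illustrative, not from the question):

```python
import numpy as np

# Hypothetical sample-to-group assignment: six samples in three groups.
groups = np.array([1, 1, 2, 2, 3, 3])

folds = []
for g in np.unique(groups):
    val_idx = np.where(groups == g)[0]     # validate on this group
    train_idx = np.where(groups != g)[0]   # train on the other two groups
    folds.append((list(train_idx), list(val_idx)))

# e.g. the first fold trains on groups 2 and 3 and validates on group 1
```

scikit-learn's `LeaveOneGroupOut` implements the same idea if you can provide such a group array per sample.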

I made a dummy example to illustrate my goal.

My problem is that I don't know how to use k-fold for the given groups in nested folders containing the binary classes, since I use data generators to train and test the binary output. I need to apply k-fold to my convolutional neural network without changing my data augmentation or breaking my layers, in order to find the best validation accuracy and save those weights. My network is as follows:

from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
import numpy as np
from keras.preprocessing import image

img_width, img_height = 128, 160

train_data_dir = '../input/pets/pets train'
validation_data_dir = '../input/pets/pets testing'
nb_train_samples = 4850
nb_validation_samples = 3000
epochs = 100
batch_size = 16

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

train_datagen = ImageDataGenerator(
    rescale=1./255,  # added so training images are scaled like the validation images
    zoom_range=0.2,
    rotation_range=40,
    horizontal_flip=True,
)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='binary')

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dropout(0.25))

model.add(Dense(64))
model.add(Dropout(0.5))

model.add(Dense(1))
model.add(Activation('sigmoid'))
model.summary()

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['mse', 'accuracy'])

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)

model.save_weights('pets-weights.npy')

You won't be able to use `ImageDataGenerator` here, because according to the documentation, `cross_val_score` needs an array of shape `(n_samples, features)`.

What can you do instead? Load your images into memory and create a custom CV splitter. I have a folder structure like this:

group1/
    cats/
        breeds_5_cats001.jpeg
        breeds_5_cats002.jpeg
    dogs/
        breeds_4_dogs001.jpeg
        breeds_4_dogs002.jpeg
group2/
    cats/
        breeds_5_cats001.jpeg
        breeds_5_cats002.jpeg
    dogs/
        breeds_4_dogs001.jpeg
        breeds_4_dogs002.jpeg
group3/
    cats/
        breeds_5_cats001.jpeg
        breeds_5_cats002.jpeg
    dogs/
        breeds_4_dogs001.jpeg
        breeds_4_dogs002.jpeg
I first glob the filenames and group them by top-level folder. You will need to change the glob pattern slightly because my directory structure is a bit different; all it needs to do is collect all the pictures, regardless of order.

from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
import os
from glob2 import glob
from itertools import groupby
from itertools import accumulate
import cv2
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
tf.config.experimental.list_physical_devices('GPU')
os.chdir('c:/users/nicol/documents/datasets/catsanddogs')

filenames = glob('*/*/*.jpg')

groups = [list(v) for k, v in groupby(sorted(filenames), key=lambda x: x.split(os.sep)[0])]
lengths = [0] + list(accumulate(map(len, groups)))
groups = [i for s in groups for i in s]
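On a hypothetical handful of filenames, this grouping step behaves like this (the sketch uses '/' as the separator instead of `os.sep`):

```python
from itertools import groupby, accumulate

# Hypothetical POSIX-style paths; the real code splits on os.sep.
filenames = ['group2/cats/a.jpg', 'group1/dogs/b.jpg',
             'group1/cats/c.jpg', 'group2/dogs/d.jpg']

grouped = [list(v) for k, v in groupby(sorted(filenames),
                                       key=lambda x: x.split('/')[0])]
lengths = [0] + list(accumulate(map(len, grouped)))  # group boundaries
flat = [i for s in grouped for i in s]               # sorted, group-contiguous

print(lengths)  # [0, 2, 4]: samples 0-1 are group1, samples 2-3 are group2
```

`lengths` records where each group starts and ends in the flattened list, which is exactly what the CV splitter later needs.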
Then I load all the pictures into an array and create an array of 0s and 1s for the classes. You will need to customize this for your directory structure.

images = list()

# Iterate in the sorted `groups` order so X lines up with `y` and the fold indices.
for image in groups:
    array = cv2.imread(image)/255
    resized = cv2.resize(array, (32, 32))
    images.append(resized)

X = np.array(images).astype(np.float32)

y = np.array(list(map(lambda x: x.split(os.sep)[1] == 'cats', groups))).astype(int)
Then I built a `KerasClassifier`:

def build_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.25))
    model.add(Dense(64))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.summary()

    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['mse', 'accuracy'])
    return model


keras_clf = KerasClassifier(build_fn=build_model, epochs=1, batch_size=16, verbose=0)
Then I made a custom CV splitter, shown in the full code below:
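Such a splitter is just a generator of `(train_indices, test_indices)` pairs derived from the accumulated `lengths`. A standalone sketch of the intended leave-one-group-out behaviour, assuming three groups of two samples each:

```python
import numpy as np

# Assume three groups whose accumulated lengths are [0, 2, 4, 6].
lengths = [0, 2, 4, 6]

def three_fold_cv():
    for i in range(1, len(lengths)):
        # Validate on group i, train on everything else.
        test_idx = np.arange(lengths[i - 1], lengths[i])
        train_idx = np.setdiff1d(np.arange(lengths[-1]), test_idx)
        yield train_idx, test_idx

splits = [(list(tr), list(te)) for tr, te in three_fold_cv()]
# First fold: train on groups 2 and 3, validate on group 1.
```

`cross_val_score` accepts any iterable of such index pairs as its `cv` argument.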

Output:

[0.648 0.666 0.73 ]
Full code:

from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
import os
from glob2 import glob
from itertools import groupby
from itertools import accumulate
import cv2
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
tf.config.experimental.list_physical_devices('GPU')
os.chdir('c:/users/nicol/documents/datasets/catsanddogs')

filenames = glob('*/*/*.jpg')

groups = [list(v) for k, v in groupby(sorted(filenames), key=lambda x: x.split(os.sep)[0])]
lengths = [0] + list(accumulate(map(len, groups)))
groups = [i for s in groups for i in s]


images = list()

# Iterate in the sorted `groups` order so X lines up with `y` and the fold indices.
for image in groups:
    array = cv2.imread(image)/255
    resized = cv2.resize(array, (32, 32))
    images.append(resized)

X = np.array(images).astype(np.float32)

y = np.array(list(map(lambda x: x.split(os.sep)[1] == 'cats', groups))).astype(int)


def build_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(32, 32, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.25))
    model.add(Dense(64))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.summary()

    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['mse', 'accuracy'])
    return model


keras_clf = KerasClassifier(build_fn=build_model, epochs=1, batch_size=16, verbose=0)

def three_fold_cv():
    # Leave one group out: train on the other two groups, validate on this one.
    for i in range(1, 4):
        test_idx = np.arange(lengths[i - 1], lengths[i], dtype=int)
        train_idx = np.setdiff1d(np.arange(lengths[-1]), test_idx)
        yield train_idx, test_idx

tfc = three_fold_cv()
accuracies = cross_val_score(estimator=keras_clf, scoring="accuracy", X=X, y=y, cv=tfc)

print(accuracies)
Here is a copy/paste-able, reproducible example with the MNIST dataset:

from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
from tensorflow.keras.layers import *
from tensorflow.keras import Sequential
from itertools import accumulate
import tensorflow as tf

# Here's your dataset.
(xtrain, ytrain), (_, _) = tf.keras.datasets.mnist.load_data()

# You have three groups, as you wanted. They are 20,000 each.
x_group1, y_group1 = xtrain[:20_000], ytrain[:20_000]
x_group2, y_group2 = xtrain[20_000:40_000:], ytrain[20_000:40_000:]
x_group3, y_group3 = xtrain[40_000:60_000], ytrain[40_000:60_000]

# You need the accumulated lengths of the datasets: [0, 20000, 40000, 60000]
lengths = [0] + list(accumulate(map(len, [y_group1, y_group2, y_group3])))

# Now you need all three in a single dataset.
X = np.concatenate([x_group1, x_group2, x_group3], axis=0)[..., np.newaxis]
y = np.concatenate([y_group1, y_group2, y_group3], axis=0)
# The model below has a single sigmoid output, so binarize the ten digit
# classes (digits 0-4 -> 0, digits 5-9 -> 1) to match the binary setup.
y = (y > 4).astype(int)


# KerasClassifier needs a model building function.
def build_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.25))
    model.add(Dense(64))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.summary()

    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop',
                  metrics=['mse', 'accuracy'])
    return model


# Creating the KerasClassifier.
keras_clf = KerasClassifier(build_fn=build_model, epochs=1, batch_size=16, verbose=0)


# Creating the custom Cross-validation splitter. Splits are based on `lengths`.
def three_fold_cv():
    # Leave one group out: train on the other two groups, validate on this one.
    for i in range(1, 4):
        test_idx = np.arange(lengths[i - 1], lengths[i], dtype=int)
        train_idx = np.setdiff1d(np.arange(lengths[-1]), test_idx)
        yield train_idx, test_idx

accuracies = cross_val_score(estimator=keras_clf, scoring="accuracy", X=X, y=y, cv=three_fold_cv())

print(accuracies)
Comments: "I can also make an example with a toy dataset if the folders are too confusing." "If it is not too much trouble, I would appreciate that. So, before loading the dogs and cats into memory, should I put them in the same group and assign the labels? Something like 'dataset/group_1/cat1.jpg, dog1.jpg' and then assign y_1 = [0, 1] for each group? It is not clear to me how to load the class separation for each group when assigning the indices to the cv function." "Yes, they do not need to be in separate groups. You just need 3 groups of filenames, all mixed."
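As the last comment says, the classes stay mixed inside each group; the label is derived purely from the class subfolder name, never from the group. A tiny sketch with hypothetical paths:

```python
# Hypothetical mixed filenames across two groups; the class label comes
# only from the second path component ('cats' vs 'dogs').
files = ['group1/cats/a.jpg', 'group1/dogs/b.jpg', 'group2/cats/c.jpg']
labels = [int(f.split('/')[1] == 'cats') for f in files]
# cats -> 1, dogs -> 0, regardless of which group the file belongs to
```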