Python: What is the minimum input size for VGG-16 and ResNet? Can I change it?


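I am working on a small project where I want to feed an array of shape 999,13,1 into both networks. However, using it as the input raises an exception saying that one of the layers expects an input of at least 32x32x3. I would like to know whether the Keras implementations of VGG-16 and ResNet can be modified to accept smaller, differently shaped inputs (assuming that is even worth doing rather than starting from scratch), or whether there is a minimum acceptable input size I have to respect.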

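To explain in more detail: the input consists of Mel-frequency cepstral coefficient (MFCC) features extracted from several audio files. 999 is the number of time frames in the 10 seconds of audio I extracted, 13 is the number of cepstral coefficients I extracted, and 1 holds the value of each coefficient.
Now, as far as I understand, VGG16 expects RGB images, so I could replicate the last axis three times to get an array of shape 999,13,3. The problem is that using 32 cepstral components triggers out-of-memory (OOM) errors, because the input becomes too large for the VGG layers to compute. Reducing the time dimension from 999 to a smaller number would hurt my model's predictions.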
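
The two array layouts discussed here can be sketched with NumPy. This is only an illustration on random data; the variable names (mfcc, as_rgb, as_sequence) are made up for the example and do not come from the original project.

```python
import numpy as np

# Stand-in for one MFCC sample: 999 time frames x 13 coefficients x 1 channel.
# Random values are used purely for illustration.
mfcc = np.random.rand(999, 13, 1).astype("float32")

# Option A: copy the single channel three times to mimic an RGB image.
as_rgb = np.repeat(mfcc, 3, axis=-1)

# Option B: drop the trailing channel axis so each sample becomes a sequence
# of 999 steps with 13 features per step.
as_sequence = np.squeeze(mfcc, axis=-1)

print(as_rgb.shape)       # (999, 13, 3)
print(as_sequence.shape)  # (999, 13)
```

Note that option A still leaves a 13-wide axis, which is below the 32 minimum reported in the exception, so replicating channels alone would not make the array acceptable to the stock networks.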

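Here is a VGG16 implementation for spectrograms. The input should have shape 999,13, where 999 is the time dimension and 13 is the number of filters.

You can change some of the intermediate parameters as needed.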

from tensorflow.keras import layers, models


def VGG16_1d(classes=3):
    # Input: 999 time steps, each with 13 MFCC features (treated as channels)
    img_input = layers.Input((999, 13))
    # Block 1
    x = layers.Conv1D(64, 3,
                      activation='relu',
                      padding='same',
                      name='block1_conv1')(img_input)
    x = layers.Conv1D(64, 3,
                      activation='relu',
                      padding='same',
                      name='block1_conv2')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block1_pool', padding='same')(x)

    # Block 2
    x = layers.Conv1D(128, 3,
                      activation='relu',
                      padding='same',
                      name='block2_conv1')(x)
    x = layers.Conv1D(128, 3,
                      activation='relu',
                      padding='same',
                      name='block2_conv2')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block2_pool', padding='same')(x)

    # Block 3
    x = layers.Conv1D(256, 3,
                      activation='relu',
                      padding='same',
                      name='block3_conv1')(x)
    x = layers.Conv1D(256, 3,
                      activation='relu',
                      padding='same',
                      name='block3_conv2')(x)
    x = layers.Conv1D(256, 3,
                      activation='relu',
                      padding='same',
                      name='block3_conv3')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block3_pool', padding='same')(x)

    # Block 4
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block4_conv1')(x)
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block4_conv2')(x)
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block4_conv3')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block4_pool', padding='same')(x)

    # Block 5
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block5_conv1')(x)
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block5_conv2')(x)
    x = layers.Conv1D(512, 3,
                      activation='relu',
                      padding='same',
                      name='block5_conv3')(x)
    x = layers.MaxPooling1D(2, strides=2, name='block5_pool', padding='same')(x)

    # Classification block
    x = layers.Flatten(name='flatten')(x)
    x = layers.Dense(128, activation='relu', name='fc1')(x) # reduced dim for 1-d task
    x = layers.Dense(128, activation='relu', name='fc2')(x)
    x = layers.Dense(classes, activation='softmax', name='predictions')(x)


    # Create model.
    model = models.Model(img_input, x, name='vgg16')
    return model

model = VGG16_1d(3)
model.summary()

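Is 999,13,1 your image shape? So would it be 999,13,3?
Your data suggests it is better suited to a Conv1D model than to Conv2D-based models such as VGG or ResNet. Technically, it is possible to design a custom VGG16 using Conv1D. Would that help?
Check the updated answer. Maybe you could change your title to something like "VGG16 implementation for spectrograms or 1-d data" so that it is useful to other users later.
Thank you very much, friend, I will implement it as soon as I can. In the end I also solved the problem by lowering the batch size, but I like your answer better!

Output of model.summary() for the model above: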
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_5 (InputLayer)         [(None, 999, 13)]         0         
_________________________________________________________________
block1_conv1 (Conv1D)        (None, 999, 64)           2560      
_________________________________________________________________
block1_conv2 (Conv1D)        (None, 999, 64)           12352     
_________________________________________________________________
block1_pool (MaxPooling1D)   (None, 500, 64)           0         
_________________________________________________________________
block2_conv1 (Conv1D)        (None, 500, 128)          24704     
_________________________________________________________________
block2_conv2 (Conv1D)        (None, 500, 128)          49280     
_________________________________________________________________
block2_pool (MaxPooling1D)   (None, 250, 128)          0         
_________________________________________________________________
block3_conv1 (Conv1D)        (None, 250, 256)          98560     
_________________________________________________________________
block3_conv2 (Conv1D)        (None, 250, 256)          196864    
_________________________________________________________________
block3_conv3 (Conv1D)        (None, 250, 256)          196864    
_________________________________________________________________
block3_pool (MaxPooling1D)   (None, 125, 256)          0         
_________________________________________________________________
block4_conv1 (Conv1D)        (None, 125, 512)          393728    
_________________________________________________________________
block4_conv2 (Conv1D)        (None, 125, 512)          786944    
_________________________________________________________________
block4_conv3 (Conv1D)        (None, 125, 512)          786944    
_________________________________________________________________
block4_pool (MaxPooling1D)   (None, 63, 512)           0         
_________________________________________________________________
block5_conv1 (Conv1D)        (None, 63, 512)           786944    
_________________________________________________________________
block5_conv2 (Conv1D)        (None, 63, 512)           786944    
_________________________________________________________________
block5_conv3 (Conv1D)        (None, 63, 512)           786944    
_________________________________________________________________
block5_pool (MaxPooling1D)   (None, 32, 512)           0         
_________________________________________________________________
flatten (Flatten)            (None, 16384)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 128)               2097280   
_________________________________________________________________
fc2 (Dense)                  (None, 128)               16512     
_________________________________________________________________
predictions (Dense)          (None, 3)                 387       
=================================================================
Total params: 7,023,811
Trainable params: 7,023,811
Non-trainable params: 0
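
The shapes in this summary, and the 32-pixel minimum from the question, can be checked with a little arithmetic. The sketch below is plain Python with no TensorFlow dependency; the helper names pooled_length and conv1d_params are hypothetical and exist only for this illustration. It assumes pooling with pool size 2 and stride 2, where padding='same' rounds the length up and padding='valid' rounds it down. As far as I know, the stock Keras VGG16 uses five such pooling layers with 'valid' padding, which would explain why an axis shorter than 32 is rejected.

```python
import math


def pooled_length(n, num_pools=5, padding="same"):
    """Length of one axis after num_pools max-pool layers (pool size 2, stride 2)."""
    for _ in range(num_pools):
        # 'same' pads odd lengths, so it rounds up; 'valid' drops the remainder.
        n = math.ceil(n / 2) if padding == "same" else n // 2
    return n


def conv1d_params(kernel_size, in_channels, filters):
    """Trainable parameters of a Conv1D layer: weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


# Time axis of the Conv1D model above, which pools with padding='same'.
time_steps = pooled_length(999, padding="same")
flatten_units = time_steps * 512  # 512 channels after block 5
print(time_steps, flatten_units)  # 32 16384

# With 'valid' pooling, a 13-wide axis shrinks to nothing, and 32 is the
# smallest length that still leaves one element after five poolings.
print(pooled_length(13, padding="valid"))  # 0
print(pooled_length(31, padding="valid"))  # 0
print(pooled_length(32, padding="valid"))  # 1

# First convolution: kernel size 3, 13 input features, 64 filters.
print(conv1d_params(3, 13, 64))  # 2560
```

Treating the 13 coefficients as channels of a 1-D sequence avoids the problem entirely, because only the 999-step time axis is pooled.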