How to stack LSTM layers to classify speech files

Tags: python, speech-recognition, keras, lstm, recurrent-neural-network

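I have been trying to implement an LSTM-based classifier for discrete speech. I created feature vectors from 13 MFCCs, so each file yields a 2D array of shape [99, 13]. After following the mnist_irnn example, I was able to set up a single-layer RNN that classifies my speech files. Now I want to add more layers, so I have been trying to build a network with two LSTM layers and a softmax layer as the output layer. After going through a number of posts, I set up the network as follows; it does not throw any exception while the model is being built.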

from __future__ import print_function
import numpy as np

from keras.optimizers import SGD
from keras.utils.visualize_util import plot

np.random.seed(1337)  # for reproducibility
from keras.preprocessing import sequence
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, TimeDistributedDense
from keras.layers.recurrent import LSTM
from SpeechResearch import loadData

batch_size = 5
hidden_units = 100
nb_classes = 10
print('Loading data...')
(X_train, y_train), (X_test, y_test) = loadData.load_mfcc(10, 2)

print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)
print('Build model...')

Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
print(batch_size, 99, X_train.shape[2])
print(X_train.shape[1:])
print(X_train.shape[2])
model = Sequential()

model.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', activation='tanh', inner_activation='sigmoid', return_sequences=True,
               stateful=True, batch_input_shape=(batch_size, 99, X_train.shape[2])))
# model.add(Dropout(0.5))
model.add(LSTM(output_dim=hidden_units, init='uniform', inner_init='uniform',
               forget_bias_init='one', activation='tanh', inner_activation='sigmoid', return_sequences=True,
               stateful=True, input_length=X_train.shape[2]))

model.add(TimeDistributedDense(input_dim=hidden_units, output_dim=nb_classes))
model.add(Activation('softmax'))

# try using different optimizers and different optimizer configs
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

print("Train...")
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=3, validation_data=(X_test, Y_test), show_accuracy=True)
score, acc = model.evaluate(X_test, Y_test,
                            batch_size=batch_size,
                            show_accuracy=True)
print('Test score:', score)
print('Test accuracy:', acc)
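I have tried different values in different places. (For now I am working with a small sample, so the values are very small.) However, it now throws an exception during training, reporting a dimension mismatch: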

Using Theano backend.
Loading data...
100 train sequences
20 test sequences
X_train shape: (100, 99, 13)
X_test shape: (20, 99, 13)
y_train shape: (100,)
y_test shape: (20,)
Build model...
5 99 13
(99, 13)
13
Train...
Train on 100 samples, validate on 20 samples
Epoch 1/3

Traceback (most recent call last):
  File "/home/udani/PycharmProjects/testResearch/SpeechResearch/lstmNetwork.py", line 54, in <module>
    model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=3, validation_data=(X_test, Y_test), show_accuracy=True)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 581, in fit
    shuffle=shuffle, metrics=metrics)
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 239, in _fit
    outs = f(ins_batch)
  File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 365, in __call__
    return self.function(*inputs)
  File "/home/udani/Documents/ResearchSW/Theano/theano/compile/function_module.py", line 786, in __call__
    allow_downcast=s.allow_downcast)
  File "/home/udani/Documents/ResearchSW/Theano/theano/tensor/type.py", line 177, in filter
    data.shape))
TypeError: ('Bad input argument to theano function with name "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py:362"  at index 1(0-based)', 'Wrong number of dimensions: expected 3, got 2 with shape (5, 10).')
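I would like to know what I am doing wrong here. I have been going through the code all day, but I still cannot work out the reason for the dimension mismatch.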

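In addition, I would be very grateful if someone could explain what the output dimension means. (Is it the shape of the vector output by a single node when a given layer has n nodes? Should it equal the number of nodes in the next layer?)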

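You have a problem with the dimensions of Y. The output should be something like (100, 99, 10), that is, a set of output sequences shaped like the features, except that each time step holds a one-hot label (a single 1). Your Y array has a different shape. The to_categorical method is not really applicable to sequences; it expects a vector.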
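To make the (100, 99, 10) target shape concrete, here is a minimal NumPy sketch. It assumes that every frame of a file simply carries that file's class label, which is one possible reading of the advice above and not something the answer states outright. The helper name frame_labels and the three example labels are hypothetical.

```python
import numpy as np


def frame_labels(y, timesteps, nb_classes):
    """Turn integer file labels of shape (n,) into per-frame one-hot
    targets of shape (n, timesteps, nb_classes).

    Assumption: each frame of a file gets the same label as the file.
    """
    y = np.asarray(y, dtype=int)
    # One-hot encode each file label: shape (n, nb_classes).
    one_hot = np.eye(nb_classes, dtype="float32")[y]
    # Insert a time axis and repeat the label on every frame.
    return np.repeat(one_hot[:, np.newaxis, :], timesteps, axis=1)


# Hypothetical labels for three files; 99 frames and 10 classes as in the question.
y_example = [9, 0, 3]
Y_seq = frame_labels(y_example, timesteps=99, nb_classes=10)
print(Y_seq.shape)
print(Y_seq[0, 0])
```

With the real data, passing y_train instead of y_example would give an array whose first dimension is 100, matching the 3D target that the traceback says the network expects.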

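Alternatively, you can use return_sequences=False on the last LSTM layer and follow it with a regular Dense layer, so the network produces one output vector per file; the (100, 10) labels from to_categorical then match.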


You also do not need a stateful network.
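The effect of return_sequences on the output shape can be illustrated without Keras. The sketch below is a toy tanh recurrence written in NumPy; it is not an LSTM and not Keras's implementation, and the function toy_rnn and its random weights are purely illustrative. It only shows why a layer that returns sequences produces a 3D output (one vector of size hidden_units per frame), while one that does not produces a 2D output (one vector per file).

```python
import numpy as np


def toy_rnn(x, hidden_units, return_sequences, seed=0):
    """Run a plain tanh recurrence over x of shape (batch, timesteps, features).

    Illustrative only: random fixed weights, no gates, no training.
    """
    rng = np.random.default_rng(seed)
    batch, timesteps, features = x.shape
    w_in = rng.normal(scale=0.1, size=(features, hidden_units))
    w_rec = rng.normal(scale=0.1, size=(hidden_units, hidden_units))
    h = np.zeros((batch, hidden_units))
    states = []
    for t in range(timesteps):
        # The hidden state has hidden_units entries per sample at every step.
        h = np.tanh(x[:, t, :] @ w_in + h @ w_rec)
        states.append(h)
    if return_sequences:
        return np.stack(states, axis=1)  # (batch, timesteps, hidden_units)
    return h  # (batch, hidden_units): only the last step


# Same sizes as in the question: batches of 5 files, 99 frames, 13 MFCCs.
x = np.ones((5, 99, 13))
seq = toy_rnn(x, hidden_units=100, return_sequences=True)
last = toy_rnn(x, hidden_units=100, return_sequences=False)
print(seq.shape)   # (5, 99, 100)
print(last.shape)  # (5, 100)
```

A classifier stacked on the 3D output needs 3D targets such as (5, 99, 10), whereas one stacked on the 2D output matches the (5, 10) targets seen in the traceback.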

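Comments:

Udani, why are you using a different account? This has already been discussed.

@NikolayShmyrev I received a warning about a question ban, and I was afraid I would be banned from Stack Overflow, so I asked a friend to post this for me. I have no one else to ask these questions. I tried to set up the Y set as you said, but I could not work out how to set up the network to match the frame labels mentioned here, so I switched to the many-to-one mapping approach of the mnist example in Keras. Do you mean that I have to use a label for each frame rather than one label for the whole file (as in the many-to-one mapping of the mnist example)? Should each frame label be in categorical format (for example, 9 as 0 0 0 0 0 0 0 0 0 1)? Also, do I not have to include the dummy values mentioned in the link above? Does that mean that for a sample with value 9, all labels (99 frames) should look like the one above, with the same categorical label on every one of the 99 frames?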