Speech recognition: can the MFCC feature extraction result matrix contain negative values?


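I am using MFCC to extract features for a speech recognizer that I am implementing with HMMs, using Kevin Murphy's toolbox for the HMM part. My MFCC result matrix contains negative values. Is that expected, or could my MFCC code be wrong? Below is the error I get: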

Attempted to access obsmat(:,-39.5403); index must be a positive integer or logical.

Error in multinomial_prob (line 19)
  B(:,t) = obsmat(:, data(t));

Error in dhmm_em>compute_ess_dhmm (line 103)
 obslik = multinomial_prob(obs, obsmat);

Error in dhmm_em (line 47)
 [loglik, exp_num_trans, exp_num_visits1, exp_num_emit] = ...

Error in speechreco (line 77)
[LL, prior2, transmat2, obsmat2] = dhmm_em(dtr{1}, prior, A, B, 'max_iter', 5);
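Also, if anyone knows of a link to MATLAB source code for HMMs, please share it; this is for my final project. I am trying to implement a speech recognizer and do not know what to do after extracting the feature vectors.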

The complete MATLAB code is listed at the end of this post (I am using Kevin Murphy's HMM toolbox; the error occurs in the call to dhmm_em).


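The error is not related to negative values in the MFCCs; MFCC values can legitimately be negative. The error says that a floating-point value is being used as an index into obsmat, which means that the input to the training call is constructed incorrectly: it has the wrong type, and values and indices are ending up in the wrong places. You need to share the whole code you wrote so that the error can be found, not just the line that invokes the HMM training.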
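
To illustrate the point: the trace shows `obsmat(:, data(t))`, so each observation is used directly as a column index, and a raw MFCC value such as -39.5403 cannot serve as one. One common way around this is to vector-quantize the frames before training a discrete HMM. The following is a minimal sketch, assuming the MFCC matrix holds one frame per column; the toy matrix `C` and the `codebook` are made-up illustration values, and in practice the codebook would come from clustering the training frames (for example with k-means).

```matlab
% Toy stand-in for an MFCC matrix: D coefficients (rows) by T frames (columns).
% Negative entries are normal for MFCCs.
C = [-39.5403  -38.9   12.4   11.8   0.3;
       4.2       3.9   -7.1   -6.8   0.1];

% Hypothetical codebook with K = 3 centroids, one per column (assumed values).
codebook = [-39   12   0;
              4   -7   0];

T = size(C, 2);
K = size(codebook, 2);
symbols = zeros(1, T);

for t = 1:T
    % Squared Euclidean distance from frame t to every centroid.
    d = codebook - repmat(C(:, t), 1, K);
    [~, symbols(t)] = min(sum(d .^ 2, 1));
end

disp(symbols);   % integer symbols in 1..K, one per frame
```

The resulting row vector of integers in 1..K is the kind of input a discrete HMM expects, paired with an observation matrix that has K columns. Alternatively, a continuous-density model avoids quantization altogether; the same toolbox appears to provide mhmm_em for that purpose, but check its documentation for the exact arguments.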


Looking at your code, I think you may need to call dhmm_em with ctr rather than ctr{1}.
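
Whichever form is passed, a single sequence or a cell array of sequences, every element has to be a positive integer symbol. A small pre-flight check can catch the problem before training starts. This is only a sketch: `isSymbolSeq` is a hypothetical helper and the sequences are made-up data.

```matlab
% Hypothetical helper: true when x contains only positive integer values.
isSymbolSeq = @(x) isnumeric(x) && ~isempty(x) && ...
                   all(x(:) >= 1) && all(x(:) == floor(x(:)));

% Made-up example data: two valid symbol sequences and one raw feature vector.
data = {[1 2 2 3], [3 1 2], [-39.5403 4.2]};

% One logical entry per sequence.
ok = cellfun(isSymbolSeq, data);

if ~all(ok)
    fprintf('Sequence %d is not a valid symbol sequence.\n', find(~ok));
end
```

Running the check on the raw MFCC matrices from the question would flag every sequence, which points at the missing quantization step rather than at the MFCC computation itself.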

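I have now added the entire code. If you have MATLAB code for isolated-word recognition with HMMs that you can share, please let me know, so that I can use it as a reference and develop this discrete isolated-word model further into a continuous one.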
function [] = speechreco()

vtr = {8}; fstr = {8}; nbtr = {8};
ctr = {8};

for i = 1:8

    % Read audio data from train folder for performing operations
    st=strcat('train\s',num2str(i),'.wav');
    [s1, fs1, nb1] = wavread(st);  % st is the filename; s1 is the sample data, fs1 is the sample rate in Hz, nb1 is the number of bits per sample
    vtr{i} = s1; fstr{i} = fs1; nbtr{i} = nb1;

    ctr{i} = mfcc(vtr{i},fstr{i});

end


display(ctr{1}); %MFCC matrix 20*129

W1 = transpose(ctr{1});

ch1 = menu('Mel Space:','Signal 1','Signal 2','Signal 3',...
           'Signal 4','Signal 5','Signal 6','Signal 7','Signal 8','Exit');
if ch1 ~= 9
    plot(linspace(0, (fstr{ch1}/2), 129), (melfb(20, 256, fstr{ch1})));
    title('Mel-Spaced-Filterbank');
    xlabel('Frequency[Hz]');
end


%error is here
[LL, prior2, transmat2, obsmat2] = dhmm_em(ctr{1}, prior, A, B, 'max_iter', 5);
plot(LL());

end

%%mfcc
%old one MFCC now
function r = mfcc(s, fs)
m = 100;
n = 256;
frame = blockFrames(s, fs, m, n); % FFT of each Hamming-windowed frame
m = melfb(20, n, fs);
n2 = 1 + floor(n / 2);
z = m * abs(frame(1:n2, :)).^2; % apply the triangular mel filters to the power spectrum
r = dct(log(z));  %take log and then the dct conversion 
end



%% blockFrames Function
% blockFrames: Puts the signal into frames
%
% Inputs: s contains the signal to analyze
% fs is the sampling rate of the signal
% m is the distance between the beginnings of two frames
% n is the number of samples per frame
%
% Output: M3 is a matrix containing all the frames

function M3 = blockFrames(s, fs, m, n)
l = length(s);
nbFrame = floor((l - n) / m) + 1;
for i = 1:n
    for j = 1:nbFrame
        M(i, j) = s(((j - 1) * m) + i); %#ok<AGROW>
    end
end
h = hamming(n);
M2 = diag(h) * M;
for i = 1:nbFrame
    M3(:, i) = fft(M2(:, i)); %#ok<AGROW>
end
end
%--------------------------------------------------------------------------

function m = melfb(p, n, fs)  %used for graph plot of power spectra
% MELFB Determine matrix for a mel-spaced filterbank 
% 
% Inputs: p number of filters in filterbank 
% n length of fft 
% fs sample rate in Hz 
% 
% Outputs: x a (sparse) matrix containing the filterbank amplitudes 
% size(x) = [p, 1+floor(n/2)] 
% 
% Usage: For example, to compute the mel-scale spectrum of a 
% column-vector signal s, with length n and sample rate fs: 
% 
% f = fft(s); 
% m = melfb(p, n, fs); 
% n2 = 1 + floor(n/2); 
% z = m * abs(f(1:n2)).^2; 
% 
% z would contain p samples of the desired mel-scale spectrum 
%%%%%%%%%%%%%%%%%% 
%
f0 = 700 / fs; 
fn2 = floor(n/2); 
lr = log(1 + 0.5/f0) / (p+1); 
% convert to fft bin numbers with 0 for DC term 
bl = n * (f0 * (exp([0 1 p p+1] * lr) - 1)); 
b1 = floor(bl(1)) + 1; 
b2 = ceil(bl(2)); 
b3 = floor(bl(3)); 
b4 = min(fn2, ceil(bl(4))) - 1; 
pf = log(1 + (b1:b4)/n/f0) / lr; 
fp = floor(pf); 
pm = pf - fp; 
r = [fp(b2:b4) 1+fp(1:b3)]; 
c = [b2:b4 1:b3] + 1; 
v = 2 * [1-pm(b2:b4) pm(1:b3)]; 
m = sparse(r, c, v, p, 1+fn2); 
end
%----------------------------------------------------------------------
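
Note that the listing passes `prior`, `A` and `B` to dhmm_em without defining them anywhere in `speechreco`. A minimal sketch of a random initialization follows, assuming `Q` hidden states and a codebook of `K` discrete symbols; both values are arbitrary choices for illustration and are not taken from the original code.

```matlab
Q = 4;    % number of hidden states (assumed for illustration)
K = 16;   % number of discrete observation symbols (assumed for illustration)

% Initial state distribution: Q-by-1, entries sum to 1.
prior = rand(Q, 1);
prior = prior / sum(prior);

% Transition matrix: Q-by-Q, each row sums to 1.
A = rand(Q, Q);
A = A ./ repmat(sum(A, 2), 1, Q);

% Observation matrix: Q-by-K, each row sums to 1.
% Column k holds the probability of emitting symbol k from each state,
% which is why observations must be integers in 1..K.
B = rand(Q, K);
B = B ./ repmat(sum(B, 2), 1, K);
```

With these shapes, a symbol sequence with values in 1..K indexes the columns of `B` without error, which is exactly the lookup that fails in the trace when raw MFCC values are passed instead.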