MemoryError when using mfcc from python_speech_features
I am using mfcc from python_speech_features and trying to extract features from wave files whose durations range from 5 to 120 seconds. For shorter files (around 10 to 20 seconds) I can extract the features, but for longer files it raises this error:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-6-ea3546938d03> in <module>
14 print("\n\tFeatures\n")
15 data, sampling_rate = librosa.load(sample_data[i])
---> 16 mfcc_features = mfcc(data,sampling_rate,winlen=30,nfft=66150)
17 print(pd.DataFrame(mfcc_features))
18 print("========================================\n")
~/anaconda3/lib/python3.8/site-packages/python_speech_features/base.py in mfcc(signal, samplerate, winlen, winstep, numcep, nfilt, nfft, lowfreq, highfreq, preemph, ceplifter, appendEnergy, winfunc)
26 :returns: A numpy array of size (NUMFRAMES by numcep) containing features. Each row holds 1 feature vector.
27 """
---> 28 feat,energy = fbank(signal,samplerate,winlen,winstep,nfilt,nfft,lowfreq,highfreq,preemph,winfunc)
29 feat = numpy.log(feat)
30 feat = dct(feat, type=2, axis=1, norm='ortho')[:,:numcep]
~/anaconda3/lib/python3.8/site-packages/python_speech_features/base.py in fbank(signal, samplerate, winlen, winstep, nfilt, nfft, lowfreq, highfreq, preemph, winfunc)
53 highfreq= highfreq or samplerate/2
54 signal = sigproc.preemphasis(signal,preemph)
---> 55 frames = sigproc.framesig(signal, winlen*samplerate, winstep*samplerate, winfunc)
56 pspec = sigproc.powspec(frames,nfft)
57 energy = numpy.sum(pspec,1) # this stores the total energy in each frame
~/anaconda3/lib/python3.8/site-packages/python_speech_features/sigproc.py in framesig(sig, frame_len, frame_step, winfunc)
33 padsignal = numpy.concatenate((sig,zeros))
34
---> 35 indices = numpy.tile(numpy.arange(0,frame_len),(numframes,1)) + numpy.tile(numpy.arange(0,numframes*frame_step,frame_step),(frame_len,1)).T
36 indices = numpy.array(indices,dtype=numpy.int32)
37 frames = padsignal[indices]
<__array_function__ internals> in tile(*args, **kwargs)
~/anaconda3/lib/python3.8/site-packages/numpy/lib/shape_base.py in tile(A, reps)
1256 for dim_in, nrep in zip(c.shape, tup):
1257 if nrep != 1:
-> 1258 c = c.reshape(-1, n).repeat(nrep, 0)
1259 n //= dim_in
1260 return c.reshape(shape_out)
MemoryError: Unable to allocate 12.8 GiB for an array with shape (2591, 661500) and data type int64
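The numbers in the error message can be checked by hand. The sketch below is only arithmetic on the reported shape and dtype. It assumes the files were loaded at 22050 Hz, which is the default of librosa.load and matches 661500 / 30. The variable names are illustrative.

```python
# Shape and dtype are taken from the MemoryError message above.
num_frames, frame_len = 2591, 661500
bytes_per_index = 8  # int64

# Assumption: 22050 Hz is librosa.load's default sampling rate.
sampling_rate = 22050
winlen_seconds = 30  # the winlen value passed to mfcc

# fbank passes winlen * samplerate to framesig as the frame length in samples.
samples_per_frame = winlen_seconds * sampling_rate

total_bytes = num_frames * frame_len * bytes_per_index
total_gib = total_bytes / 2**30

print(samples_per_frame)          # 661500
print(f"{total_gib:.1f} GiB")     # 12.8 GiB
```

So each of the 2591 frames is 30 seconds long, and the index array alone needs one int64 per sample per frame.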
Code:

import os
import librosa
import pandas as pd
import IPython.display as ipd
from python_speech_features import mfcc

print("\nSample Data:")
print("============\n")
path = 'speech-sample-data'
# Collect every .wav file under the sample directory
sample_data = [os.path.join(dp, f) for dp, dn, filenames in os.walk(path) for f in filenames if os.path.splitext(f)[1] == '.wav']
for i in range(5):
    print("Speech: ")
    ipd.display(ipd.Audio(sample_data[i]))
    print("Type: \n\tNormal\n")
    print("\n\tFeatures\n")
    data, sampling_rate = librosa.load(sample_data[i])
    # winlen is in seconds, so this requests 30-second frames; the MemoryError is raised here
    mfcc_features = mfcc(data,sampling_rate,winlen=30,nfft=66150)
    print(pd.DataFrame(mfcc_features))
    print("========================================\n")
    print("Speech: ")
    ipd.display(ipd.Audio(sample_data[i+5]))
    print("Type: \n\tToxic\n")
    print("\n\tFeatures\n")
    data, sampling_rate = librosa.load(sample_data[i+5])
    mfcc_features = mfcc(data,sampling_rate,winlen=30,nfft=66150)
    print(pd.DataFrame(mfcc_features))
    print("========================================\n")
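As a rough comparison, the sketch below estimates the size of the index array for a 120-second file with two window lengths: the 30 seconds used above and 0.025 seconds, which is the documented default for winlen in python_speech_features (the default winstep is 0.01 seconds). The function estimate_index_bytes is a hypothetical helper that only approximates the framing rule seen in the traceback. It does not call the library, and it is an illustration rather than a confirmed fix.

```python
import math


def estimate_index_bytes(duration_s, samplerate, winlen, winstep=0.01, itemsize=8):
    """Approximate frame count, frame length and int64 index array size.

    Hypothetical helper: mirrors the frame-by-frame layout from the
    traceback (numframes x frame_len), not the library's exact code.
    """
    signal_len = int(duration_s * samplerate)
    frame_len = math.floor(winlen * samplerate + 0.5)    # samples per frame
    frame_step = math.floor(winstep * samplerate + 0.5)  # hop between frames
    if signal_len <= frame_len:
        num_frames = 1
    else:
        num_frames = 1 + math.ceil((signal_len - frame_len) / frame_step)
    return num_frames, frame_len, num_frames * frame_len * itemsize


# Assumption: 22050 Hz, the default sampling rate of librosa.load.
long_frames, long_len, long_bytes = estimate_index_bytes(120, 22050, winlen=30)
short_frames, short_len, short_bytes = estimate_index_bytes(120, 22050, winlen=0.025)

print(f"winlen=30:    {long_frames} frames x {long_len} samples, {long_bytes / 2**30:.1f} GiB")
print(f"winlen=0.025: {short_frames} frames x {short_len} samples, {short_bytes / 2**20:.1f} MiB")
```

Under these assumptions the 30-second window needs tens of GiB for the longest files, while the default window stays in the tens of MiB. That is consistent with short files working and long files failing.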