Python: trying to detect speech with a VAD (voice activity detector)

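I can read the audio, but I get an error message when I pass it to the VAD (voice activity detector). I think the error occurs because the frame is in bytes when it is fed to the VAD. Is the frame supposed to be in bytes? The code is as follows: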

frame_duration_ms=10
duration_in_ms = (frame_duration_ms / 1000) #duration in 10ms
frame_size = int(sample_rate * duration_in_ms) #frame size of 160
frame_bytes = frame_size * 2

def frame_generator(buffer, frame_bytes):
    # repeatedly store a 320-byte slice in frame_stored while a full frame still fits in the buffer
    offset = 0
    while offset+frame_bytes < len(buffer):
        frame_stored = buffer[offset : offset+frame_bytes]
        offset = offset + frame_bytes
    return frame_stored
num_padding_frames = int(padding_duration_ms / frame_duration_ms)
# use deque for the sliding window
ring_buffer = deque(maxlen=num_padding_frames)
# we have two states TRIGGERED and NOTTRIGGERED state
triggered = True #NOTTRIGGERED state

frames = frame_generator(buffer, frame_bytes)

speech_frame = []
for frame in frames:
    is_speech = vad.is_speech(frame, sample_rate)

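Here is the error message:
TypeError                                 Traceback (most recent call last)
     16 speech_frame = []
     17 for frame in frames:
---> 18     is_speech = vad.is_speech(frame, sample_rate)
     19     #print(frame)

C:\Program Files\Python38\lib\site-packages\webrtcvad.py in is_speech(self, buf, sample_rate, length)
     20
     21     def is_speech(self, buf, sample_rate, length=None):
---> 22         length = length or int(len(buf) / 2)
     23         if length * 2 > len(buf):
     24             raise IndexError(

TypeError: object of type 'int' has no len()
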
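I have solved it. As the traceback shows, vad.is_speech(frame, sample_rate) takes its first argument, buf, and computes its length, but an integer value has no len() in Python. The original frame_generator returns only the last slice, a single bytes object, so the for loop iterates over that object byte by byte and each frame passed to the VAD is an int.
For example, this raises an error: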
num = 1
print(len(num))

Whereas this works:
data = [1,2,3,4]
print(len(data))
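
To see why each frame arrived as an int, it may help to compare iterating over a bytes object with iterating over a list of slices. This is a small sketch using a made-up 8-byte buffer as a stand-in for the audio data, not the data from the question:

```python
# Hypothetical stand-in for raw 16-bit PCM audio: 8 bytes = 4 samples.
buffer = bytes(range(8))

# Iterating over a bytes object yields ints, one per byte.
items_from_bytes = [type(item).__name__ for item in buffer[0:4]]

# Iterating over a list of slices yields bytes objects, which have a len().
slices = [buffer[0:4], buffer[4:8]]
items_from_list = [type(item).__name__ for item in slices]

print(items_from_bytes)  # ['int', 'int', 'int', 'int']
print(items_from_list)   # ['bytes', 'bytes']
```

So the frame should indeed be bytes; the problem is only that a single bytes object was being iterated instead of a collection of them.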

Here is the corrected code:
from collections import deque

# sample_rate, padding_duration_ms, buffer and vad are defined earlier (not shown)
frame_duration_ms = 10
duration_in_ms = frame_duration_ms / 1000  # 10 ms expressed in seconds (0.01)
frame_size = int(sample_rate * duration_in_ms)  # frame size of 160 samples (at 16 kHz)
frame_bytes = frame_size * 2  # 2 bytes per 16-bit sample

def frame_generator(buffer, frame_bytes):
    # slice the buffer into consecutive frames of frame_bytes bytes (320 here)
    # and return them as a list, so that each item is a bytes object, not an int
    values = []
    offset = 0
    while offset + frame_bytes <= len(buffer):
        frame_stored = buffer[offset : offset + frame_bytes]
        offset = offset + frame_bytes
        values.append(frame_stored)
    return values

num_padding_frames = int(padding_duration_ms / frame_duration_ms)
# use deque for the sliding window
ring_buffer = deque(maxlen=num_padding_frames)
# we have two states: TRIGGERED and NOTTRIGGERED
triggered = False  # start in the NOTTRIGGERED state

frames = frame_generator(buffer, frame_bytes)

for frame in frames:
    is_speech = vad.is_speech(frame, sample_rate)
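
As a possible variant, frame_generator could be written as a real generator with yield, which avoids building the whole list in memory. The sketch below is self-contained and rests on assumed values: a 16 kHz sample rate and one second of silent 16-bit mono PCM standing in for the recorded buffer. The commented-out lines only indicate where webrtcvad would plug in; it expects 16-bit mono PCM frames of 10, 20 or 30 ms.

```python
def frame_generator(buffer, frame_bytes):
    """Yield consecutive frame_bytes-sized slices of buffer.

    A trailing partial frame is dropped.
    """
    offset = 0
    while offset + frame_bytes <= len(buffer):
        yield buffer[offset:offset + frame_bytes]
        offset += frame_bytes


# Assumed values, for illustration only.
sample_rate = 16000      # Hz
frame_duration_ms = 10   # frame length in milliseconds
bytes_per_sample = 2     # 16-bit PCM
frame_bytes = sample_rate * frame_duration_ms // 1000 * bytes_per_sample  # 320

# One second of silence as a stand-in for recorded audio.
buffer = b"\x00" * (sample_rate * bytes_per_sample)

frames = list(frame_generator(buffer, frame_bytes))
print(len(frames), len(frames[0]), type(frames[0]).__name__)  # 100 320 bytes

# With webrtcvad installed, each frame could then be classified:
# import webrtcvad
# vad = webrtcvad.Vad(2)  # aggressiveness mode, 0 to 3
# for frame in frames:
#     is_speech = vad.is_speech(frame, sample_rate)
```

Using `<=` rather than `<` in the loop condition keeps the final frame when the buffer length is an exact multiple of the frame size.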
