Python pyaudio - "listen" until voice is detected, then record to a .wav file


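I'm having some problems, and I just can't seem to grasp the concept.

What I want to do is:

Have the microphone "listen" for sound (above a certain threshold), then start recording to a .wav file until the person stops speaking / the signal is no longer there. For example: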

begin:
   listen() -> nothing is being said
   listen() -> nothing is being said
   listen() -> VOICED - _BEGIN RECORDING_
   listen() -> VOICED - _CONTINUE RECORDING_
   listen() -> UNVOICED - _END RECORDING_
end
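The begin/listen/end example above is essentially a small state machine over audio chunks. Here is a minimal sketch of it, assuming the audio arrives as non-empty array('h') chunks (as in the code below) and reusing the question's threshold of 500. The names is_silent and segment_utterances, and the "hangover" of 3 silent chunks that keeps a short pause from ending the recording, are hypothetical choices, not part of PyAudio:

```python
from array import array

THRESHOLD = 500            # amplitude threshold, same value as in the question
SILENT_CHUNKS_TO_STOP = 3  # hypothetical: tolerate short pauses before stopping


def is_silent(chunk, threshold=THRESHOLD):
    """True if no sample in the chunk reaches the threshold (absolute value)."""
    return max((abs(s) for s in chunk), default=0) < threshold


def segment_utterances(chunks, threshold=THRESHOLD, hangover=SILENT_CHUNKS_TO_STOP):
    """Yield one array('h') per utterance found in an iterable of int16 chunks.

    Recording begins at the first loud chunk and ends after `hangover`
    consecutive silent chunks (those trailing silent chunks are kept).
    """
    recording = None
    silent_run = 0
    for chunk in chunks:
        silent = is_silent(chunk, threshold)
        if recording is None:
            if silent:
                continue              # "nothing is being said"
            recording = array('h')    # VOICED: begin recording
            silent_run = 0
        recording.extend(chunk)       # VOICED (or a short pause): keep recording
        silent_run = silent_run + 1 if silent else 0
        if silent_run >= hangover:
            yield recording           # UNVOICED long enough: end recording
            recording = None
    if recording is not None:
        yield recording               # input ended while still recording


if __name__ == "__main__":
    quiet = array('h', [10, -20] * 4)
    loud = array('h', [1000, -1200] * 4)
    stream = [quiet, quiet, loud, loud, quiet, quiet, quiet, quiet]
    utterances = list(segment_utterances(stream))
    print(len(utterances), len(utterances[0]))  # prints: 1 40
```

Because segment_utterances only consumes an iterable, the same logic can be fed from a list in a test or from a generator that calls stream.read() in a listening thread.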
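I also want to use threads for this, so that one thread constantly "listens" to the input, and when there is voice data another thread starts... but I can't for the life of me figure out how to do it. Here is my code so far: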

import wave
import sys
import threading
from array import array
from sys import byteorder

try:
    import pyaudio
    CHECK_PYLIB = True
except ImportError:
    CHECK_PYLIB = False


class Audio:
    THRESHOLD = 500

    # initial constructor to accept params
    def __init__(self, chunk, format, channels, rate):
        self._chunk = chunk
        # e.g. pyaudio.paInt16; a trailing comma here would turn it into a tuple
        self._format = format
        self._channels = channels
        self._rate = rate
        self.stream = None
        self.p = pyaudio.PyAudio()
        self.sample_width = self.p.get_sample_size(format)

    def open(self):
        # open an input stream with the parameters given to the constructor
        self.stream = self.p.open(format=self._format,
                                  channels=self._channels,
                                  rate=self._rate,
                                  input=True,
                                  frames_per_buffer=self._chunk)
        return True

    def record(self):
        # create a new thread to record the sound
        threading.Thread(target=self.listen).start()

    def is_silence(self, snd_data):
        return max(snd_data) < self.THRESHOLD

    def listen(self):
        r = array('h')

        while True:
            snd_data = array('h', self.stream.read(self._chunk))

            if byteorder == 'big':
                snd_data.byteswap()
            r.extend(snd_data)
            # NOTE: nothing ever breaks out of this loop,
            # so the return below is never reached

        return self.sample_width, r

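Now, every 5 seconds, I need the "process" function to run and process the data (simulated with a delay, i.e. time.sleep(10)), and once it has finished, start recording again.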
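One way to get the "record for N seconds, process, then record again" cycle is to count reads instead of using timers: at rate frames per second and chunk_frames frames per read, N seconds is about N * rate / chunk_frames reads. A minimal sketch, assuming you supply read_chunk and process as callables; record_then_process and its optional pause/resume hooks are hypothetical names, not PyAudio API:

```python
from array import array


def record_then_process(read_chunk, process, rate, chunk_frames, seconds,
                        rounds, pause=None, resume=None):
    """Alternate between recording `seconds` of audio and processing it.

    read_chunk() must return an array('h') holding one read of chunk_frames
    frames; process(samples) receives everything recorded in one round.
    pause/resume (optional) are called around process().
    """
    # number of reads needed to cover `seconds` of audio (at least one)
    chunks_per_round = max(1, round(seconds * rate / chunk_frames))
    for _ in range(rounds):
        samples = array('h')
        for _ in range(chunks_per_round):
            samples.extend(read_chunk())   # recording phase
        if pause is not None:
            pause()
        process(samples)                   # no reads happen while this runs
        if resume is not None:
            resume()                       # then the next round records again
```

With PyAudio, read_chunk could be lambda: array('h', stream.read(1024)), and pause/resume could be stream.stop_stream and stream.start_stream. Stopping the stream while processing is one way to avoid input-overflow errors from stream.read() when processing takes longer than the device buffer can hold; whether that matters depends on how long your processing actually takes.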

After spending some time on it, I came up with the following code, which seems to do what you need, except for writing to a file:

import threading
from array import array
from queue import Queue, Empty, Full

import pyaudio


CHUNK_SIZE = 1024
MIN_VOLUME = 500
# if the recording thread can't consume fast enough, the listener will start discarding
BUF_MAX_SIZE = CHUNK_SIZE * 10


def main():
    stopped = threading.Event()
    q = Queue(maxsize=BUF_MAX_SIZE // CHUNK_SIZE)

    listen_t = threading.Thread(target=listen, args=(stopped, q))
    listen_t.start()
    record_t = threading.Thread(target=record, args=(stopped, q))
    record_t.start()

    try:
        while True:
            listen_t.join(0.1)
            record_t.join(0.1)
    except KeyboardInterrupt:
        stopped.set()

    listen_t.join()
    record_t.join()


def record(stopped, q):
    while not stopped.is_set():
        try:
            # time out so the thread notices `stopped` even if no audio arrives
            chunk = q.get(timeout=0.5)
        except Empty:
            continue
        vol = max(chunk)
        if vol >= MIN_VOLUME:
            # TODO: write to file
            print("O", end=" ", flush=True)
        else:
            print("-", end=" ", flush=True)


def listen(stopped, q):
    p = pyaudio.PyAudio()
    stream = p.open(
        format=pyaudio.paInt16,
        channels=2,
        rate=44100,
        input=True,
        frames_per_buffer=CHUNK_SIZE,
    )

    try:
        while not stopped.is_set():
            try:
                # put_nowait raises Full instead of blocking when the queue is full
                q.put_nowait(array('h', stream.read(CHUNK_SIZE)))
            except Full:
                pass  # discard
    finally:
        stream.stop_stream()
        stream.close()
        p.terminate()


if __name__ == '__main__':
    main()
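The "TODO: write to file" in record() could be filled in with the standard library's wave module. A minimal sketch, assuming the paInt16 / 2-channel / 44100 Hz settings used in listen() above; write_wav is a hypothetical helper name:

```python
import wave
from array import array
from sys import byteorder


def write_wav(path, samples, channels=2, rate=44100):
    """Write interleaved 16-bit samples (array('h')) to a .wav file.

    `path` may be a filename or a writable binary file object.
    """
    data = array('h', samples)   # copy so the caller's array is left untouched
    if byteorder == 'big':
        data.byteswap()          # WAV stores 16-bit samples little-endian
    with wave.open(path, 'wb') as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)       # 2 bytes per sample, matching paInt16
        wf.setframerate(rate)
        wf.writeframes(data.tobytes())
```

In record(), voiced chunks could be appended to an array('h') and handed to write_wav once the volume drops below MIN_VOLUME (ideally for several chunks in a row, so that brief pauses do not split the recording into many files).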
See here:


It even converts the WAV to FLAC and sends it to the Google Speech API; if you don't need that, just delete the stt_google_wav function ;)

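Do read the highly recommended PEP 8 :) Question: do you actually ever instantiate more than one Audio object? I'm asking because I don't quite see why you've put the code in a class; it gives me the vibe of a Java beginner, who always insists that everything be object-oriented just for the sake of it.

@ErikAllik I must admit, I'm new to Python :(

That's obvious; that's why I referred you to PEP 8.

@ErikAllik I'll take a look :) But as far as the question goes... any ideas?

Thanks for your reply. I've used the code you gave me, but my environment has changed, and I've tried to adapt it (see the update above ^^). I can't figure out how to run "process()" every 10 seconds, let it finish, and then start recording again... any suggestions? Thank you!

First, I've already answered your original question; second, I really don't understand what your changed circumstances are, because you haven't described them clearly. Even the code you pasted is broken, because you didn't take the time to fix the indentation/formatting. In fact, I have no idea what you did to my original snippet; you completely messed it up. I think you should read an introductory Python/programming book or something.

This answer is awesome. I just learned multithreading in Python.

@JakeStewart Glad it was useful!

This is the right idea, but note that the detection method it uses is fairly primitive. It only checks the microphone intensity and assumes someone is talking if there is enough noise. In reality, speech occupies a particular frequency range, so in any environment that is not almost silent the code will produce a lot of false positives. A better solution is to use an FFT: start recording when a signal is detected, and measure the spectral flatness of the sound to tell whether it is voiced speech or noise.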
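The spectral flatness suggested in the last answer is the ratio of the geometric mean to the arithmetic mean of a frame's power spectrum: close to 1 for noise-like signals and much lower for tonal or harmonic signals such as voiced speech. A minimal sketch using NumPy's FFT (NumPy is an extra dependency not used elsewhere in this thread), assuming one channel of samples per frame; for the interleaved stereo data above you could pass chunk[0::2]. It only computes the measure; the cut-off for calling a frame "voiced" is an assumption you would have to tune on real recordings:

```python
import numpy as np


def spectral_flatness(samples, eps=1e-12):
    """Geometric mean / arithmetic mean of one frame's power spectrum.

    Near 1: noise-like (flat spectrum). Near 0: energy concentrated in a few
    frequencies. `samples` is a non-empty, single-channel frame.
    """
    x = np.asarray(samples, dtype=np.float64)
    x = x * np.hanning(len(x))                 # window to reduce spectral leakage
    power = np.abs(np.fft.rfft(x)) ** 2 + eps  # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    return float(geometric_mean / np.mean(power))


if __name__ == "__main__":
    rate = 8000
    t = np.arange(1024) / rate
    tone = np.sin(2 * np.pi * 440 * t)         # a pure 440 Hz tone
    noise = np.random.default_rng(0).standard_normal(1024)
    print(spectral_flatness(tone) < spectral_flatness(noise))  # prints: True
```

Flatness alone does not check that the energy lies in the speech band; combining it with the existing volume threshold (or with the energy in a band such as roughly 300 to 3400 Hz) is one possible refinement.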