Realtime signal processing in Python - how to capture audio continuously?


I plan to implement a "DSP-like" signal processor in Python. It should capture small fragments of audio via ALSA, process them, and then play them back via ALSA.

To get things started, I wrote the following (very simple) piece of code.

The problem is that the audio "stutters" and is not gapless. I experimented with the PCM mode, setting it to PCM_ASYNC or PCM_NONBLOCK, but the problem remains. I think the problem is that samples "between" two subsequent calls to "inp.read()" are lost.

Is there a way to capture audio "continuously" in Python (preferably without the need for too "specific"/"non-standard" libraries)? I would like the signal to always be captured "in the background" into some buffer, from which I can read some "momentary state", while audio keeps being captured into the buffer even while I perform my read operations. How can I achieve this?

Even if I use a dedicated process/thread to capture the audio, that process/thread will always at least have to (1) read audio from the source and (2) then put it into some buffer (from which the "signal processing" process/thread then reads). These two operations will therefore still be sequential in time, and thus samples will be lost. How do I avoid that?

Thanks a lot for your advice!

EDIT 2: Now I got it running:

import alsaaudio
from multiprocessing import Process, Queue
import numpy as np
import struct

"""
A class implementing buffered audio I/O.
"""
class Audio:

    """
    Initialize the audio buffer.
    """
    def __init__(self):
        #self.__rate = 96000
        self.__rate = 8000
        self.__stride = 4
        self.__pre_post = 4
        self.__read_queue = Queue()
        self.__write_queue = Queue()

    """
    Reads audio from an ALSA audio device into the read queue.
    Supposed to run in its own process.
    """
    def __read(self):
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
        inp.setchannels(1)
        inp.setrate(self.__rate)
        inp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        inp.setperiodsize(self.__rate / 50)

        while True:
            _, data = inp.read()
            self.__read_queue.put(data)

    """
    Writes audio to an ALSA audio device from the write queue.
    Supposed to run in its own process.
    """
    def __write(self):
        outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
        outp.setchannels(1)
        outp.setrate(self.__rate)
        outp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        outp.setperiodsize(self.__rate / 50)

        while True:
            data = self.__write_queue.get()
            outp.write(data)

    """
    Pre-post data into the output buffer to avoid buffer underrun.
    """
    def __pre_post_data(self):
        zeros = np.zeros(self.__rate / 50, dtype = np.uint32)

        for i in range(0, self.__pre_post):
            self.__write_queue.put(zeros)

    """
    Runs the read and write processes.
    """
    def run(self):
        self.__pre_post_data()
        read_process = Process(target = self.__read)
        write_process = Process(target = self.__write)
        read_process.start()
        write_process.start()

    """
    Reads audio samples from the queue captured from the reading thread.
    """
    def read(self):
        return self.__read_queue.get()

    """
    Writes audio samples to the queue to be played by the writing thread.
    """
    def write(self, data):
        self.__write_queue.put(data)

    """
    Pseudonymize the audio samples from a binary string into an array of integers.
    """
    def pseudonymize(self, s):
        return struct.unpack(">" + ("I" * (len(s) / self.__stride)), s)

    """
    Depseudonymize the audio samples from an array of integers into a binary string.
    """
    def depseudonymize(self, a):
        s = ""

        for elem in a:
            s += struct.pack(">I", elem)

        return s

    """
    Normalize the audio samples from an array of integers into an array of floats with unity level.
    """
    def normalize(self, data, max_val):
        data = np.array(data)
        bias = int(0.5 * max_val)
        fac = 1.0 / (0.5 * max_val)
        data = fac * (data - bias)
        return data

    """
    Denormalize the data from an array of floats with unity level into an array of integers.
    """
    def denormalize(self, data, max_val):
        bias = int(0.5 * max_val)
        fac = 0.5 * max_val
        data = np.array(data)
        data = (fac * data).astype(np.int64) + bias
        return data

debug = True
audio = Audio()
audio.run()

while True:
    data = audio.read()
    pdata = audio.pseudonymize(data)

    if debug:
        print "[PRE-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))

    ndata = audio.normalize(pdata, 0xffffffff)

    if debug:
        print "[PRE-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))
        print "[PRE-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))

    #ndata += 0.01 # When I comment in this line, it wreaks complete havoc!

    if debug:
        print "[POST-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))
        print "[POST-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))

    pdata = audio.denormalize(ndata, 0xffffffff)

    if debug:
        print "[POST-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))
        print ""

    data = audio.depseudonymize(pdata)
    audio.write(data)

However, when I apply even the slightest modification to the audio data (for example, uncommenting that line), I get a lot of noise and extreme distortion at the output. It seems that I am not handling the PCM data correctly. The strange thing is that the output of the "level meter" etc. all appears to make sense. However, the output is completely distorted (yet continuous) as soon as I offset it even slightly.

EDIT 3: I just found out that my algorithms (not included here) work fine when I apply them to wave files. So the problem really appears to boil down to the ALSA API.

EDIT 4: I finally found the problems. They were the following:

First, ALSA quietly "falls back" to PCM_FORMAT_U8_LE when PCM_FORMAT_U32_LE is requested, so I interpreted the data incorrectly by assuming that each sample was 4 bytes wide. It works when I request PCM_FORMAT_S32_LE instead.

Second, the ALSA output appears to expect the period size in bytes, even though the specification explicitly states that it is expected in frames. So with a 32-bit sample depth you have to set the period size four times as high for the output (with the values above: 4 × 160 = 640 instead of 160 per period).

Third, processes are slow compared to threads, even in Python (which has a "global interpreter lock"). Since the I/O loops do essentially nothing computationally intensive, switching from processes to threads reduces the latency considerably.

With those changes in place the audio is now gapless and free of distortion, but the latency is far too high.
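For illustration, the following condensed sketch shows what those three fixes amount to when applied to the pyalsaaudio calls from the code above; read_loop and write_loop are simplified stand-ins for the __read and __write methods of the Audio class, not the final program:

    import threading
    import alsaaudio
    try:
        from queue import Queue                     # Python 3
    except ImportError:
        from Queue import Queue                     # Python 2

    rate = 8000
    period_frames = rate // 50                      # 160 frames per period

    read_queue = Queue()
    write_queue = Queue()

    def read_loop():
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
        inp.setchannels(1)
        inp.setrate(rate)
        inp.setformat(alsaaudio.PCM_FORMAT_S32_LE)  # fix 1: request S32_LE instead of U32
        inp.setperiodsize(period_frames)            # capture period in frames
        while True:
            _, data = inp.read()
            read_queue.put(data)

    def write_loop():
        outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
        outp.setchannels(1)
        outp.setrate(rate)
        outp.setformat(alsaaudio.PCM_FORMAT_S32_LE)
        outp.setperiodsize(period_frames * 4)       # fix 2: playback period given in bytes (4 bytes per sample)
        while True:
            outp.write(write_queue.get())

    # fix 3: plain threads instead of processes keep the hand-off latency low
    threading.Thread(target=read_loop).start()
    threading.Thread(target=write_loop).start()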

When you

  • read a block of data,
  • write a block of data, and
  • then wait for the second block of data to be read,

then the buffer of the output device runs empty if reading the second block takes at least as long as playing back the first one.


You should fill up the output device's buffer with silence before starting the actual processing. Then small delays in either the input or the output processing will not matter.
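As a rough illustration of that advice (the __pre_post_data method in the code above does essentially the same thing via the write queue), a few periods of zeros can be pushed to the playback device before the processing loop starts; the rate, format and period length below mirror the question's code, and the choice of four silent periods is just an example:

    import alsaaudio
    import numpy as np

    rate = 8000
    period_frames = rate // 50                      # 160 frames per period, as in the question

    outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
    outp.setchannels(1)
    outp.setrate(rate)
    outp.setformat(alsaaudio.PCM_FORMAT_S32_LE)
    outp.setperiodsize(period_frames)

    # Pre-post some silence so that a small hiccup on the capture side
    # cannot drain the playback buffer (4 periods is an arbitrary example).
    silence = np.zeros(period_frames, dtype=np.int32).tobytes()
    for _ in range(4):
        outp.write(silence)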

You can do all of this manually, as @CL suggested in his/her answer, but I would recommend just using GNU Radio instead:

It is a framework that takes care of all the "getting small chunks of samples in and out of your algorithm"; it scales very well, and you can write your signal processing in either Python or C++. In fact, it comes with an Audio Source and an Audio Sink that talk directly to ALSA and simply give/take continuous samples. I would recommend working through GNU Radio's Guided Tutorials; they explain exactly what is needed to do signal processing for an audio application.

A really minimal flow graph would be an Audio Source feeding a high-pass filter feeding an Audio Sink. You can substitute the high-pass filter with your own signal processing block, or use any combination of the existing blocks.
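As a sketch of what such a flow graph can look like in GNU Radio's Python API (the sample rate, cutoff frequency, transition width and empty ALSA device strings below are illustrative choices, not values from this post):

    from gnuradio import gr, audio, filter
    from gnuradio.filter import firdes

    class AudioHighPass(gr.top_block):
        def __init__(self, samp_rate=48000, cutoff=300.0):
            gr.top_block.__init__(self, "audio high-pass")
            src = audio.source(samp_rate, "")                    # ALSA capture, default device
            taps = firdes.high_pass(1.0, samp_rate, cutoff, 100.0)
            hpf = filter.fir_filter_fff(1, taps)                 # replace with your own DSP block
            snk = audio.sink(samp_rate, "")                      # ALSA playback, default device
            self.connect(src, hpf, snk)

    if __name__ == "__main__":
        AudioHighPass().run()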


There are helpful things like file and wav-file sinks and sources, filters, resamplers, amplifiers (ok, multipliers), ...


Using a thread that reads and posts to a queue should work. The PCM has a buffer whose size is controlled by setperiodsize (it appears to default to 32 frames), which gives you time to post the returned data.

I think the problem is that "read()" reads only while the audio device is running. When it returns, the read operation is finished (otherwise it could not return any meaningful data). So even if I ran a second thread that executes "read()" and then appends the returned data to a buffer, it would not "read()" while appending, so there would be a gap in the capture.

Wow. Then that interface is seriously broken. For exactly the reason you describe, an interface with conventional blocking/non-blocking modes needs an intermediate buffer. A real-time interface would need buffers to be posted in advance, before the data is produced. But alsaaudio does not seem to work that way. I can't …