Realtime signal processing in Python - how to capture audio continuously?


I plan to implement a "DSP-like" signal processor in Python. It should capture small fragments of audio via ALSA, process them, and then play them back via ALSA.

To get things started, I wrote the following (very simple) piece of code.

The problem is that the audio "stutters" and is not gapless. I experimented with the PCM mode, setting it to PCM_ASYNC or PCM_NONBLOCK, but the problem remains. I think the problem is that samples "between" two subsequent calls to "inp.read()" are lost.

Is there a way to capture audio "continuously" in Python (preferably without the need for too "specific"/"non-standard" libraries)? I would like the signal to always be captured "in the background" into some buffer, from which I can read some "momentary state", while audio keeps being captured into the buffer even while I perform my read operations. How can I achieve this?

Even if I use a dedicated process/thread to capture the audio, that process/thread will always at least have to (1) read audio from the source and (2) then put it into some buffer (from which the "signal processing" process/thread then reads). These two operations will therefore still be sequential in time, and thus samples will be lost. How do I avoid that?

Thanks a lot for your advice!

EDIT 2: Now I got it running:

import alsaaudio
from multiprocessing import Process, Queue
import numpy as np
import struct

"""
A class implementing buffered audio I/O.
"""
class Audio:

    """
    Initialize the audio buffer.
    """
    def __init__(self):
        #self.__rate = 96000
        self.__rate = 8000
        self.__stride = 4
        self.__pre_post = 4
        self.__read_queue = Queue()
        self.__write_queue = Queue()

    """
    Reads audio from an ALSA audio device into the read queue.
    Supposed to run in its own process.
    """
    def __read(self):
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
        inp.setchannels(1)
        inp.setrate(self.__rate)
        inp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        inp.setperiodsize(self.__rate / 50)

        while True:
            _, data = inp.read()
            self.__read_queue.put(data)

    """
    Writes audio to an ALSA audio device from the write queue.
    Supposed to run in its own process.
    """
    def __write(self):
        outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
        outp.setchannels(1)
        outp.setrate(self.__rate)
        outp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        outp.setperiodsize(self.__rate / 50)

        while True:
            data = self.__write_queue.get()
            outp.write(data)

    """
    Pre-post data into the output buffer to avoid buffer underrun.
    """
    def __pre_post_data(self):
        zeros = np.zeros(self.__rate / 50, dtype = np.uint32)

        for i in range(0, self.__pre_post):
            self.__write_queue.put(zeros)

    """
    Runs the read and write processes.
    """
    def run(self):
        self.__pre_post_data()
        read_process = Process(target = self.__read)
        write_process = Process(target = self.__write)
        read_process.start()
        write_process.start()

    """
    Reads audio samples from the queue captured from the reading thread.
    """
    def read(self):
        return self.__read_queue.get()

    """
    Writes audio samples to the queue to be played by the writing thread.
    """
    def write(self, data):
        self.__write_queue.put(data)

    """
    Pseudonymize the audio samples from a binary string into an array of integers.
    """
    def pseudonymize(self, s):
        return struct.unpack(">" + ("I" * (len(s) / self.__stride)), s)

    """
    Depseudonymize the audio samples from an array of integers into a binary string.
    """
    def depseudonymize(self, a):
        s = ""

        for elem in a:
            s += struct.pack(">I", elem)

        return s

    """
    Normalize the audio samples from an array of integers into an array of floats with unity level.
    """
    def normalize(self, data, max_val):
        data = np.array(data)
        bias = int(0.5 * max_val)
        fac = 1.0 / (0.5 * max_val)
        data = fac * (data - bias)
        return data

    """
    Denormalize the data from an array of floats with unity level into an array of integers.
    """
    def denormalize(self, data, max_val):
        bias = int(0.5 * max_val)
        fac = 0.5 * max_val
        data = np.array(data)
        data = (fac * data).astype(np.int64) + bias
        return data

debug = True
audio = Audio()
audio.run()

while True:
    data = audio.read()
    pdata = audio.pseudonymize(data)

    if debug:
        print "[PRE-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))

    ndata = audio.normalize(pdata, 0xffffffff)

    if debug:
        print "[PRE-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))
        print "[PRE-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))

    #ndata += 0.01 # When I comment in this line, it wreaks complete havoc!

    if debug:
        print "[POST-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))
        print "[POST-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))

    pdata = audio.denormalize(ndata, 0xffffffff)

    if debug:
        print "[POST-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))
        print ""

    data = audio.depseudonymize(pdata)
    audio.write(data)

However, when I apply even the slightest modification to the audio data (for example, uncommenting that line), I get a lot of noise and extreme distortion at the output. It seems that I am not handling the PCM data correctly. The strange thing is that the output of the "level meter" etc. all appears to make sense. However, the output is completely distorted (yet continuous) as soon as I offset it even slightly.

EDIT 3: I just found out that my algorithms (not included here) work fine when I apply them to wave files. So the problem really appears to boil down to the ALSA API.

EDIT 4: I finally found the problems. They were the following:

First, ALSA quietly "falls back" to PCM_FORMAT_U8_LE when PCM_FORMAT_U32_LE is requested, so I interpreted the data incorrectly by assuming that each sample was 4 bytes wide. It works when I request PCM_FORMAT_S32_LE instead.

Second, the ALSA output appears to expect the period size in bytes, even though the specification explicitly states that it is expected in frames. So with a 32-bit sample depth you have to set the period size four times as high for the output (with the values above: 4 × 160 = 640 instead of 160 per period).

Third, processes are slow compared to threads, even in Python (which has a "global interpreter lock"). Since the I/O loops do essentially nothing computationally intensive, switching from processes to threads reduces the latency considerably.

With those changes in place the audio is now gapless and free of distortion, but the latency is far too high.
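For illustration, the following condensed sketch shows what those three fixes amount to when applied to the pyalsaaudio calls from the code above; read_loop and write_loop are simplified stand-ins for the __read and __write methods of the Audio class, not the final program:

    import threading
    import alsaaudio
    try:
        from queue import Queue                     # Python 3
    except ImportError:
        from Queue import Queue                     # Python 2

    rate = 8000
    period_frames = rate // 50                      # 160 frames per period

    read_queue = Queue()
    write_queue = Queue()

    def read_loop():
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
        inp.setchannels(1)
        inp.setrate(rate)
        inp.setformat(alsaaudio.PCM_FORMAT_S32_LE)  # fix 1: request S32_LE instead of U32
        inp.setperiodsize(period_frames)            # capture period in frames
        while True:
            _, data = inp.read()
            read_queue.put(data)

    def write_loop():
        outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
        outp.setchannels(1)
        outp.setrate(rate)
        outp.setformat(alsaaudio.PCM_FORMAT_S32_LE)
        outp.setperiodsize(period_frames * 4)       # fix 2: playback period given in bytes (4 bytes per sample)
        while True:
            outp.write(write_queue.get())

    # fix 3: plain threads instead of processes keep the hand-off latency low
    threading.Thread(target=read_loop).start()
    threading.Thread(target=write_loop).start()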

When you

  • read a block of data,
  • write a block of data, and
  • then wait for the second block of data to be read,

then the buffer of the output device runs empty if reading the second block takes at least as long as playing back the first one.


You should fill up the output device's buffer with silence before starting the actual processing. Then small delays in either the input or the output processing will not matter.
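As a rough illustration of that advice (the __pre_post_data method in the code above does essentially the same thing via the write queue), a few periods of zeros can be pushed to the playback device before the processing loop starts; the rate, format and period length below mirror the question's code, and the choice of four silent periods is just an example:

    import alsaaudio
    import numpy as np

    rate = 8000
    period_frames = rate // 50                      # 160 frames per period, as in the question

    outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
    outp.setchannels(1)
    outp.setrate(rate)
    outp.setformat(alsaaudio.PCM_FORMAT_S32_LE)
    outp.setperiodsize(period_frames)

    # Pre-post some silence so that a small hiccup on the capture side
    # cannot drain the playback buffer (4 periods is an arbitrary example).
    silence = np.zeros(period_frames, dtype=np.int32).tobytes()
    for _ in range(4):
        outp.write(silence)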

You can do all of this manually, as @CL suggested in his/her answer, but I would recommend just using GNU Radio instead:

It is a framework that takes care of all the "getting small chunks of samples in and out of your algorithm"; it scales very well, and you can write your signal processing in either Python or C++. In fact, it comes with an Audio Source and an Audio Sink that talk directly to ALSA and simply give/take continuous samples. I would recommend working through GNU Radio's Guided Tutorials; they explain exactly what is needed to do signal processing for an audio application.

A really minimal flow graph would be an Audio Source feeding a high-pass filter feeding an Audio Sink. You can substitute the high-pass filter with your own signal processing block, or use any combination of the existing blocks.
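As a sketch of what such a flow graph can look like in GNU Radio's Python API (the sample rate, cutoff frequency, transition width and empty ALSA device strings below are illustrative choices, not values from this post):

    from gnuradio import gr, audio, filter
    from gnuradio.filter import firdes

    class AudioHighPass(gr.top_block):
        def __init__(self, samp_rate=48000, cutoff=300.0):
            gr.top_block.__init__(self, "audio high-pass")
            src = audio.source(samp_rate, "")                    # ALSA capture, default device
            taps = firdes.high_pass(1.0, samp_rate, cutoff, 100.0)
            hpf = filter.fir_filter_fff(1, taps)                 # replace with your own DSP block
            snk = audio.sink(samp_rate, "")                      # ALSA playback, default device
            self.connect(src, hpf, snk)

    if __name__ == "__main__":
        AudioHighPass().run()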


There are helpful things like file and wav-file sinks and sources, filters, resamplers, amplifiers (ok, multipliers), ...


Using a thread that reads and posts to a queue should work. The PCM has a buffer whose size is controlled by setperiodsize (it appears to default to 32 frames), which gives you time to post the returned data.

I think the problem is that "read()" reads only while the audio device is running. When it returns, the read operation is finished (otherwise it could not return any meaningful data). So even if I ran a second thread that executes "read()" and then appends the returned data to a buffer, it would not "read()" while appending, so there would be a gap in the capture.

Wow. Then that interface is seriously broken. For exactly the reason you describe, an interface with conventional blocking/non-blocking modes needs an intermediate buffer. A real-time interface would need buffers to be posted in advance, before the data is produced. But alsaaudio does not seem to work that way. I can't …