Real-time signal processing in Python - how to capture audio continuously?
I am planning to implement a "DSP-like" signal processor in Python. It should capture small fragments of audio via ALSA, process them, and then play them back via ALSA.

To get things started, I wrote the following (very simple) code.

The problem is that the audio stutters and is not gapless. I tried experimenting with the PCM mode, setting it to PCM_ASYNC or PCM_NONBLOCK, but the problem remains. I think the problem is that samples get lost between two subsequent calls to "inp.read()".

Is there a way to capture audio "continuously" in Python (preferably without the need for too "specific"/"non-standard" libraries)? I would like the signal always to be captured "in the background" into some buffer, from which I can read some "momentary state", while the audio keeps being captured into the buffer even while I perform my read operations. How can I achieve this?

Even if I use a dedicated process/thread to capture the audio, that process/thread will always at least have to (1) read the audio from the source and (2) then put it into some buffer (from which the "signal processing" process/thread then reads). These two operations will therefore still be sequential in time, so samples will get lost. How do I avoid that?

Thanks a lot for your advice!

Edit 2: Now I have it running:
import alsaaudio
from multiprocessing import Process, Queue
import numpy as np
import struct

"""
A class implementing buffered audio I/O.
"""
class Audio:

    """
    Initialize the audio buffer.
    """
    def __init__(self):
        #self.__rate = 96000
        self.__rate = 8000
        self.__stride = 4
        self.__pre_post = 4
        self.__read_queue = Queue()
        self.__write_queue = Queue()

    """
    Reads audio from an ALSA audio device into the read queue.
    Supposed to run in its own process.
    """
    def __read(self):
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
        inp.setchannels(1)
        inp.setrate(self.__rate)
        inp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        inp.setperiodsize(self.__rate / 50)
        while True:
            _, data = inp.read()
            self.__read_queue.put(data)

    """
    Writes audio to an ALSA audio device from the write queue.
    Supposed to run in its own process.
    """
    def __write(self):
        outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
        outp.setchannels(1)
        outp.setrate(self.__rate)
        outp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        outp.setperiodsize(self.__rate / 50)
        while True:
            data = self.__write_queue.get()
            outp.write(data)

    """
    Pre-post data into the output buffer to avoid buffer underrun.
    """
    def __pre_post_data(self):
        zeros = np.zeros(self.__rate / 50, dtype = np.uint32)
        for i in range(0, self.__pre_post):
            self.__write_queue.put(zeros)

    """
    Runs the read and write processes.
    """
    def run(self):
        self.__pre_post_data()
        read_process = Process(target = self.__read)
        write_process = Process(target = self.__write)
        read_process.start()
        write_process.start()

    """
    Reads audio samples from the queue captured from the reading thread.
    """
    def read(self):
        return self.__read_queue.get()

    """
    Writes audio samples to the queue to be played by the writing thread.
    """
    def write(self, data):
        self.__write_queue.put(data)

    """
    Pseudonymize the audio samples from a binary string into an array of integers.
    """
    def pseudonymize(self, s):
        return struct.unpack(">" + ("I" * (len(s) / self.__stride)), s)

    """
    Depseudonymize the audio samples from an array of integers into a binary string.
    """
    def depseudonymize(self, a):
        s = ""
        for elem in a:
            s += struct.pack(">I", elem)
        return s

    """
    Normalize the audio samples from an array of integers into an array of floats with unity level.
    """
    def normalize(self, data, max_val):
        data = np.array(data)
        bias = int(0.5 * max_val)
        fac = 1.0 / (0.5 * max_val)
        data = fac * (data - bias)
        return data

    """
    Denormalize the data from an array of floats with unity level into an array of integers.
    """
    def denormalize(self, data, max_val):
        bias = int(0.5 * max_val)
        fac = 0.5 * max_val
        data = np.array(data)
        data = (fac * data).astype(np.int64) + bias
        return data

debug = True
audio = Audio()
audio.run()

while True:
    data = audio.read()
    pdata = audio.pseudonymize(data)

    if debug:
        print "[PRE-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))

    ndata = audio.normalize(pdata, 0xffffffff)

    if debug:
        print "[PRE-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))
        print "[PRE-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))

    #ndata += 0.01 # When I comment in this line, it wreaks complete havoc!

    if debug:
        print "[POST-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))
        print "[POST-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))

    pdata = audio.denormalize(ndata, 0xffffffff)

    if debug:
        print "[POST-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))
        print ""

    data = audio.depseudonymize(pdata)
    audio.write(data)
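As a sanity check (my addition, not part of the original post): the normalize/denormalize pair above round-trips cleanly on its own, which suggests the distortion described below does not come from that arithmetic. A minimal standalone sketch, reusing the same bias/scale scheme as the Audio class:

```python
import numpy as np

MAX_VAL = 0xffffffff

def normalize(data, max_val):
    # Map unsigned integers in [0, max_val] to floats in roughly [-1.0, 1.0].
    data = np.array(data)
    bias = int(0.5 * max_val)
    return (data - bias) / (0.5 * max_val)

def denormalize(data, max_val):
    # Map floats in [-1.0, 1.0] back to integers around the bias point.
    bias = int(0.5 * max_val)
    return (0.5 * max_val * np.array(data)).astype(np.int64) + bias

samples = np.array([0, 0x40000000, 0x7fffffff, 0xc0000000], dtype=np.int64)
roundtrip = denormalize(normalize(samples, MAX_VAL), MAX_VAL)

# The round trip reproduces the input to within 1 LSB of float rounding.
print(np.max(np.abs(roundtrip - samples)))
```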
However, as soon as I modify the audio data even in the slightest way (e.g. by commenting that line in), I get a lot of noise and extreme distortion at the output. It seems that I am not handling the PCM data correctly. The strange thing is that the output of the "level meter" etc. all seems to make sense. However, the output is completely distorted (but continuous) when I offset it only slightly.
Edit 3: I just found out that my algorithms (not included here) work correctly when I apply them to wave files. So the problem really seems to come down to the ALSA API.
Edit 4: I finally found the problems. They were the following:
First, ALSA quietly "falls back" to PCM_FORMAT_U8_LE when asked for PCM_FORMAT_U32_LE, so I interpreted the data incorrectly by assuming that each sample was 4 bytes wide. It works when I ask for PCM_FORMAT_S32_LE instead.
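For illustration (my addition): once the device really delivers signed 32-bit little-endian samples, the raw buffer can be interpreted with numpy instead of a struct.unpack loop. A hedged sketch, assuming mono S32_LE data and a synthetic buffer in place of a real ALSA read:

```python
import numpy as np

def buffer_to_samples(raw):
    # Interpret a raw ALSA period as signed 32-bit little-endian samples
    # ('<i4'), matching PCM_FORMAT_S32_LE; one value per frame for mono.
    return np.frombuffer(raw, dtype='<i4')

# Simulated buffer: four frames of known values.
raw = np.array([0, 1, -1, 2**31 - 1], dtype='<i4').tobytes()
samples = buffer_to_samples(raw)
print(samples.tolist())  # [0, 1, -1, 2147483647]
```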
Second, the ALSA output seems to expect the period size in bytes, even though the specification explicitly states that it is expected in frames. So you have to set the period size four times as high for the output if you use a 32-bit sample depth.
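The "four times as high" factor follows directly from the sample width; a quick arithmetic check (my numbers, using the 8000 Hz mono setup from the question):

```python
rate = 8000
frames_per_period = rate // 50   # 160 frames, i.e. one 20 ms period
channels = 1
bytes_per_sample = 4             # 32-bit sample depth

# If the playback side interprets the period size in bytes rather than
# frames, the value passed to setperiodsize must be scaled accordingly:
period_in_bytes = frames_per_period * channels * bytes_per_sample
print(frames_per_period, period_in_bytes)  # 160 640
```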
Third, even in Python (which has a "global interpreter lock"), processes are slow compared to threads. Since the I/O threads don't do anything computationally intensive anyway, you can bring the latency down a lot by switching to threads.
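The process-to-thread switch can be sketched without any audio hardware. In the sketch below (my addition), a hypothetical capture thread stands in for the ALSA reader and pushes blocks into a queue while the main thread consumes them; the block contents are synthetic:

```python
import threading
import queue

def capture(block_queue, blocks):
    # Stand-in for the ALSA read loop: each iteration would call
    # inp.read() on the real device; here we emit synthetic blocks.
    for i in range(blocks):
        block_queue.put([i] * 4)   # one "period" of fake samples
    block_queue.put(None)          # sentinel: capture finished

block_queue = queue.Queue()
reader = threading.Thread(target=capture, args=(block_queue, 3))
reader.start()

# The consumer (the "signal processing" side) drains the queue; with a
# thread instead of a process, no pickling or IPC overhead is involved.
received = []
while True:
    block = block_queue.get()
    if block is None:
        break
    received.append(block)
reader.join()

print(received)  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
```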
Before you start the actual processing, you should fill the buffer of the output device with silence. Then small latencies in the input or output processing no longer matter. You can do all of this manually, as @CL suggests in his/her post, but I would recommend just using GNU Radio instead: it is a framework that takes care of all the "getting small chunks of samples into and out of your algorithm"; it scales well, and you can write your signal processing in Python or C++. In fact, it comes with an Audio Source and an Audio Sink that talk directly to ALSA and just give/take continuous samples. I would recommend reading through the GNU Radio tutorials; they explain exactly what is needed to do signal processing for audio applications. A really minimal flow graph looks like this: Audio Source → High Pass Filter → Audio Sink. You can replace the high pass filter with your own signal processing block, or use any combination of the existing blocks.
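The minimal flow graph described above (the original answer's screenshot is not reproduced here) can also be expressed in GNU Radio's Python API. This is an untested outline, assuming a GNU Radio installation with the standard gr-audio and gr-filter components and working sound hardware; the filter parameters are arbitrary placeholders:

```python
# Untested sketch: requires GNU Radio and an ALSA-capable sound device.
from gnuradio import gr, audio, filter
from gnuradio.filter import firdes

class MinimalAudioFlow(gr.top_block):
    def __init__(self, samp_rate=48000):
        gr.top_block.__init__(self, "minimal audio flow")
        src = audio.source(samp_rate)            # talks to ALSA capture
        hpf = filter.fir_filter_fff(             # placeholder processing block
            1, firdes.high_pass(1.0, samp_rate, 300.0, 100.0))
        snk = audio.sink(samp_rate)              # talks to ALSA playback
        self.connect(src, hpf, snk)

if __name__ == '__main__':
    MinimalAudioFlow().run()   # blocks until interrupted
```

Swapping `hpf` for a custom block (e.g. one derived from `gr.sync_block`) is the usual way to drop your own DSP code into the graph.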
There are lots of useful things available, like file and wav-file sinks and sources, filters, resamplers, amplifiers (OK, multipliers), ...
The audio is now gapless and free of distortion, but the latency is far too high. Reading with a thread and pushing into a queue should work.
A PCM has a buffer, controlled by setperiodsize (which appears to default to 32 frames); this gives you time to post the returned data. I think the problem is that "read()" only reads while the audio device is running. When it returns, the read operation is finished (otherwise it could not return any meaningful data). Even if I ran a second thread that executes "read()" and then appends the returned data to a buffer, it would not "read()" while appending, so there would be a gap in the capture. Wow, then that interface is severely broken. An interface with traditional blocking/non-blocking semantics needs intermediate buffers, for the reasons you describe. A real-time interface needs to pre-post buffers before the data is generated. However, alsaaudio does not seem to work that way. I can't