
Detect and Record Audio in Python


I need to capture audio clips as WAV files that I can then pass to another bit of Python for processing. The problem is that I need to determine when audio is present, record it, stop when it goes silent, and then pass that file to the processing module.

I'm thinking it should be possible with the wave module to detect when there is pure silence and discard it, then start recording as soon as something other than silence is detected, and stop recording when the line goes silent again.

I just can't quite get my head around it. Can anyone get me started with a basic example?

I believe the wave module does not support recording, only processing existing files. You might want to look at PyAudio for actually recording. WAV is about the world's simplest file format. With paInt16 you just get a signed integer representing a level, and closer to 0 means quieter. I can't remember whether WAV files are high-byte-first or low-byte-first, but something like this should work (sorry, I'm not really a Python programmer):

from array import array

# 'data' is assumed to hold one chunk of raw 16-bit samples read
# from an audio input stream (see the PyAudio code below).
# You'll probably want to experiment on threshold;
# it depends how noisy the signal is.
threshold = 10

as_ints = array('h', data)
max_value = max(as_ints)
if max_value > threshold:
    # not silence
    pass
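On the byte-order question raised above: WAV stores samples little-endian, so only big-endian hosts need to swap bytes after reading into an array. A minimal sketch (the sample bytes here are made up for illustration):

```python
from array import array
from sys import byteorder

# WAV audio data is little-endian; byteswap on big-endian hosts so the
# 'h' (signed 16-bit) values come out right.
raw = b'\x10\x00\xf0\xff'  # two little-endian samples: 16 and -16
samples = array('h', raw)
if byteorder == 'big':
    samples.byteswap()
print(list(samples))  # [16, -16]
```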
The PyAudio code for recording is kept here for reference:

import pyaudio
import sys

chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 5

p = pyaudio.PyAudio()

stream = p.open(format=FORMAT,
                channels=CHANNELS, 
                rate=RATE, 
                input=True,
                output=True,
                frames_per_buffer=chunk)

print("* recording")
for i in range(0, int(RATE / chunk * RECORD_SECONDS)):
    data = stream.read(chunk)
    # check for silence here by comparing the level with 0 (or some threshold) for 
    # the contents of data.
    # then write data or not to a file

print("* done")

stream.stop_stream()
stream.close()
p.terminate()
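The silence check that the loop comment above describes could look something like this; THRESHOLD is an assumed tuning value, not something from the original answer:

```python
from array import array

THRESHOLD = 500  # assumed value; tune for your microphone and noise floor

def is_silent(chunk_bytes):
    """True if no 16-bit sample in the chunk exceeds THRESHOLD."""
    samples = array('h', chunk_bytes)
    return max(samples, default=0) < THRESHOLD

print(is_silent(b'\x00\x00' * 4))                  # True: all-zero chunk
print(is_silent(array('h', [0, 1200]).tobytes()))  # False: loud sample
```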

You might also want to take a look at it; it has several APIs, including one for Python, and may be able to interact with the A-D interface and gather sound samples.

As a follow-up to Nick Fortescue's answer, here's a more complete example of how to record from the microphone and process the resulting data:

from sys import byteorder
from array import array
from struct import pack

import pyaudio
import wave

THRESHOLD = 500
CHUNK_SIZE = 1024
FORMAT = pyaudio.paInt16
RATE = 44100

def is_silent(snd_data):
    "Returns 'True' if below the 'silent' threshold"
    return max(snd_data) < THRESHOLD

def normalize(snd_data):
    "Average the volume out"
    MAXIMUM = 16384
    times = float(MAXIMUM)/max(abs(i) for i in snd_data)

    r = array('h')
    for i in snd_data:
        r.append(int(i*times))
    return r

def trim(snd_data):
    "Trim the blank spots at the start and end"
    def _trim(snd_data):
        snd_started = False
        r = array('h')

        for i in snd_data:
            if not snd_started and abs(i)>THRESHOLD:
                snd_started = True
                r.append(i)

            elif snd_started:
                r.append(i)
        return r

    # Trim to the left
    snd_data = _trim(snd_data)

    # Trim to the right
    snd_data.reverse()
    snd_data = _trim(snd_data)
    snd_data.reverse()
    return snd_data

def add_silence(snd_data, seconds):
    "Add silence to the start and end of 'snd_data' of length 'seconds' (float)"
    silence = [0] * int(seconds * RATE)
    r = array('h', silence)
    r.extend(snd_data)
    r.extend(silence)
    return r

def record():
    """
    Record a word or words from the microphone and 
    return the data as an array of signed shorts.

    Normalizes the audio, trims silence from the 
    start and end, and pads with 0.5 seconds of 
    blank sound to make sure VLC et al can play 
    it without getting chopped off.
    """
    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=1, rate=RATE,
        input=True, output=True,
        frames_per_buffer=CHUNK_SIZE)

    num_silent = 0
    snd_started = False

    r = array('h')

    while 1:
        # little endian, signed short
        snd_data = array('h', stream.read(CHUNK_SIZE))
        if byteorder == 'big':
            snd_data.byteswap()
        r.extend(snd_data)

        silent = is_silent(snd_data)

        if silent and snd_started:
            num_silent += 1
        elif not silent and not snd_started:
            snd_started = True

        if snd_started and num_silent > 30:
            break

    sample_width = p.get_sample_size(FORMAT)
    stream.stop_stream()
    stream.close()
    p.terminate()

    r = normalize(r)
    r = trim(r)
    r = add_silence(r, 0.5)
    return sample_width, r

def record_to_file(path):
    "Records from the microphone and outputs the resulting data to 'path'"
    sample_width, data = record()
    data = pack('<' + ('h'*len(data)), *data)

    wf = wave.open(path, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(data)
    wf.close()

if __name__ == '__main__':
    print("please speak a word into the microphone")
    record_to_file('demo.wav')
    print("done - result written to demo.wav")
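A common refinement, not part of the answer above, is to detect silence with RMS energy rather than the peak sample, which is less sensitive to single-sample clicks. A sketch under that assumption:

```python
import math
from array import array

def rms(samples):
    """Root-mean-square level of a block of signed 16-bit samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

block = array('h', [0, 100, -100, 50])
print(rms(block))  # 75.0
```

An `rms`-based `is_silent` would compare this value against a threshold instead of `max(snd_data)`.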
There are many very short and clear examples on the PyAudio website:

Update 14 December 2019: the main example from the website linked above, from 2017:


"""PyAudio Example: Play a wave file."""

import pyaudio
import wave
import sys

CHUNK = 1024

if len(sys.argv) < 2:
    print("Plays a wave file.\n\nUsage: %s filename.wav" % sys.argv[0])
    sys.exit(-1)

wf = wave.open(sys.argv[1], 'rb')

p = pyaudio.PyAudio()

stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True)

data = wf.readframes(CHUNK)
while len(data):  # readframes() returns b'' at end of file
    stream.write(data)
    data = wf.readframes(CHUNK)

stream.stop_stream()
stream.close()
p.terminate()

Thanks to cryo for the improved version, which I based my tested code below on:

# Instead of adding silence at the start and end of the recording (values = 0),
# I keep a little of the original audio. This makes the result sound more
# natural, as the level stays above 0. See trim().
# I also fixed an issue with the previous code: the accumulated silence
# counter needs to be cleared once recording resumes.

from array import array
from struct import pack
from sys import byteorder
import copy
import pyaudio
import wave

THRESHOLD = 500  # audio levels not normalised.
CHUNK_SIZE = 1024
SILENT_CHUNKS = 3 * 44100 // 1024  # about 3 seconds of silence
FORMAT = pyaudio.paInt16
FRAME_MAX_VALUE = 2 ** 15 - 1
NORMALIZE_MINUS_ONE_dB = 10 ** (-1.0 / 20)
RATE = 44100
CHANNELS = 1
TRIM_APPEND = RATE // 4  # integer division, so it can be used as a slice index

def is_silent(data_chunk):
    """Returns 'True' if below the 'silent' threshold"""
    return max(data_chunk) < THRESHOLD

def normalize(data_all):
    """Amplify the volume out to max -1dB"""
    # MAXIMUM = 16384
    normalize_factor = (float(NORMALIZE_MINUS_ONE_dB * FRAME_MAX_VALUE)
                        / max(abs(i) for i in data_all))

    r = array('h')
    for i in data_all:
        r.append(int(i * normalize_factor))
    return r

def trim(data_all):
    _from = 0
    _to = len(data_all) - 1
    for i, b in enumerate(data_all):
        if abs(b) > THRESHOLD:
            _from = max(0, i - TRIM_APPEND)
            break

    for i, b in enumerate(reversed(data_all)):
        if abs(b) > THRESHOLD:
            _to = min(len(data_all) - 1, len(data_all) - 1 - i + TRIM_APPEND)
            break

    return copy.deepcopy(data_all[_from:(_to + 1)])

def record():
    """Record a word or words from the microphone and 
    return the data as an array of signed shorts."""

    p = pyaudio.PyAudio()
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, output=True, frames_per_buffer=CHUNK_SIZE)

    silent_chunks = 0
    audio_started = False
    data_all = array('h')

    while True:
        # little endian, signed short
        data_chunk = array('h', stream.read(CHUNK_SIZE))
        if byteorder == 'big':
            data_chunk.byteswap()
        data_all.extend(data_chunk)

        silent = is_silent(data_chunk)

        if audio_started:
            if silent:
                silent_chunks += 1
                if silent_chunks > SILENT_CHUNKS:
                    break
            else: 
                silent_chunks = 0
        elif not silent:
            audio_started = True              

    sample_width = p.get_sample_size(FORMAT)
    stream.stop_stream()
    stream.close()
    p.terminate()

    data_all = trim(data_all)  # trim before normalizing, as the threshold applies to the un-normalized wave (as does is_silent())
    data_all = normalize(data_all)
    return sample_width, data_all

def record_to_file(path):
    "Records from the microphone and outputs the resulting data to 'path'"
    sample_width, data = record()
    data = pack('<' + ('h' * len(data)), *data)

    wave_file = wave.open(path, 'wb')
    wave_file.setnchannels(CHANNELS)
    wave_file.setsampwidth(sample_width)
    wave_file.setframerate(RATE)
    wave_file.writeframes(data)
    wave_file.close()

if __name__ == '__main__':
    print("Wait in silence to begin recording; wait in silence to terminate")
    record_to_file('demo.wav')
    print("done - result written to demo.wav")
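The NORMALIZE_MINUS_ONE_dB constant above comes from the usual decibel-to-linear conversion, gain = 10^(dB/20). A quick check of the arithmetic:

```python
# Linear gain corresponding to the -1 dB normalization target.
target_db = -1.0
gain = 10 ** (target_db / 20)
print(round(gain, 4))  # 0.8913
```

Multiplying FRAME_MAX_VALUE by this factor leaves roughly 11% headroom below full scale after normalization.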
import pyaudio
import wave
from array import array

FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 15
FILE_NAME = "RECORDING.wav"

audio = pyaudio.PyAudio()  # instantiate PyAudio

# recording prerequisites
stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)

# start recording
frames = []

for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    data_chunk = array('h', data)
    vol = max(data_chunk)
    if vol >= 500:
        print("something said")
        frames.append(data)
    else:
        print("nothing")
    print("\n")

# end of recording
stream.stop_stream()
stream.close()
audio.terminate()

# write to file
wavfile = wave.open(FILE_NAME, 'wb')
wavfile.setnchannels(CHANNELS)
wavfile.setsampwidth(audio.get_sample_size(FORMAT))
wavfile.setframerate(RATE)
wavfile.writeframes(b''.join(frames))  # append recorded frames to the file
wavfile.close()
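As a quick sanity check on the loop above (pure arithmetic, no audio hardware needed): 15 seconds at 44100 Hz in 1024-frame chunks gives 645 reads, and each stereo 16-bit chunk is 4096 bytes:

```python
RATE, CHUNK, CHANNELS, RECORD_SECONDS = 44100, 1024, 2, 15
SAMPLE_WIDTH = 2  # bytes per sample for paInt16

n_chunks = int(RATE / CHUNK * RECORD_SECONDS)
bytes_per_chunk = CHUNK * CHANNELS * SAMPLE_WIDTH
print(n_chunks, bytes_per_chunk)  # 645 4096
```

Note that because silent chunks are skipped rather than recorded, the resulting file can be shorter than 15 seconds and will sound spliced together wherever silence was dropped.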