Continuous speech recognition using SFSpeechRecognizer (iOS 10 beta)

I am trying to perform continuous speech recognition on the iOS 10 beta, using AVCapture. I have set up captureOutput(...) to continuously receive CMSampleBuffers. I put those buffers directly into an SFSpeechAudioBufferRecognitionRequest, which I previously set up like this:
... do some setup
SFSpeechRecognizer.requestAuthorization { authStatus in
    if authStatus == SFSpeechRecognizerAuthorizationStatus.authorized {
        self.m_recognizer = SFSpeechRecognizer()
        self.m_recognRequest = SFSpeechAudioBufferRecognitionRequest()
        self.m_recognRequest?.shouldReportPartialResults = false
        self.m_isRecording = true
    } else {
        print("not authorized")
    }
}
.... do further setup
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    if (!m_AV_initialized) {
        print("captureOutput(...): not initialized !")
        return
    }
    if (!m_isRecording) {
        return
    }
    let formatDesc = CMSampleBufferGetFormatDescription(sampleBuffer)
    let mediaType = CMFormatDescriptionGetMediaType(formatDesc!)
    if (mediaType == kCMMediaType_Audio) {
        // process audio here
        m_recognRequest?.appendAudioSampleBuffer(sampleBuffer)
    }
    return
}
The whole thing works for a few seconds. Then captureOutput is no longer called. If I comment out the appendAudioSampleBuffer(sampleBuffer) line, captureOutput is called as long as the app runs (as expected). Apparently, putting the sample buffers into the speech recognition engine somehow blocks further execution. My guess is that the available buffers are consumed after some time and the process halts somehow because it cannot get any more buffers.

I should mention that everything recorded during the first two seconds leads to correct recognitions. I just don't know exactly how the SFSpeech API works, since Apple did not put any text into the beta docs. By the way: how do I use SFSpeechAudioBufferRecognitionRequest.endAudio()?

Does anybody know something here?

Thanks
Chris

I succeeded in using SFSpeechRecognizer continuously. The main point is to use AVCaptureSession to capture the audio and transfer it to the SpeechRecognizer. Sorry, I am poor at Swift, so here is only the Objective-C version. Here is my sample code (some UI code is omitted; some important parts are marked):
@interface ViewController ()
@property (nonatomic, strong) AVCaptureSession *capture;
@property (nonatomic, strong) SFSpeechAudioBufferRecognitionRequest *speechRequest;
@end

@implementation ViewController

- (void)startRecognizer
{
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        if (status == SFSpeechRecognizerAuthorizationStatusAuthorized) {
            NSLocale *local = [[NSLocale alloc] initWithLocaleIdentifier:@"fr_FR"];
            SFSpeechRecognizer *sf = [[SFSpeechRecognizer alloc] initWithLocale:local];
            self.speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
            [sf recognitionTaskWithRequest:self.speechRequest delegate:self];
            // should call startCapture method on the main queue, or it may crash
            dispatch_async(dispatch_get_main_queue(), ^{
                [self startCapture];
            });
        }
    }];
}

- (void)endRecognizer
{
    // end capture and end the voice recording,
    // or Apple will terminate this task after 30000 ms.
    [self endCapture];
    [self.speechRequest endAudio];
}

- (void)startCapture
{
    NSError *error;
    self.capture = [[AVCaptureSession alloc] init];
    AVCaptureDevice *audioDev = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
    if (audioDev == nil) {
        NSLog(@"Couldn't create audio capture device");
        return;
    }
    // create microphone device input
    AVCaptureDeviceInput *audioIn = [AVCaptureDeviceInput deviceInputWithDevice:audioDev error:&error];
    if (error != nil) {
        NSLog(@"Couldn't create audio input");
        return;
    }
    // add the microphone device to the capture object
    if ([self.capture canAddInput:audioIn] == NO) {
        NSLog(@"Couldn't add audio input");
        return;
    }
    [self.capture addInput:audioIn];
    // export the audio data
    AVCaptureAudioDataOutput *audioOutput = [[AVCaptureAudioDataOutput alloc] init];
    [audioOutput setSampleBufferDelegate:self queue:dispatch_get_main_queue()];
    if ([self.capture canAddOutput:audioOutput] == NO) {
        NSLog(@"Couldn't add audio output");
        return;
    }
    [self.capture addOutput:audioOutput];
    [audioOutput connectionWithMediaType:AVMediaTypeAudio];
    [self.capture startRunning];
}

- (void)endCapture
{
    if (self.capture != nil && [self.capture isRunning]) {
        [self.capture stopRunning];
    }
}

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    [self.speechRequest appendAudioSampleBuffer:sampleBuffer];
}

// some recognition delegate methods

@end
I converted the SpeakToMe sample Swift code from the speech recognition WWDC developer talk to Objective-C, and it works for me. For Swift, see the talk's sample code; for Objective-C, see below.
- (void)viewDidAppear:(BOOL)animated {
    _recognizer = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en-US"]];
    [_recognizer setDelegate:self];
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus authStatus) {
        switch (authStatus) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                // User gave access to speech recognition
                NSLog(@"Authorized");
                break;
            case SFSpeechRecognizerAuthorizationStatusDenied:
                // User denied access to speech recognition
                NSLog(@"SFSpeechRecognizerAuthorizationStatusDenied");
                break;
            case SFSpeechRecognizerAuthorizationStatusRestricted:
                // Speech recognition restricted on this device
                NSLog(@"SFSpeechRecognizerAuthorizationStatusRestricted");
                break;
            case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                // Speech recognition not yet authorized
                break;
            default:
                NSLog(@"Default");
                break;
        }
    }];
    audioEngine = [[AVAudioEngine alloc] init];
    _speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    [_speechSynthesizer setDelegate:self];
}

- (void)startRecording
{
    [self clearLogs:nil];
    NSError *outError;
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&outError];
    [audioSession setMode:AVAudioSessionModeMeasurement error:&outError];
    [audioSession setActive:true withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];
    request2 = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    inputNode = [audioEngine inputNode];
    if (request2 == nil) {
        NSLog(@"Unable to create an SFSpeechAudioBufferRecognitionRequest object");
    }
    if (inputNode == nil) {
        NSLog(@"Unable to create an inputNode object");
    }
    request2.shouldReportPartialResults = true;
    _currentTask = [_recognizer recognitionTaskWithRequest:request2 delegate:self];
    [inputNode installTapOnBus:0 bufferSize:4096 format:[inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
        NSLog(@"Block tap!");
        [request2 appendAudioPCMBuffer:buffer];
    }];
    [audioEngine prepare];
    [audioEngine startAndReturnError:&outError];
    NSLog(@"Error %@", outError);
}

- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)result {
    NSLog(@"speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition");
    NSString *translatedString = [[[result bestTranscription] formattedString] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    [self log:translatedString];
    if ([result isFinal]) {
        [audioEngine stop];
        [inputNode removeTapOnBus:0];
        _currentTask = nil;
        request2 = nil;
    }
}
Here is a Swift (3.0) implementation of @cube's answer:
import UIKit
import Speech
import AVFoundation

class ViewController: UIViewController {

    @IBOutlet weak var console: UITextView!

    var capture: AVCaptureSession?
    var speechRequest: SFSpeechAudioBufferRecognitionRequest?

    override func viewDidLoad() {
        super.viewDidLoad()
    }

    override func viewDidAppear(_ animated: Bool) {
        super.viewDidAppear(animated)
        startRecognizer()
    }

    func startRecognizer() {
        SFSpeechRecognizer.requestAuthorization { (status) in
            switch status {
            case .authorized:
                let locale = NSLocale(localeIdentifier: "fr_FR")
                let sf = SFSpeechRecognizer(locale: locale as Locale)
                self.speechRequest = SFSpeechAudioBufferRecognitionRequest()
                sf?.recognitionTask(with: self.speechRequest!, delegate: self)
                DispatchQueue.main.async {
                    self.startCapture()
                }
            case .denied:
                fallthrough
            case .notDetermined:
                fallthrough
            case .restricted:
                print("User Authorization Issue.")
            }
        }
    }

    func endRecognizer() {
        endCapture()
        speechRequest?.endAudio()
    }

    func startCapture() {
        capture = AVCaptureSession()
        guard let audioDev = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeAudio) else {
            print("Could not get capture device.")
            return
        }
        guard let audioIn = try? AVCaptureDeviceInput(device: audioDev) else {
            print("Could not create input device.")
            return
        }
        guard true == capture?.canAddInput(audioIn) else {
            print("Could not add input device")
            return
        }
        capture?.addInput(audioIn)
        let audioOut = AVCaptureAudioDataOutput()
        audioOut.setSampleBufferDelegate(self, queue: DispatchQueue.main)
        guard true == capture?.canAddOutput(audioOut) else {
            print("Could not add audio output")
            return
        }
        capture?.addOutput(audioOut)
        audioOut.connection(withMediaType: AVMediaTypeAudio)
        capture?.startRunning()
    }

    func endCapture() {
        if true == capture?.isRunning {
            capture?.stopRunning()
        }
    }
}

extension ViewController: AVCaptureAudioDataOutputSampleBufferDelegate {
    func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
        speechRequest?.appendAudioSampleBuffer(sampleBuffer)
    }
}

extension ViewController: SFSpeechRecognitionTaskDelegate {
    func speechRecognitionTask(_ task: SFSpeechRecognitionTask, didFinishRecognition recognitionResult: SFSpeechRecognitionResult) {
        console.text = console.text + "\n" + recognitionResult.bestTranscription.formattedString
    }
}
Don't forget to add a value for NSSpeechRecognitionUsageDescription in the info.plist file, or it will crash.

It turns out that Apple's new native speech recognition does not automatically detect silence at the end of speech (a bug?), which is useful in your case, because speech recognition stays active for close to one minute (the maximum allowed by Apple's service).

So basically, if you need continuous ASR, you must restart speech recognition whenever the delegate fires:

func speechRecognitionTask(task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) // whether successfully == true or not
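The restart policy this answer describes — start a fresh recognition task every time the previous one finishes, whether it succeeded or not — can be modeled without the Speech framework. A minimal sketch, with a hypothetical ContinuousASRController (not part of the answer's code); in a real app, start() would build a new SFSpeechAudioBufferRecognitionRequest and recognition task:

```swift
// Pure-logic sketch of the "restart on didFinishSuccessfully" policy.
final class ContinuousASRController {
    private(set) var tasksStarted = 0
    var userStopped = false

    func start() {
        // In a real app: create a fresh request and recognition task here.
        tasksStarted += 1
    }

    // Mirrors speechRecognitionTask(_:didFinishSuccessfully:):
    // restart regardless of whether the task finished successfully,
    // unless the user explicitly stopped listening.
    func taskDidFinish(successfully: Bool) {
        guard !userStopped else { return }
        start()
    }
}
```

The point is that the delegate callback alone drives the loop, so recognition keeps running across Apple's roughly one-minute per-task limit.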
This is the recording and speech-recognition Swift code I use, and it works perfectly. Ignore the part where I calculate the average power of the microphone volume if you don't need it; I use it to animate a waveform. Don't forget to set the SFSpeechRecognitionTaskDelegate and its delegate methods. If you need extra code, let me know.
func startNativeRecording() throws {
    LEVEL_LOWPASS_TRIG = 0.01
    //Setup Audio Session
    node = audioEngine.inputNode!
    let recordingFormat = node!.outputFormatForBus(0)
    node!.installTapOnBus(0, bufferSize: 1024, format: recordingFormat) { (buffer, _) in
        self.nativeASRRequest.appendAudioPCMBuffer(buffer)
        //Code to animate a waveform with the microphone volume, ignore if you don't need it:
        var inNumberFrames: UInt32 = buffer.frameLength
        var samples: Float32 = buffer.floatChannelData[0][0] //https://github.com/apple/swift-evolution/blob/master/proposals/0107-unsaferawpointer.md
        var avgValue: Float32 = 0
        vDSP_maxmgv(buffer.floatChannelData[0], 1, &avgValue, vDSP_Length(inNumberFrames)) //Accelerate framework
        //vDSP_maxmgv returns peak values
        //vDSP_meamgv returns the mean magnitude of a vector
        let avg3: Float32 = ((avgValue == 0) ? (0 - 100) : 20.0)
        var averagePower = (self.LEVEL_LOWPASS_TRIG * avg3 * log10f(avgValue)) + ((1 - self.LEVEL_LOWPASS_TRIG) * self.averagePowerForChannel0)
        print("AVG. POWER: " + averagePower.description)
        dispatch_async(dispatch_get_main_queue(), { () -> Void in
            //print("VU: " + vu.description)
            var fAvgPwr = CGFloat(averagePower)
            print("AvgPwr: " + fAvgPwr.description)
            var waveformFriendlyValue = 0.5 + fAvgPwr //-0.5 is the AvgPwr value when the user is silent
            if (waveformFriendlyValue < 0) { waveformFriendlyValue = 0 } //round values < 0 to 0
            self.waveview.hidden = false
            self.waveview.updateWithLevel(waveformFriendlyValue)
        })
    }
    audioEngine.prepare()
    try audioEngine.start()
    isNativeASRBusy = true
    nativeASRTask = nativeSpeechRecognizer?.recognitionTaskWithRequest(nativeASRRequest, delegate: self)
    nativeSpeechRecognizer?.delegate = self
    //I use this timer to track no-speech timeouts, ignore if not needed:
    self.endOfSpeechTimeoutTimer = NSTimer.scheduledTimerWithTimeInterval(utteranceTimeoutSeconds, target: self, selector: #selector(ViewController.stopNativeRecording), userInfo: nil, repeats: false)
}
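The metering math above can be sketched without Accelerate: take the peak magnitude of each buffer (what vDSP_maxmgv computes), convert it to dB with -100 dB as the silence floor (what the avg3 term is aiming at), then smooth with an exponential low-pass where the coefficient plays the role of LEVEL_LOWPASS_TRIG. LevelMeter is a hypothetical helper, not part of the answer's code:

```swift
import Foundation

// Hypothetical helper mirroring the answer's metering math in plain Swift.
struct LevelMeter {
    let lowpass: Float = 0.01          // role of LEVEL_LOWPASS_TRIG
    private(set) var averagePower: Float = 0

    mutating func update(samples: [Float]) -> Float {
        let peak = samples.map { abs($0) }.max() ?? 0          // like vDSP_maxmgv
        let db: Float = (peak == 0) ? -100 : 20 * log10f(peak) // dB, silence floor -100
        averagePower = lowpass * db + (1 - lowpass) * averagePower
        return averagePower
    }
}
```

Feeding it the float channel data of each tap buffer yields the same slowly-moving level the answer forwards to its waveform view.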
@objc func startRecording() {
    self.fullsTring = ""
    audioEngine.reset()
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(.record)
        try audioSession.setMode(.measurement)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        try audioSession.setPreferredSampleRate(44100.0)
        if audioSession.isInputGainSettable {
            let error: NSErrorPointer = nil
            let success = try? audioSession.setInputGain(1.0)
            guard success != nil else {
                print("audio error")
                return
            }
            if (success != nil) {
                print("\(String(describing: error))")
            }
        } else {
            print("Cannot set input gain")
        }
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    let inputNode = audioEngine.inputNode
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }
    recognitionRequest.shouldReportPartialResults = true
    self.timer4 = Timer.scheduledTimer(timeInterval: TimeInterval(40), target: self, selector: #selector(againStartRec), userInfo: nil, repeats: false)
    recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        if result != nil {
            self.timer.invalidate()
            self.timer = Timer.scheduledTimer(timeInterval: TimeInterval(2.0), target: self, selector: #selector(self.didFinishTalk), userInfo: nil, repeats: false)
            let bestString = result?.bestTranscription.formattedString
            self.fullsTring = bestString!
            self.inputContainerView.inputTextField.text = result?.bestTranscription.formattedString
            isFinal = result!.isFinal
        }
        if isFinal {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            isFinal = false
        }
        if error != nil {
            URLCache.shared.removeAllCachedResponses()
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            guard let task = self.recognitionTask else {
                return
            }
            task.cancel()
            task.finish()
        }
    })
    audioEngine.reset()
    inputNode.removeTap(onBus: 0)
    let recordingFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 1)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
    self.hasrecorded = true
}

@objc func againStartRec() {
    self.inputContainerView.uploadImageView.setBackgroundImage(#imageLiteral(resourceName: "microphone"), for: .normal)
    self.inputContainerView.uploadImageView.alpha = 1.0
    self.timer4.invalidate()
    self.timer.invalidate()
    if ((self.audioEngine.isRunning)) {
        self.audioEngine.stop()
        self.recognitionRequest?.endAudio()
        self.recognitionTask?.finish()
    }
    self.timer2 = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(startRecording), userInfo: nil, repeats: false)
}

@objc func didFinishTalk() {
    if self.fullsTring != "" {
        self.timer4.invalidate()
        self.timer.invalidate()
        self.timer2.invalidate()
        if ((self.audioEngine.isRunning)) {
            self.audioEngine.stop()
            guard let task = self.recognitionTask else {
                return
            }
            task.cancel()
            task.finish()
        }
    }
}