Continuous speech recognition using SFSpeechRecognizer (iOS 10 beta)


I'm trying to perform continuous speech recognition on the iOS 10 beta, using AVCapture. I have set up captureOutput(...) to continuously receive CMSampleBuffers. I put those buffers directly into an SFSpeechAudioBufferRecognitionRequest, which I set up beforehand like this:

... do some setup
  SFSpeechRecognizer.requestAuthorization { authStatus in
    if authStatus == SFSpeechRecognizerAuthorizationStatus.authorized {
      self.m_recognizer = SFSpeechRecognizer()
      self.m_recognRequest = SFSpeechAudioBufferRecognitionRequest()
      self.m_recognRequest?.shouldReportPartialResults = false
      self.m_isRecording = true
    } else {
      print("not authorized")
    }
  }
.... do further setup


func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
  if !m_AV_initialized {
    print("captureOutput(...): not initialized !")
    return
  }
  if !m_isRecording {
    return
  }

  let formatDesc = CMSampleBufferGetFormatDescription(sampleBuffer)
  let mediaType = CMFormatDescriptionGetMediaType(formatDesc!)
  if mediaType == kCMMediaType_Audio {
    // process audio here
    m_recognRequest?.appendAudioSampleBuffer(sampleBuffer)
  }
}
The whole thing works for a few seconds. Then captureOutput is no longer called. If I comment out the appendAudioSampleBuffer(sampleBuffer) line, captureOutput is called for as long as the app runs (as expected). Apparently, putting the sample buffers into the speech recognition engine somehow blocks further execution. My guess is that the available buffers get consumed after a while and the process stalls somehow because it can't get any more buffers.

I should mention that everything recorded within the first 2 seconds leads to correct recognition. I just don't know how exactly the SFSpeech API works, since Apple hasn't put any text into the beta documentation. By the way: how is SFSpeechAudioBufferRecognitionRequest.endAudio() meant to be used?

Does anyone know?

Thanks
Chris
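For the endAudio() part of the question, the usual pattern is roughly as follows. This is only a sketch reusing the m_isRecording / m_recognRequest names from the question; the stopRecording name is hypothetical:

```swift
// Hypothetical stop method; assumes the m_* properties shown above.
func stopRecording() {
    m_isRecording = false        // stop appending buffers in captureOutput(...)
    m_recognRequest?.endAudio()  // signal that the audio stream is finished
    // After endAudio() the recognizer delivers a final result and the task
    // ends; create a new SFSpeechAudioBufferRecognitionRequest (and a new
    // recognition task) before recording again.
}
```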

I succeeded in using SFSpeechRecognizer continuously. The main point is to use AVCaptureSession to capture audio and feed it to the SpeechRecognizer. Sorry, I'm poor at Swift, so here is just the ObjC version.

Here is my sample code (some UI code is omitted; a few important lines are marked):

@interface ViewController ()
@property (nonatomic, strong) AVCaptureSession *capture;
@property (nonatomic, strong) SFSpeechAudioBufferRecognitionRequest *speechRequest;
@end

@implementation ViewController
- (void)startRecognizer
{
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus status) {
        if (status == SFSpeechRecognizerAuthorizationStatusAuthorized) {
            NSLocale *local = [[NSLocale alloc] initWithLocaleIdentifier:@"fr_FR"];
            SFSpeechRecognizer *sf = [[SFSpeechRecognizer alloc] initWithLocale:local];
            self.speechRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
            [sf recognitionTaskWithRequest:self.speechRequest delegate:self];
            // startCapture should be called on the main queue, or it may crash
            dispatch_async(dispatch_get_main_queue(), ^{
                [self startCapture];
            });
        }
    }];
}

- (void)endRecognizer
{
    // Stop capturing and end the speech request;
    // otherwise Apple will terminate this task after 30000 ms.
    [self endCapture];
    [self.speechRequest endAudio];
}

- (void)startCapture
{
    NSError *error;
    self.capture = [[AVCaptureSession alloc] init];
    AVCaptureDevice *audioDev = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
    if (audioDev == nil) {
        NSLog(@"Couldn't create audio capture device");
        return;
    }

    // Create the microphone input
    AVCaptureDeviceInput *audioIn = [AVCaptureDeviceInput deviceInputWithDevice:audioDev error:&error];
    if (error != nil) {
        NSLog(@"Couldn't create audio input");
        return;
    }

    // Add the microphone input to the capture session
    if ([self.capture canAddInput:audioIn] == NO) {
        NSLog(@"Couldn't add audio input");
        return;
    }
    [self.capture addInput:audioIn];

    // Export the audio data
    AVCaptureAudioDataOutput *audioOutput = [[AVCaptureAudioDataOutput alloc] init];
    [audioOutput setSampleBufferDelegate:self queue:dispatch_get_main_queue()];
    if ([self.capture canAddOutput:audioOutput] == NO) {
        NSLog(@"Couldn't add audio output");
        return;
    }
    [self.capture addOutput:audioOutput];
    [audioOutput connectionWithMediaType:AVMediaTypeAudio];
    [self.capture startRunning];
}

- (void)endCapture
{
    if (self.capture != nil && [self.capture isRunning]) {
        [self.capture stopRunning];
    }
}

- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    [self.speechRequest appendAudioSampleBuffer:sampleBuffer];
}

// ... some SFSpeechRecognitionTaskDelegate methods
@end

I converted the SpeakToMe sample Swift code from the speech recognition WWDC developer talk to Objective-C, and it worked for me. See the WWDC sample for Swift, or below for Objective-C.

- (void)viewDidAppear:(BOOL)animated {

    _recognizer = [[SFSpeechRecognizer alloc] initWithLocale:[NSLocale localeWithLocaleIdentifier:@"en-US"]];
    [_recognizer setDelegate:self];
    [SFSpeechRecognizer requestAuthorization:^(SFSpeechRecognizerAuthorizationStatus authStatus) {
        switch (authStatus) {
            case SFSpeechRecognizerAuthorizationStatusAuthorized:
                // User gave access to speech recognition
                NSLog(@"Authorized");
                break;

            case SFSpeechRecognizerAuthorizationStatusDenied:
                // User denied access to speech recognition
                NSLog(@"SFSpeechRecognizerAuthorizationStatusDenied");
                break;

            case SFSpeechRecognizerAuthorizationStatusRestricted:
                // Speech recognition restricted on this device
                NSLog(@"SFSpeechRecognizerAuthorizationStatusRestricted");
                break;

            case SFSpeechRecognizerAuthorizationStatusNotDetermined:
                // Speech recognition not yet authorized
                break;

            default:
                NSLog(@"Default");
                break;
        }
    }];

    audioEngine = [[AVAudioEngine alloc] init];
    _speechSynthesizer = [[AVSpeechSynthesizer alloc] init];
    [_speechSynthesizer setDelegate:self];
}


- (void)startRecording
{
    [self clearLogs:nil];

    NSError *outError;

    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:&outError];
    [audioSession setMode:AVAudioSessionModeMeasurement error:&outError];
    [audioSession setActive:true withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:&outError];

    request2 = [[SFSpeechAudioBufferRecognitionRequest alloc] init];

    inputNode = [audioEngine inputNode];

    if (request2 == nil) {
        NSLog(@"Unable to create an SFSpeechAudioBufferRecognitionRequest object");
    }

    if (inputNode == nil) {
        NSLog(@"Unable to create an inputNode object");
    }

    request2.shouldReportPartialResults = true;

    _currentTask = [_recognizer recognitionTaskWithRequest:request2
                    delegate:self];

    [inputNode installTapOnBus:0 bufferSize:4096 format:[inputNode outputFormatForBus:0] block:^(AVAudioPCMBuffer *buffer, AVAudioTime *when) {
        NSLog(@"Block tap!");

        [request2 appendAudioPCMBuffer:buffer];
    }];

    [audioEngine prepare];
    [audioEngine startAndReturnError:&outError];
    NSLog(@"Error %@", outError);
}

- (void)speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition:(SFSpeechRecognitionResult *)result {

    NSLog(@"speechRecognitionTask:(SFSpeechRecognitionTask *)task didFinishRecognition");
    NSString *translatedString = [[[result bestTranscription] formattedString] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    [self log:translatedString];

    if ([result isFinal]) {
        [audioEngine stop];
        [inputNode removeTapOnBus:0];
        _currentTask = nil;
        request2 = nil;
    }
}
Here is a Swift (3.0) implementation of @cube's answer:

import UIKit
import Speech
import AVFoundation


class ViewController: UIViewController  {
  @IBOutlet weak var console: UITextView!

  var capture: AVCaptureSession?
  var speechRequest: SFSpeechAudioBufferRecognitionRequest?
  override func viewDidLoad() {
    super.viewDidLoad()
  }
  override func viewDidAppear(_ animated: Bool) {
    super.viewDidAppear(animated)
    startRecognizer()
  }

  func startRecognizer() {
    SFSpeechRecognizer.requestAuthorization { (status) in
      switch status {
      case .authorized:
        let locale = NSLocale(localeIdentifier: "fr_FR")
        let sf = SFSpeechRecognizer(locale: locale as Locale)
        self.speechRequest = SFSpeechAudioBufferRecognitionRequest()
        sf?.recognitionTask(with: self.speechRequest!, delegate: self)
        DispatchQueue.main.async {

        }
      case .denied:
        fallthrough
      case .notDetermined:
        fallthrough
      case .restricted:
        print("User Authorization Issue.")
      }
    }

  }

  func endRecognizer() {
    endCapture()
    speechRequest?.endAudio()
  }

  func startCapture() {

    capture = AVCaptureSession()

    guard let audioDev = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeAudio) else {
      print("Could not get capture device.")
      return
    }

    guard let audioIn = try? AVCaptureDeviceInput(device: audioDev) else {
      print("Could not create input device.")
      return
    }

    guard true == capture?.canAddInput(audioIn) else {
      print("Could not add input device")
      return
    }

    capture?.addInput(audioIn)

    let audioOut = AVCaptureAudioDataOutput()
    audioOut.setSampleBufferDelegate(self, queue: DispatchQueue.main)

    guard true == capture?.canAddOutput(audioOut) else {
      print("Could not add audio output")
      return
    }

    capture?.addOutput(audioOut)
    audioOut.connection(withMediaType: AVMediaTypeAudio)
    capture?.startRunning()


  }

  func endCapture() {

    if true == capture?.isRunning {
      capture?.stopRunning()
    }
  }
}

extension ViewController: AVCaptureAudioDataOutputSampleBufferDelegate {
  func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    speechRequest?.appendAudioSampleBuffer(sampleBuffer)
  }

}

extension ViewController: SFSpeechRecognitionTaskDelegate {

  func speechRecognitionTask(_ task: SFSpeechRecognitionTask, didFinishRecognition recognitionResult: SFSpeechRecognitionResult) {
    console.text = console.text + "\n" + recognitionResult.bestTranscription.formattedString
  }
}

Don't forget to add a value for NSSpeechRecognitionUsageDescription in the Info.plist file, or the app will crash.
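The entry looks like this in the Info.plist source (the description strings here are just examples; on iOS 10 you also need NSMicrophoneUsageDescription for microphone access):

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>Your speech is sent to Apple to be transcribed.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to record your speech.</string>
```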

It turns out that Apple's new native speech recognition does not detect end-of-speech silences automatically (a bug?), which in your case is useful, because speech recognition stays active for close to one minute (the maximum period allowed by Apple's service). So basically, if you need continuous ASR, you must restart speech recognition whenever the delegate fires:

func speechRecognitionTask(task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) // whether successfully == true or not
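A rough sketch of that restart pattern in the delegate (startNativeRecording/stopNativeRecording follow the names used in the code below; the exact cleanup is an assumption):

```swift
// Called when the task ends, successfully or not (e.g. after the ~1 minute cap).
func speechRecognitionTask(task: SFSpeechRecognitionTask, didFinishSuccessfully successfully: Bool) {
    stopNativeRecording()             // assumed cleanup: remove tap, endAudio, timers
    do {
        try startNativeRecording()    // build a fresh request and recognition task
    } catch {
        print("Could not restart recognition: \(error)")
    }
}
```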
This is the recording/speech-recognition SWIFT code I use, and it works perfectly. Ignore the part where I compute the average power of the microphone volume if you don't need it; I use it to animate a waveform. Don't forget to set the SFSpeechRecognitionTaskDelegate and its delegate methods; if you need extra code, let me know.

func startNativeRecording() throws {
        LEVEL_LOWPASS_TRIG=0.01
        //Setup Audio Session
        node = audioEngine.inputNode!
        let recordingFormat = node!.outputFormatForBus(0)
        node!.installTapOnBus(0, bufferSize: 1024, format: recordingFormat){(buffer, _) in
            self.nativeASRRequest.appendAudioPCMBuffer(buffer)

 //Code to animate a waveform with the microphone volume, ignore if you don't need it:
            var inNumberFrames:UInt32 = buffer.frameLength;
            var samples:Float32 = buffer.floatChannelData[0][0]; //https://github.com/apple/swift-evolution/blob/master/proposals/0107-unsaferawpointer.md
            var avgValue:Float32 = 0;
            vDSP_maxmgv(buffer.floatChannelData[0], 1, &avgValue, vDSP_Length(inNumberFrames)); //Accelerate Framework
            //vDSP_maxmgv returns peak values
            //vDSP_meamgv returns mean magnitude of a vector

            let avg3:Float32=((avgValue == 0) ? (0-100) : 20.0)
            var averagePower=(self.LEVEL_LOWPASS_TRIG*avg3*log10f(avgValue)) + ((1-self.LEVEL_LOWPASS_TRIG)*self.averagePowerForChannel0) ;
            print("AVG. POWER: "+averagePower.description)
            dispatch_async(dispatch_get_main_queue(), { () -> Void in
                //print("VU: "+vu.description)
                var fAvgPwr=CGFloat(averagePower)
                print("AvgPwr: "+fAvgPwr.description)

                var waveformFriendlyValue=0.5+fAvgPwr //-0.5 is AvgPwrValue when user is silent
                if(waveformFriendlyValue<0){waveformFriendlyValue=0} //round values <0 to 0
                self.waveview.hidden=false
                self.waveview.updateWithLevel(waveformFriendlyValue)
            })
        }
        audioEngine.prepare()
        try audioEngine.start()
        isNativeASRBusy=true
        nativeASRTask = nativeSpeechRecognizer?.recognitionTaskWithRequest(nativeASRRequest, delegate: self)
        nativeSpeechRecognizer?.delegate=self
  //I use this timer to track no speech timeouts, ignore if not needed:
        self.endOfSpeechTimeoutTimer = NSTimer.scheduledTimerWithTimeInterval(utteranceTimeoutSeconds, target: self, selector:  #selector(ViewController.stopNativeRecording), userInfo: nil, repeats: false)
    }
@objc  func startRecording() {
    
    self.fullsTring = ""
    audioEngine.reset()
    
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(.record)
        try audioSession.setMode(.measurement)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
        try audioSession.setPreferredSampleRate(44100.0)
        
        if audioSession.isInputGainSettable {
            guard (try? audioSession.setInputGain(1.0)) != nil else {
                print("audio error")
                return
            }
        }
        else {
            print("Cannot set input gain")
        }
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    
    let inputNode = audioEngine.inputNode
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }
    
    recognitionRequest.shouldReportPartialResults = true
    self.timer4 = Timer.scheduledTimer(timeInterval: TimeInterval(40), target: self, selector: #selector(againStartRec), userInfo: nil, repeats: false)
    
    recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest, resultHandler: { (result, error ) in
        
        var isFinal = false  //8
        
        if result != nil {
            self.timer.invalidate()
            self.timer = Timer.scheduledTimer(timeInterval: TimeInterval(2.0), target: self, selector: #selector(self.didFinishTalk), userInfo: nil, repeats: false)
            
            let bestString = result?.bestTranscription.formattedString
            self.fullsTring = bestString!
            
            self.inputContainerView.inputTextField.text = result?.bestTranscription.formattedString
            
            isFinal = result!.isFinal
            
        }
        if  isFinal {
            
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            
            self.recognitionRequest = nil
            self.recognitionTask = nil
            isFinal = false
            
        }
        if error != nil{
            URLCache.shared.removeAllCachedResponses()
            
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            
            guard let task = self.recognitionTask else {
                return
            }
            task.cancel()
            task.finish()
        }
    })
    audioEngine.reset()
    inputNode.removeTap(onBus: 0)
    
    let recordingFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 1)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }
    
    audioEngine.prepare()
    
    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
    
    self.hasrecorded = true
}

@objc func againStartRec(){
    
    self.inputContainerView.uploadImageView.setBackgroundImage( #imageLiteral(resourceName: "microphone") , for: .normal)
    self.inputContainerView.uploadImageView.alpha = 1.0
    self.timer4.invalidate()
    timer.invalidate()
    self.timer.invalidate()
    
    if ((self.audioEngine.isRunning)){
        
        self.audioEngine.stop()
        self.recognitionRequest?.endAudio()
        self.recognitionTask?.finish()
    }
    self.timer2 = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(startRecording), userInfo: nil, repeats: false)
}

@objc func didFinishTalk(){
    
    if self.fullsTring != ""{
        
        self.timer4.invalidate()
        self.timer.invalidate()
        self.timer2.invalidate()
        
        if ((self.audioEngine.isRunning)){
            self.audioEngine.stop()
            guard let task = self.recognitionTask else {
                return
            }
            task.cancel()
            task.finish()
        }
    }
}