C#: Getting a real-time transcript from a network stream with Microsoft.CognitiveServices.Speech


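We are planning a POC in which we feed a multicast stream, for example a press conference, to the SpeechRecognizer, hoping to get a "real-time" transcript that we can then use for live captioning. So far I see two challenges with this:

First, I don't know how to "grab" a multicast stream and feed it to the SpeechRecognizer. If anyone is willing to share a code sample that demonstrates how to do this (preferably in C#), that would be very helpful.

The other issue is timing. I have done some initial tests with microphone input, and when speech is more or less continuous, the service processes fairly large chunks of speech at a time, so there is a considerable delay before I get anything back. That is not ideal for a live-captioning scenario. Is there a setting that changes the "granularity" so that smaller chunks are returned more often (if that makes sense)?

Any and all input would be greatly appreciated.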

Sorry, I have no experience with multicast streams.

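For speech recognition, you can subscribe to both final and intermediate results during continuous recognition. A final result is created once the speech recognition engine has recognized a segment of speech. Intermediate recognition events arrive more frequently and give you interim results from the recognition process. These can change while recognition is under way, but they become more and more "stable" as it progresses.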
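
One way to make use of that growing stability for captions is to display only the words that two consecutive intermediate results agree on. The following is a minimal sketch of that idea, not something the Speech SDK provides: the class name StableCaptionBuffer and its methods are hypothetical, and comparing whole words case-insensitively is an assumption about what counts as "unchanged".

```csharp
using System;

// Hypothetical helper (not part of the Speech SDK): remembers the previous
// intermediate hypothesis and reports the leading words that did not change.
public sealed class StableCaptionBuffer
{
    private string[] _previous = Array.Empty<string>();

    // Returns the leading words shared by the previous and current hypothesis.
    public string Update(string hypothesis)
    {
        var current = hypothesis.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var shared = 0;
        while (shared < current.Length
               && shared < _previous.Length
               && string.Equals(current[shared], _previous[shared], StringComparison.OrdinalIgnoreCase))
        {
            shared++;
        }

        _previous = current;
        return string.Join(" ", current, 0, shared);
    }

    // Call when a final result arrives so the next phrase starts from scratch.
    public void Reset()
    {
        _previous = Array.Empty<string>();
    }
}
```

Update could be called with e.Result.Text from the intermediate-result handler, and Reset from the final-result handler. The caption then lags one intermediate event behind, in exchange for showing less text that is later revised.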


Wolfgang

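As mentioned above, for continuous speech you can subscribe to the Recognizing event to receive regular updates on the predicted text. The Recognized event fires when the Azure Speech service determines that the user has stopped speaking.

For example: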

    // Requires: using Microsoft.CognitiveServices.Speech;
    //           using Microsoft.CognitiveServices.Speech.Audio;

    // Use the default microphone unless a WAV file path was supplied.
    var microphone = string.IsNullOrEmpty(file);
    var audio = microphone
        ? AudioConfig.FromDefaultMicrophoneInput()
        : AudioConfig.FromWavFileInput(file);

    var config = SpeechConfig.FromSubscription(key, region);
    var recognizer = new SpeechRecognizer(config, audio); // pass the audio source explicitly

    recognizer.SessionStarted += SessionStarted;
    recognizer.SessionStopped += SessionStopped;
    recognizer.Recognizing += Recognizing;
    recognizer.Recognized += Recognized;
    recognizer.Canceled += Canceled;

    recognizer.StartContinuousRecognitionAsync().Wait();
    if (microphone) { Console.WriteLine("Listening; press ENTER to stop ...\n"); }

    // _values and WaitForContinuousStopCancelKeyOrTimeout are helpers from the
    // surrounding program and are not shown here.
    var timeout = _values.GetOrDefault("recognize.timeout", microphone ? 30000 : int.MaxValue);
    WaitForContinuousStopCancelKeyOrTimeout(recognizer, timeout);

    recognizer.StopContinuousRecognitionAsync().Wait();

With event handlers like these:

    // Fires repeatedly with the current, still-changing hypothesis.
    private void Recognizing(object sender, SpeechRecognitionEventArgs e)
    {
        Console.WriteLine($"RECOGNIZING: {e.Result.Text}");
    }

    // Fires once per phrase with the final text.
    private void Recognized(object sender, SpeechRecognitionEventArgs e)
    {
        var result = e.Result;
        if (result.Reason == ResultReason.RecognizedSpeech && result.Text.Length != 0)
        {
            Console.WriteLine($"RECOGNIZED: {result.Text}");
            Console.WriteLine();
        }
        else if (result.Reason == ResultReason.NoMatch && _verbose) // _verbose: field not shown here
        {
            Console.WriteLine("NOMATCH: Speech could not be recognized.");
            Console.WriteLine();
        }
    }

When I run this and say the phrase "My name is Rob Chambers, and this is a test of speech recognition", the output appears quickly (within 700-1000 ms of each word I say).

When I say almost the same phrase, but with a very brief pause between the two sentences, the output looks like this:

    Listening; press ENTER to stop ...

    RECOGNIZING: my
    RECOGNIZING: my name
    RECOGNIZING: my name is
    RECOGNIZING: my name is
    RECOGNIZING: my name is rob
    RECOGNIZING: my name is rob chambers
    RECOGNIZED: My name is Rob Chambers.

    RECOGNIZING: this
    RECOGNIZING: this is a
    RECOGNIZING: this is a test
    RECOGNIZING: this is a test of
    RECOGNIZING: this is a test of speech
    RECOGNIZING: this is a test of speech recognition
    RECOGNIZED: This is a test of speech recognition.
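
On the first part of the question, feeding a network stream instead of a microphone or WAV file: one option the SDK offers is a push audio stream, which accepts bytes from any source. The sketch below shows the idea under strong assumptions that are not from the original post: the multicast group 239.0.0.1 and port 5004 are placeholders, and each UDP datagram is assumed to carry raw 16 kHz, 16-bit, mono PCM. A real broadcast feed is more likely to be RTP or MPEG-TS with compressed audio, which would have to be demultiplexed and decoded to PCM before being written to the stream.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

public static class MulticastCaptioning
{
    public static async Task RunAsync(string key, string region, CancellationToken token)
    {
        // Assumption: the payload is raw PCM, 16 kHz, 16-bit, mono.
        var format = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);
        using var pushStream = AudioInputStream.CreatePushStream(format);
        using var audio = AudioConfig.FromStreamInput(pushStream);
        var config = SpeechConfig.FromSubscription(key, region);
        using var recognizer = new SpeechRecognizer(config, audio);

        recognizer.Recognizing += (s, e) => Console.WriteLine($"RECOGNIZING: {e.Result.Text}");
        recognizer.Recognized += (s, e) => Console.WriteLine($"RECOGNIZED: {e.Result.Text}");

        await recognizer.StartContinuousRecognitionAsync();

        // Hypothetical port and multicast group; replace with the real feed.
        using var udp = new UdpClient(5004);
        udp.JoinMulticastGroup(IPAddress.Parse("239.0.0.1"));

        // Cancellation is only checked between datagrams in this sketch.
        while (!token.IsCancellationRequested)
        {
            UdpReceiveResult packet = await udp.ReceiveAsync();
            pushStream.Write(packet.Buffer); // hand the audio bytes to the recognizer
        }

        pushStream.Close(); // signals end of audio
        await recognizer.StopContinuousRecognitionAsync();
    }
}
```

The push stream decouples the network side from the recognizer, so a decoder can be put in front of Write without touching the recognition code.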

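Thanks, Wolfgang. Yes, I'm aware of the intermediate/final flow, but I was wondering whether there is a way to get smaller, more frequent "final" segments.

Thanks, Rob! I agree that the Recognizing events arrive quickly, but I don't quite see how we could use them for live captioning, since the content often changes significantly from one Recognizing event to the next...? Also, capitalization and punctuation only appear in the Recognized events. My assumption is that the algorithm decides when to return a chunk of text in a Recognized event based on a combination of pause detection and a buffer size or timeout. If so, I would like to be able to change those parameters somehow to get a different granularity.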
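
On the pause-detection parameter this comment guesses at: newer Speech SDK releases document a segmentation silence timeout that controls how much silence ends a phrase and triggers a Recognized event. The fragment below is a sketch of how it might be set. Whether the property is available depends on the SDK version in use, the value of 500 ms is an arbitrary example, and the class and method names are hypothetical.

```csharp
using Microsoft.CognitiveServices.Speech;

public static class CaptionConfig
{
    // Hypothetical factory method; only the SetProperty call matters here.
    public static SpeechConfig Create(string key, string region)
    {
        var config = SpeechConfig.FromSubscription(key, region);

        // Assumption: the installed SDK version exposes this property.
        // A shorter silence timeout should end phrases sooner, giving smaller
        // and more frequent final results. The value is in milliseconds.
        config.SetProperty(PropertyId.Speech_SegmentationSilenceTimeoutMs, "500");

        return config;
    }
}
```

A shorter timeout may also split sentences at natural hesitations, so the value would need tuning against real speech from the feed.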