Q: Getting a real-time transcript from a network stream using Microsoft.CognitiveServices.Speech

Stack Overflow user

Asked 2020-03-23 16:08:00

2 answers, 153 views, 0 followers, 0 votes

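We are planning a POC in which we feed a multicast stream (say, a press conference) to a SpeechRecognizer, hoping to get a "real-time" transcript that we can then use for live captioning. So far I see two challenges:

The first is that I don't know how to "grab" the multicast stream and feed it to the SpeechRecognizer. If anyone is willing to share a code sample showing how to do this (preferably in C#), that would be very helpful.

The other is timing-related. I have done some initial testing with microphone input, and when speech is more or less continuous, the service processes fairly large chunks of speech at a time, so there is a considerable delay before I get anything back. That is not ideal for live captioning. Is there a setting I can use to change the "granularity" so that I get smaller chunks more frequently (if that makes sense)?

Any and all input would be much appreciated.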

speech-recognition

microsoft-cognitive

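2 Answers

Stack Overflow user

Answered 2020-03-23 17:07:12

Sorry, I have no experience with multicast streams.

For speech recognition, you can subscribe to both final and intermediate results during continuous recognition. A final result is created once the speech recognition engine has recognized a segment of speech. You will receive intermediate recognition events more frequently; these give you interim results while recognition is still in progress. They may change as recognition proceeds, but you will see them become more and more "stable" over time.

Wolfgang

1 vote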

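Stack Overflow user

Answered 2020-03-23 22:39:43

As Wolfgang mentioned above, for continuous speech you can subscribe to the Recognizing event to receive periodic updates to the predicted speech text. The Recognized event fires when the Azure Speech Service determines that the user has stopped speaking.

Example: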

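    // Requires: using Microsoft.CognitiveServices.Speech;
    //           using Microsoft.CognitiveServices.Speech.Audio;

    // Use the default microphone unless a WAV file path was supplied.
    var microphone = string.IsNullOrEmpty(file);
    var audio = microphone
        ? AudioConfig.FromDefaultMicrophoneInput()
        : AudioConfig.FromWavFileInput(file);

    var config = SpeechConfig.FromSubscription(key, region);
    // Pass the audio config explicitly; otherwise the WAV-file branch is ignored.
    var recognizer = new SpeechRecognizer(config, audio);

    // Recognizing delivers partial hypotheses; Recognized delivers final results.
    recognizer.SessionStarted += SessionStarted;
    recognizer.SessionStopped += SessionStopped;
    recognizer.Recognizing += Recognizing;
    recognizer.Recognized += Recognized;
    recognizer.Canceled += Canceled;

    recognizer.StartContinuousRecognitionAsync().Wait();
    if (microphone) { Console.WriteLine("Listening; press ENTER to stop ...\n"); }

    // _values and WaitForContinuousStopCancelKeyOrTimeout are helpers defined
    // elsewhere in the answerer's code; they are not part of the Speech SDK.
    var timeout = _values.GetOrDefault("recognize.timeout", microphone ? 30000 : int.MaxValue);
    WaitForContinuousStopCancelKeyOrTimeout(recognizer, timeout);

    recognizer.StopContinuousRecognitionAsync().Wait();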
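
The snippet above calls `WaitForContinuousStopCancelKeyOrTimeout`, whose body is not shown. As a rough stand-in, the sketch below waits until either a stop signal is raised or a timeout elapses. `StopSignal` is a hypothetical name introduced here, not something from the answer or the Speech SDK, and it leaves out the ENTER/cancel-key handling that the original helper's name suggests.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of the Speech SDK): lets event handlers
// signal "recognition is over" while the main thread waits with a timeout.
public sealed class StopSignal : IDisposable
{
    private readonly ManualResetEventSlim _stopped = new ManualResetEventSlim(false);

    // Call from SessionStopped / Canceled handlers.
    public void Set() => _stopped.Set();

    // Blocks until Set() has been called or timeoutMs elapses.
    // Returns true if signalled, false if the wait timed out.
    public bool Wait(int timeoutMs) => _stopped.Wait(timeoutMs);

    public void Dispose() => _stopped.Dispose();
}
```

One possible wiring, under the same assumptions: create a `StopSignal`, call `signal.Set()` from the `SessionStopped` and `Canceled` handlers, and replace the helper call with `signal.Wait(timeout)` before stopping recognition.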

With event handlers like these:

    // Fires repeatedly with the current partial hypothesis, which may still change.
    private void Recognizing(object sender, SpeechRecognitionEventArgs e)
    {
        Console.WriteLine($"RECOGNIZING: {e.Result.Text}");
    }

    // Fires once per utterance with the final result.
    private void Recognized(object sender, SpeechRecognitionEventArgs e)
    {
        var result = e.Result;
        if (result.Reason == ResultReason.RecognizedSpeech && result.Text.Length != 0)
        {
            Console.WriteLine($"RECOGNIZED: {result.Text}");
            Console.WriteLine();
        }
        // _verbose is a field defined elsewhere in the answerer's code.
        else if (result.Reason == ResultReason.NoMatch && _verbose)
        {
            Console.WriteLine("NOMATCH: Speech could not be recognized.");
            Console.WriteLine();
        }
    }

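When I run this and say the phrase "my name is Rob Chambers and this is a test of speech recognition", the output appears very quickly (within 700-1000 ms of each word I speak):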

    Listening; press ENTER to stop ...

    RECOGNIZING: my
    RECOGNIZING: my name
    RECOGNIZING: my name is
    RECOGNIZING: my name
    RECOGNIZING: my name is
    RECOGNIZING: my name is rob
    RECOGNIZING: my name is rob chambers
    RECOGNIZING: my name is rob chambers and
    RECOGNIZING: my name is rob chambers and this
    RECOGNIZING: my name is rob chambers and this
    RECOGNIZING: my name is rob chambers and this is
    RECOGNIZING: my name is rob chambers and this is
    RECOGNIZING: my name is rob chambers and this is a
    RECOGNIZING: my name is rob chambers and this is a test
    RECOGNIZING: my name is rob chambers and this is a test of
    RECOGNIZING: my name is rob chambers and this is a test of speech
    RECOGNIZING: my name is rob chambers and this is a test of
    RECOGNIZING: my name is rob chambers and this is a test of speech
    RECOGNIZING: my name is rob chambers and this is a test of speech recognition
    RECOGNIZED: My name is Rob Chambers and this is a test of speech recognition.
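
The output above shows partial hypotheses occasionally backtracking ("my name is" followed by "my name"). For captions, one way to reduce flicker is to display only the words that two consecutive hypotheses agree on. The sketch below is an illustration of that idea, not something from the answer; `CaptionStabilizer` and `StablePrefix` are hypothetical names, and it assumes hypotheses are space-separated words.

```csharp
using System;

// Hypothetical helper: computes the leading words shared by two
// consecutive partial hypotheses.
public static class CaptionStabilizer
{
    public static string StablePrefix(string previous, string current)
    {
        var separators = new[] { ' ' };
        var a = previous.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        var b = current.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        // Count how many leading words are identical in both hypotheses.
        int shared = 0;
        while (shared < a.Length && shared < b.Length && a[shared] == b[shared])
        {
            shared++;
        }

        // Rebuild the agreed-upon prefix from the current hypothesis.
        return string.Join(" ", b, 0, shared);
    }
}
```

In a `Recognizing` handler this could be fed the previous and current `e.Result.Text`, with the final text from `Recognized` replacing the caption once it arrives. The trade-off is that each word is shown roughly one update later in exchange for fewer visible corrections.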

When I say almost the same phrase, but as two sentences with a very short pause between them, the output looks like this:

    Listening; press ENTER to stop ...

    RECOGNIZING: my
    RECOGNIZING: my name
    RECOGNIZING: my name is
    RECOGNIZING: my name is
    RECOGNIZING: my name is rob
    RECOGNIZING: my name is rob chambers
    RECOGNIZED: My name is Rob Chambers.

    RECOGNIZING: this
    RECOGNIZING: this is a
    RECOGNIZING: this is a test
    RECOGNIZING: this is a test of
    RECOGNIZING: this is a test of speech
    RECOGNIZING: this is a test of speech recognition
    RECOGNIZED: This is a test of speech recognition.
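
Neither answer covers the multicast part of the question. The sketch below shows one possible shape for it: join a UDP multicast group and write each datagram into a Speech SDK push stream that feeds the recognizer. It rests on a strong assumption that each datagram payload is raw 16 kHz, 16-bit, mono PCM. Real broadcast streams are more likely RTP or MPEG-TS carrying compressed audio, which would have to be demultiplexed and decoded to PCM first. `MulticastCaptioner`, `RunAsync`, and its parameters are hypothetical names.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

public static class MulticastCaptioner
{
    // ASSUMPTION: datagram payloads are raw 16 kHz, 16-bit, mono PCM with no
    // RTP or MPEG-TS framing. Anything else must be decoded before Write().
    public static async Task RunAsync(
        string key, string region, IPAddress group, int port, CancellationToken token)
    {
        var format = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);
        using (var pushStream = AudioInputStream.CreatePushStream(format))
        using (var audio = AudioConfig.FromStreamInput(pushStream))
        using (var recognizer = new SpeechRecognizer(
            SpeechConfig.FromSubscription(key, region), audio))
        using (var udp = new UdpClient())
        {
            recognizer.Recognizing += (s, e) =>
                Console.WriteLine($"RECOGNIZING: {e.Result.Text}");
            recognizer.Recognized += (s, e) =>
                Console.WriteLine($"RECOGNIZED: {e.Result.Text}");

            // Bind to the stream's port and join the multicast group.
            udp.Client.SetSocketOption(
                SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
            udp.Client.Bind(new IPEndPoint(IPAddress.Any, port));
            udp.JoinMulticastGroup(group);

            await recognizer.StartContinuousRecognitionAsync();

            // Closing the socket on cancellation unblocks the pending receive.
            using (token.Register(() => udp.Close()))
            {
                try
                {
                    while (!token.IsCancellationRequested)
                    {
                        UdpReceiveResult packet = await udp.ReceiveAsync();
                        pushStream.Write(packet.Buffer); // forward audio bytes
                    }
                }
                catch (ObjectDisposedException) { /* socket closed on cancel */ }
                catch (SocketException) { /* receive aborted on cancel */ }
            }

            // Signal end of audio, then stop recognition.
            pushStream.Close();
            await recognizer.StopContinuousRecognitionAsync();
        }
    }
}
```

A push stream is used here because the network delivers audio at its own pace; the recognizer simply consumes whatever has been written so far.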

0 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/60809820
