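We are planning a POC in which we feed a multicast stream (say, a press conference) to the SpeechRecognizer, hoping to get a "real-time" transcript that we can then use for live captioning. So far I see two challenges:
First, I don't know how to "grab" the multicast stream and feed it to the SpeechRecognizer. If anyone is willing to share a code sample showing how to do this (preferably in C#), that would be very helpful.
The other is timing-related. I ran some initial tests with microphone input, and when speech is more or less continuous, the service processes fairly large chunks of speech at a time, so there is a considerable delay before I get anything back, which is not ideal for live captioning. Is there a setting I can use to change the "granularity" so that I get smaller chunks more frequently (if that makes sense)?
Any and all input would be greatly appreciated.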
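Posted on 2020-03-23 17:07:12
Sorry, I have no experience with multicast streams.
For speech recognition, you can subscribe to both final and intermediate results during continuous recognition. A final result is created once the speech recognition engine has recognized a segment of speech. You will receive intermediate recognition events more frequently; they give you interim results while recognition is still in progress. These may change during recognition, but you will see them become more and more "stable" as recognition proceeds.
Wolfgang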
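To illustrate what "stable" can mean for captioning, here is a minimal sketch, not something the Speech SDK provides: it compares two successive interim hypotheses and keeps only the leading words they agree on, so a caption display could show those words without them flickering. The names `InterimCaptions` and `StablePrefix` are hypothetical, and comparing by whitespace-separated words is an assumption that suits English-like text.

```csharp
using System;
using System.Linq;

public static class InterimCaptions
{
    // Returns the leading words shared by two successive interim hypotheses.
    // Words after the first disagreement are treated as still unstable.
    public static string StablePrefix(string previous, string current)
    {
        var separators = new[] { ' ' };
        var a = previous.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        var b = current.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        int shared = 0;
        while (shared < a.Length && shared < b.Length &&
               string.Equals(a[shared], b[shared], StringComparison.OrdinalIgnoreCase))
        {
            shared++;
        }

        // Take the words from the newer hypothesis so its casing wins.
        return string.Join(" ", b.Take(shared));
    }
}
```

For example, if one interim result is "my name is" and the next is revised to "my name", `StablePrefix` returns "my name". A caption renderer could call it from the intermediate-result handler and replace the whole line once the final result arrives.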
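Posted on 2020-03-23 22:39:43
As Wolfgang mentioned above, for continuous speech you can subscribe to the Recognizing event to receive periodic updates to the predicted speech text. The Recognized event fires when the Azure Speech Service determines that the user has stopped speaking.
Example: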
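// Requires: using Microsoft.CognitiveServices.Speech;
//           using Microsoft.CognitiveServices.Speech.Audio;
// Use the default microphone when no file is given, otherwise read a WAV file.
var microphone = string.IsNullOrEmpty(file);
var audio = microphone
    ? AudioConfig.FromDefaultMicrophoneInput()
    : AudioConfig.FromWavFileInput(file);
var config = SpeechConfig.FromSubscription(key, region);
// Pass the audio config so the selected input is actually used.
var recognizer = new SpeechRecognizer(config, audio);
// Recognizing delivers interim hypotheses; Recognized delivers final results.
recognizer.SessionStarted += SessionStarted;
recognizer.SessionStopped += SessionStopped;
recognizer.Recognizing += Recognizing;
recognizer.Recognized += Recognized;
recognizer.Canceled += Canceled;
recognizer.StartContinuousRecognitionAsync().Wait();
if (microphone) { Console.WriteLine("Listening; press ENTER to stop ...\n"); }
// _values and WaitForContinuousStopCancelKeyOrTimeout are not part of the Speech SDK and are not shown here.
var timeout = _values.GetOrDefault("recognize.timeout", microphone ? 30000 : int.MaxValue);
WaitForContinuousStopCancelKeyOrTimeout(recognizer, timeout);
recognizer.StopContinuousRecognitionAsync().Wait();

With event handlers like these: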
// Fires repeatedly while speech is in progress; the text may still change.
private void Recognizing(object sender, SpeechRecognitionEventArgs e)
{
    Console.WriteLine($"RECOGNIZING: {e.Result.Text}");
}

// Fires once per utterance with the final text.
private void Recognized(object sender, SpeechRecognitionEventArgs e)
{
    var result = e.Result;
    if (result.Reason == ResultReason.RecognizedSpeech && result.Text.Length != 0)
    {
        Console.WriteLine($"RECOGNIZED: {result.Text}");
        Console.WriteLine();
    }
    else if (result.Reason == ResultReason.NoMatch && _verbose) // _verbose is a field not shown here
    {
        Console.WriteLine("NOMATCH: Speech could not be recognized.");
        Console.WriteLine();
    }
}

When I run this and say the phrase "My name is Rob Chambers and this is a test of speech recognition", the output appears very quickly (within 700-1000 ms of each word I speak):
Listening; press ENTER to stop ...
RECOGNIZING: my
RECOGNIZING: my name
RECOGNIZING: my name is
RECOGNIZING: my name
RECOGNIZING: my name is
RECOGNIZING: my name is rob
RECOGNIZING: my name is rob chambers
RECOGNIZING: my name is rob chambers and
RECOGNIZING: my name is rob chambers and this
RECOGNIZING: my name is rob chambers and this
RECOGNIZING: my name is rob chambers and this is
RECOGNIZING: my name is rob chambers and this is
RECOGNIZING: my name is rob chambers and this is a
RECOGNIZING: my name is rob chambers and this is a test
RECOGNIZING: my name is rob chambers and this is a test of
RECOGNIZING: my name is rob chambers and this is a test of speech
RECOGNIZING: my name is rob chambers and this is a test of
RECOGNIZING: my name is rob chambers and this is a test of speech
RECOGNIZING: my name is rob chambers and this is a test of speech recognition
RECOGNIZED: My name is Rob Chambers and this is a test of speech recognition.

When I say almost the same phrase, but as two sentences with a very short pause between them, the output looks like this:
Listening; press ENTER to stop ...
RECOGNIZING: my
RECOGNIZING: my name
RECOGNIZING: my name is
RECOGNIZING: my name is
RECOGNIZING: my name is rob
RECOGNIZING: my name is rob chambers
RECOGNIZED: My name is Rob Chambers.
RECOGNIZING: this
RECOGNIZING: this is a
RECOGNIZING: this is a test
RECOGNIZING: this is a test of
RECOGNIZING: this is a test of speech
RECOGNIZING: this is a test of speech recognition
RECOGNIZED: This is a test of speech recognition.

https://stackoverflow.com/questions/60809820
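On the multicast part of the question, one possible approach, offered only as a rough sketch, is to receive the UDP packets yourself and push the audio bytes into a push-style input stream of the Speech SDK. The sketch below covers only the receiving side and hands each payload to a caller-supplied sink. It assumes loudly that every UDP payload is raw PCM audio in a format the service accepts (for example 16 kHz, 16-bit, mono); real broadcast streams are often RTP or MPEG-TS and would need depacketizing and decoding first. The names `MulticastAudioPump` and `RunAsync` are hypothetical, and `ReceiveAsync(CancellationToken)` assumes .NET 6 or later.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class MulticastAudioPump
{
    // Joins a multicast group and forwards every UDP payload to `sink`
    // until the token is cancelled. ASSUMPTION: payloads are raw PCM bytes.
    public static async Task RunAsync(
        IPAddress group, int port, Action<byte[]> sink, CancellationToken token)
    {
        token.ThrowIfCancellationRequested();

        using var udp = new UdpClient(AddressFamily.InterNetwork);
        // Allow other listeners on the same machine to share the port.
        udp.Client.SetSocketOption(
            SocketOptionLevel.Socket, SocketOptionName.ReuseAddress, true);
        udp.Client.Bind(new IPEndPoint(IPAddress.Any, port));
        udp.JoinMulticastGroup(group);

        while (true)
        {
            // Throws OperationCanceledException when the token is cancelled.
            UdpReceiveResult packet = await udp.ReceiveAsync(token);
            sink(packet.Buffer);
        }
    }
}
```

With the Speech SDK, the sink could be the `Write` method of a `PushAudioInputStream` created via `AudioInputStream.CreatePushStream(AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1))`, with `AudioConfig.FromStreamInput(pushStream)` passed to the `SpeechRecognizer` constructor in place of the microphone or WAV input. The group address and port are whatever your stream uses; none are implied here.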