Q: RecognitionConfig error for a .flac file in Google transcription

Stack Overflow user

Asked 2020-06-27 07:35:58

1 answer · 1.9K views · 0 followers · 0 votes

I am trying to transcribe an audio file with Google Cloud. Here is my code:

from google.cloud.speech_v1 import enums
from google.cloud import speech_v1p1beta1
import os
import io


def sample_long_running_recognize(local_file_path):

    client = speech_v1p1beta1.SpeechClient()

    # local_file_path = 'resources/commercial_mono.wav'

    # If enabled, each word in the first alternative of each result will be
    # tagged with a speaker tag to identify the speaker.
    enable_speaker_diarization = True

    # Optional. Specifies the estimated number of speakers in the conversation.
    diarization_speaker_count = 2

    # The language of the supplied audio
    language_code = "en-US"
    config = {
        "enable_speaker_diarization": enable_speaker_diarization,
        "diarization_speaker_count": diarization_speaker_count,
        "language_code": language_code,
        "encoding": enums.RecognitionConfig.AudioEncoding.FLAC
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}
    # audio = {"uri": storage_uri}


    operation = client.long_running_recognize(config, audio)

    print(u"Waiting for operation to complete...")
    response = operation.result()

    for result in response.results:
        # First alternative has words tagged with speakers
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))
        # Print the speaker_tag of each word
        for word in alternative.words:
            print(u"Word: {}".format(word.word))
            print(u"Speaker tag: {}".format(word.speaker_tag))


sample_long_running_recognize('/Users/asi/Downloads/trimmed_3.flac')

I keep getting this error:

google.api_core.exceptions.InvalidArgument: 400 audio_channel_count `1` in RecognitionConfig must either be unspecified or match the value in the FLAC header `2`.

I don't know what I am doing wrong. I copied and pasted most of this from the Google Cloud Speech API documentation. Any suggestions?

google-cloud-speech

google-speech-to-text-api

python

google-cloud-platform

google-speech-api

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-07-01 15:49:56

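This property (audio_channel_count) is the number of channels in the input audio data, and you only need to set it for multi-channel recognition. I assume that is your case, so, as the error message suggests, you need to set 'audio_channel_count': 2 in your config so that it exactly matches your audio file.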
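As a minimal sketch of that fix, the snippet below reads the channel count from the FLAC header and puts it into the config dictionary, rather than hard-coding 2. The helper names `flac_channel_count`, `build_config`, and `config_for_file` are hypothetical and not part of the google-cloud-speech library. It assumes the file starts with the `fLaC` marker followed directly by the STREAMINFO block, so a file with a prepended ID3 tag would be rejected. The `encoding` entry from the question's config is left out only to keep the sketch free of library imports; add it back when passing the config to the client.

```python
import struct


def flac_channel_count(header: bytes) -> int:
    """Return the channel count stored in the FLAC STREAMINFO block.

    `header` must hold at least the first 26 bytes of the file.
    """
    if len(header) < 26 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Byte 4 is the metadata block header: low 7 bits are the block type,
    # and type 0 is STREAMINFO, which must come first.
    if header[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # STREAMINFO data starts at byte 8. At data offset 10 there is a packed
    # 64-bit field: 20 bits sample rate, 3 bits (channels - 1),
    # 5 bits (bits per sample - 1), 36 bits total samples.
    (packed,) = struct.unpack(">Q", header[18:26])
    return ((packed >> 41) & 0x7) + 1


def build_config(header: bytes) -> dict:
    """Build a config dict like the question's, with a matching channel count."""
    return {
        "enable_speaker_diarization": True,
        "diarization_speaker_count": 2,
        "language_code": "en-US",
        # Must match the value in the FLAC header, per the error message.
        "audio_channel_count": flac_channel_count(header),
    }


def config_for_file(local_file_path: str) -> dict:
    """Read just enough of the file to inspect the header (hypothetical helper)."""
    with open(local_file_path, "rb") as f:
        return build_config(f.read(42))
```

With the question's code, the dictionary returned by `config_for_file(local_file_path)` would replace the hand-written `config`, with the `encoding` key added back before the call to `client.long_running_recognize`.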

For more information about the attributes of the RecognitionConfig object, see the source code.

Votes: 2

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/62604277
