Mozilla DeepSpeech: How to generate an SRT file from multiple segmented audio files?

Stack Overflow user

Asked 2022-04-01 11:56:55

1 answer, 565 views, 0 followers, 2 votes

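I have been following this guide to generate SRT subtitle files from video/audio files using Mozilla DeepSpeech.

Following the guide, I have been able to use the pyAudioAnalysis library to split the audio .wav file on its silent parts into multiple segmented .wav files.

Segmented audio files (image)

However, I am currently struggling to understand how to read the multiple segment files and generate a single subtitle file from them using Mozilla DeepSpeech. I have attached an image of the segmented audio files above.

As for my current code, most of it is similar to the guide I am following, but the guide does not explain the functions well enough.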

silenceRemoval function

from pyAudioAnalysis import audioBasicIO as aIO
from pyAudioAnalysis import audioSegmentation as aS

def silenceRemoval(input_file, smoothing_window = 1.0, weight = 0.2):
    print("Running silenceRemoval function\n")
    [fs, x] = aIO.read_audio_file(input_file)
    segmentLimits = aS.silence_removal(x, fs, 0.05, 0.05, smoothing_window, weight)
    
    for i, s in enumerate(segmentLimits):
        strOut = "{0:s}_{1:.3f}-{2:.3f}.wav".format(input_file[0:-4], s[0], s[1])
        # wavfile.write(strOut, fs, x[int(fs * s[0]):int(fs * s[1])])
        write_file("audio", strOut, ".wav", x[int(fs * s[0]):int(fs * s[1])], fs)
    
    print("\nsilenceRemoval function completed")

Writing the .wav file out as multiple segments

import os
import scipy.io.wavfile as wavfile

def write_file(output_file_path, input_file_name, name_attribute, sig, fs):
    """
    Write a signal/audio array to a wave file.

    Args:
        - output_file_path (str) : path to save resulting wave file to.
        - input_file_name  (str) : name of processed wave file,
        - name_attribute   (str) : attribute to add to output file name.
        - sig            (array) : signal/audio array.
        - fs               (int) : sampling rate.

    Returns:
        None. The file is written to disk as a side effect.
    """
    # set-up the output file name
    fname = os.path.basename(input_file_name).split(".wav")[0] + name_attribute
    fpath = os.path.join(output_file_path, fname)
    wavfile.write(filename=fpath, rate=fs, data=sig)
    print("Writing data to " + fpath + ".")

main(), which calls the functions

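import subprocess as sp

from deepspeech import Model

# Raw string so the backslash is not read as an escape sequence
video_name = r"Videos\MIB_Sample.mp4"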
audio_name = video_name + ".wav"

# DeepSpeech Model and Scorer
ds = Model("deepspeech-0.9.3-models.pbmm")
scorer = ds.enableExternalScorer("deepspeech-0.9.3-models.scorer")

def main():     
    # Extract audio from input video file
    extractAudio(video_name, audio_name)

    print("Splitting on silent parts in audio file")
    silenceRemoval(audio_name)

    generateSRT(audio_name)

generateSRT() function

def generateSRT(audio_file_name):
    command = ["deepspeech", "--model", ds, 
                "--scorer", scorer,
                "--audio", audio_file_name]
    try:
        ret = sp.call(command, shell=True)
        print("generating subtitles")
    except Exception as e:
        print("Error: ", str(e))
        exit(1)

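I am currently trying to generate subtitles from the single extracted audio file, but I am facing this error:

Error: expected str, bytes or os.PathLike object, not Model

I would like to understand how to use Mozilla DeepSpeech to loop through the folder containing the segmented audio files, transcribe them, and write the output to another folder. Thanks!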

mozilla-deepspeech

python

caption

subtitle

1 Answer

Stack Overflow user

Answered 2022-04-04 07:18:27

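I will address the specific error you are encountering here; the blog post you linked to is a good guide to the end-to-end process of creating .srt files with DeepSpeech.

In your code: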

command = ["deepspeech", "--model", ds, 
                "--scorer", scorer,
                "--audio", audio_file_name]

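you are invoking the deepspeech binary from the command line and passing the model as an argument through the variable ds. When deepspeech is invoked from the command line, it expects a file path pointing to where the model file (the .pbmm file) is located.

This is why you are receiving the error: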

Error: expected str, bytes or os.PathLike object, not Model

because the deepspeech binary expects a file path, not a Model object. Try replacing ds with the file path of the model file, rather than making ds a Model.
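As a rough sketch of that fix, the command can be built from plain path strings. The helper names `build_deepspeech_command` and `transcribe_with_cli` are hypothetical and not part of DeepSpeech; the sketch assumes the `deepspeech` executable is on your PATH and prints the transcript to stdout. The same reasoning applies to `scorer`, which in the question holds the return value of `enableExternalScorer()` rather than a path.

```python
import os
import subprocess

# Paths taken from the question; adjust to wherever the files actually live.
MODEL_PATH = "deepspeech-0.9.3-models.pbmm"
SCORER_PATH = "deepspeech-0.9.3-models.scorer"


def build_deepspeech_command(model_path, scorer_path, audio_path):
    """Return the argument list for the deepspeech CLI (hypothetical helper)."""
    # os.fspath() accepts str, bytes or os.PathLike and raises the same
    # TypeError as in the question for anything else (e.g. a Model object),
    # so a wrong argument is caught before the process is started.
    return [
        "deepspeech",
        "--model", os.fspath(model_path),
        "--scorer", os.fspath(scorer_path),
        "--audio", os.fspath(audio_path),
    ]


def transcribe_with_cli(audio_path):
    """Run the CLI on one audio file and return what it printed to stdout.

    Assumption: the transcript is written to stdout.
    """
    command = build_deepspeech_command(MODEL_PATH, SCORER_PATH, audio_path)
    # An argument list is passed as-is, without shell=True.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

I dropped `shell=True` here on purpose: with an argument list it is not needed, and on POSIX systems it would cause only the first list item to be treated as the command.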

For more information on how to invoke deepspeech from the command line, see this page in the documentation.
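On the second part of the question (looping over the segment folder), here is a minimal sketch rather than a definitive implementation. It relies on the file names written by your `silenceRemoval()` function, which end in `_<start>-<end>.wav`, to recover each subtitle's timing. The names `srt_timestamp`, `find_segments` and `write_srt` are my own, and `transcribe` is a placeholder for whatever you use to turn one file into text.

```python
import re
from pathlib import Path

# Matches the "_<start>-<end>.wav" suffix produced by silenceRemoval(),
# e.g. "MIB_Sample.mp4_0.000-1.500.wav".
SEGMENT_RE = re.compile(r"_(\d+\.\d+)-(\d+\.\d+)\.wav$")


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def find_segments(segment_dir):
    """Return (start, end, path) tuples for every segment, sorted by start."""
    segments = []
    for path in Path(segment_dir).glob("*.wav"):
        match = SEGMENT_RE.search(path.name)
        if match:  # ignore .wav files that do not follow the naming scheme
            segments.append((float(match.group(1)), float(match.group(2)), path))
    return sorted(segments, key=lambda seg: seg[0])


def write_srt(segment_dir, srt_path, transcribe):
    """Transcribe each segment and write a single SRT file.

    `transcribe` is any callable taking a file path and returning text.
    Returns the number of subtitle entries written.
    """
    blocks = []
    for start, end, path in find_segments(segment_dir):
        text = transcribe(path).strip()
        if not text:
            continue  # nothing recognised in this segment
        index = len(blocks) + 1
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    srt_path = Path(srt_path)
    srt_path.parent.mkdir(parents=True, exist_ok=True)  # create output folder
    srt_path.write_text("\n".join(blocks), encoding="utf-8")
    return len(blocks)
```

You would call it as `write_srt("audio", "subtitles/MIB_Sample.srt", transcribe)`, where the output path is just an example. `transcribe` could wrap a call to the deepspeech command line, or `Model.stt()` from the Python package, which I believe expects 16-bit mono audio at the model's sample rate.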

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/71706546
