How do I use a Wave file as input in VOSK speech recognition?

Stack Overflow user
Asked 2021-06-29 17:07:14
Answers: 1 | Views: 367 | Followers: 0 | Votes: 1

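I have a project that needs to take a recorded audio file, process it in code, extract the text from the file, match the extracted text against another text, and verify it. My problem: I cannot use the recorded file in my code, and the code does not read the file.

The init function is the basis of the code.

The verify function checks whether the recognized speech matches the text.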

import argparse
import json
import os
import queue
import random
import sys
from difflib import SequenceMatcher
import numpy as np
import sounddevice as sd
import vosk

q = queue.Queue()

def int_or_str(text):
    """Helper function for argument parsing."""
    try:
        return int(text)
    except ValueError:
        return text


def callback(indata, frames, time, status):
    """This is called (from a separate thread) for each audio block."""
    if status:
        print(status, file=sys.stderr)
    q.put(bytes(indata))



def init():
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        '-l', '--list-devices', action='store_true',
        help='show list of audio devices and exit')
    args, remaining = parser.parse_known_args()
    if args.list_devices:
        print(sd.query_devices())
        parser.exit(0)
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
        parents=[parser])
    parser.add_argument(
        '-f', '--filename', type=str, metavar='FILENAME',
        help='audio file to store recording to')
    parser.add_argument(
        '-m', '--model', type=str, metavar='MODEL_PATH',
        help='Path to the model')
    parser.add_argument(
        '-d', '--device', type=int_or_str,
        help='input device (numeric ID or substring)')
    parser.add_argument(
        '-r', '--samplerate', type=int, help='sampling rate')
    args = parser.parse_args(remaining)
    try:
        if args.model is None:
            args.model = "model"
        if not os.path.exists(args.model):
            print("Please download a model for your language from https://alphacephei.com/vosk/models")
            print("and unpack as 'model' in the current folder.")
            parser.exit(0)
        if args.samplerate is None:
            device_info = sd.query_devices(args.device, 'input')
            # soundfile expects an int, sounddevice provides a float:
            args.samplerate = int(device_info['default_samplerate'])

        model = vosk.Model(args.model)

        if args.filename:
            dump_fn = open(args.filename, "wb")
        else:
            dump_fn = None

        
    except KeyboardInterrupt:
        print('\nDone')
        parser.exit(0)
    except Exception as e:
        parser.exit(type(e).__name__ + ': ' + str(e))

    return model, args


def verify(random_sentence, model, args):
    num, T_num, F_num, num_word = 0, 0, 0, 1
    with sd.RawInputStream(samplerate=args.samplerate, blocksize=8000, device=args.device, dtype='int16',
                           channels=1, callback=callback):
        rec = vosk.KaldiRecognizer(model, args.samplerate)
        print("{}) ".format(num_word), random_sentence, end='\n')
        print('=' * 30, end='\n')
        run = True
        while run:
            data = q.get()
            if rec.AcceptWaveform(data):
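                res = json.loads(rec.FinalResult())
                # Normalize Arabic yeh to Farsi yeh before comparing.
                res['text'] = res['text'].replace('ي', 'ی')
                if SequenceMatcher(None, random_sentence, res['text']).ratio() > 0.65:
                    # "a, b, c += 1" is a SyntaxError, so increment each counter separately.
                    T_num += 1
                    num += 1
                    num_word += 1
                else:
                    F_num += 1
                    num += 1
                    num_word += 1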
                    
                run = False

    print('=' * 30)
    print('True Cases : {}\n False Cases : {}'.format(T_num, F_num))


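if __name__ == "__main__":
    model, args = init()
    # NOTE: random_sentences is not defined anywhere in this snippet;
    # it must hold the reference sentence before verify() is called.
    verify(random_sentences, model, args)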
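
For reference, the matching step inside verify can be pulled out of the audio loop so it can be tried on plain strings. This is only a minimal sketch: the function name `matches_reference` and the `threshold` parameter are hypothetical, while the 0.65 cutoff and the yeh normalization are taken from the code above.

```python
from difflib import SequenceMatcher


def matches_reference(reference, recognized, threshold=0.65):
    """Return True when the recognized text is close enough to the reference.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Normalize Arabic yeh (U+064A) to Farsi yeh (U+06CC), as verify() does.
    recognized = recognized.replace("\u064a", "\u06cc")
    # ratio() is a similarity score between 0.0 and 1.0.
    return SequenceMatcher(None, reference, recognized).ratio() > threshold
```

Keeping the comparison separate means the threshold can be tuned on sample sentences without a microphone or a model.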

1 Answer

Stack Overflow user

Answered 2021-09-07 07:33:05

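I have been working on a similar project. I modified the code from the VOSK Git repo and wrote the following function, which takes a file name or path as input and returns the captured text. Sometimes, when the audio file contains a long pause (~seconds), the returned text is an empty string. To work around this, I had to write extra code that picks out the longest captured string. This approach is good enough for my purposes.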

import json
import os
import wave

from vosk import KaldiRecognizer, Model


def get_text_from_voice(filename):

    if not os.path.exists("model"):
        print ("Please download the model from https://alphacephei.com/vosk/models and unpack as 'model' in the current folder.")
        exit (1)

    wf = wave.open(filename, "rb")
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        print ("Audio file must be WAV format mono PCM.")
        exit (1)

    model = Model("model")
    rec = KaldiRecognizer(model, wf.getframerate())
    rec.SetWords(True)

    text_lst =[]
    p_text_lst = []
    p_str = []
    len_p_str = []
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            # Call Result() once and reuse the value; a second call
            # may not return the same text.
            result = rec.Result()
            text_lst.append(result)
            print(result)
        else:
            partial = rec.PartialResult()
            p_text_lst.append(partial)
            print(partial)

    if text_lst:
        # Use the first finalized utterance.
        txt_str = json.loads(text_lst[0])["text"]
    elif p_text_lst:
        # No finalized utterance: fall back to the longest partial hypothesis.
        p_str = [json.loads(p)["partial"] for p in p_text_lst]
        txt_str = max(p_str, key=len)
    else:
        txt_str = ''

    return txt_str
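
As an alternative to picking the longest partial string, one could collect every finalized segment and then flush the recognizer at the end of the file. The following is only a sketch, not the method used above: `transcribe_wav` and `frames_per_chunk` are hypothetical names, and the `recognizer` argument is assumed to expose `AcceptWaveform()`, `Result()` and `FinalResult()` the way `vosk.KaldiRecognizer` does.

```python
import json
import wave


def transcribe_wav(wav_path, recognizer, frames_per_chunk=4000):
    """Feed a WAV file to a Vosk-style recognizer and return all recognized text.

    Hypothetical helper: `recognizer` is assumed to behave like
    vosk.KaldiRecognizer (AcceptWaveform, Result, FinalResult).
    """
    segments = []
    with wave.open(wav_path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            if recognizer.AcceptWaveform(data):
                # An utterance ended (for example at a pause): keep its text.
                segments.append(json.loads(recognizer.Result()).get("text", ""))
    # Flush whatever is still buffered after the last chunk.
    segments.append(json.loads(recognizer.FinalResult()).get("text", ""))
    # Drop empty segments and join the rest into one string.
    return " ".join(s for s in segments if s)


# Usage with Vosk (requires the vosk package and an unpacked model directory;
# "speech.wav" is a placeholder file name):
#   from vosk import Model, KaldiRecognizer
#   with wave.open("speech.wav", "rb") as wf:
#       rate = wf.getframerate()
#   print(transcribe_wav("speech.wav", KaldiRecognizer(Model("model"), rate)))
```

Calling `FinalResult()` after the loop should pick up audio that arrives after the last detected pause, which may be one reason a file with long pauses appears to return an empty string.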

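Make sure the correct model is present in the same directory, or pass in the path to the model instead. Also note that VOSK only accepts audio files in WAV mono PCM format.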
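
To see why a particular file is rejected, the format check can report each mismatch separately. This is a minimal sketch using only the standard `wave` module; the function name `wav_format_problems` is hypothetical.

```python
import wave


def wav_format_problems(wav_path):
    """Return a list of reasons the file is not mono 16-bit PCM (empty if OK).

    Hypothetical helper mirroring the three checks in get_text_from_voice().
    """
    problems = []
    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append("expected 1 channel, found %d" % wf.getnchannels())
        if wf.getsampwidth() != 2:
            problems.append("expected 16-bit samples, found %d-bit" % (8 * wf.getsampwidth()))
        if wf.getcomptype() != "NONE":
            problems.append("expected uncompressed PCM, found " + wf.getcomptype())
    return problems
```

A file that fails the check can usually be converted with an external tool, for example `ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav` (this assumes ffmpeg is installed; the 16000 Hz rate is an assumption and should match what the chosen model expects).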

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/68175694
