I'm working with Vosk and I need to get the time of each word in the transcript. This is my code:
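
    import json

    from pydub import AudioSegment
    from vosk import KaldiRecognizer, Model

    # Assumed values; the original post does not show these constants.
    FRAME_RATE = 16000
    CHANNELS = 1

    def voice_recognition(filename):
        model = Model(model_name="vosk-model-fa-0.5")
        rec = KaldiRecognizer(model, FRAME_RATE)
        rec.SetWords(True)  # ask the recognizer for per-word details
        mp3 = AudioSegment.from_mp3(filename)
        mp3 = mp3.set_channels(CHANNELS)
        mp3 = mp3.set_frame_rate(FRAME_RATE)
        step = 45000  # chunk length in milliseconds (pydub slices by ms)
        transcript = ""
        for i in range(0, len(mp3), step):
            segment = mp3[i:i+step]
            rec.AcceptWaveform(segment.raw_data)
            result = rec.Result()
            text = json.loads(result)["text"]
            transcript += text
        return transcript

I need something like this:
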
    time               word
    -----------------------
    (0.0.01, 0.0.2)    hi
    (0.0.03, 0.0.4)    how
    (0.0.04, 0.0.5)    are
    (0.0.05, 0.0.6)    you

Is there a way to get data like this?

Posted on 2022-11-16 07:09:01
I just found out that once you set rec.SetWords(True), all the details I need are already in result = rec.Result().
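
For anyone who wants the table from the question, here is a minimal sketch of reading those details. It assumes the JSON string returned by rec.Result() has a "result" list whose entries carry "word", "start" and "end" (in seconds), which is the shape Vosk produces with word output enabled. The helper name extract_word_timings and the sample data are made up for illustration, so the snippet runs without a model.

```python
import json


def extract_word_timings(result_json):
    """Return (start, end, word) tuples from one Vosk result JSON string."""
    data = json.loads(result_json)
    # The "result" key can be missing when nothing was recognized,
    # so fall back to an empty list.
    return [(w["start"], w["end"], w["word"]) for w in data.get("result", [])]


# Made-up sample standing in for the string returned by rec.Result().
sample = json.dumps({
    "result": [
        {"conf": 1.0, "start": 0.12, "end": 0.45, "word": "hi"},
        {"conf": 0.98, "start": 0.48, "end": 0.80, "word": "how"},
    ],
    "text": "hi how",
})

for start, end, word in extract_word_timings(sample):
    print(f"({start:.2f}, {end:.2f})  {word}")
```

In the loop from the question, you would call this on each result string and collect the tuples instead of only concatenating the "text" field.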
https://stackoverflow.com/questions/74455769