I have a Python script that uses the speech_recognition package to recognize speech and return the text of what was said. However, the transcription lags by a few seconds. Is there another way to write this script so that each word is returned as it is spoken? I have another script that does this using the pocketsphinx package, but the results are very inaccurate.
Install the dependencies:
pip install SpeechRecognition
pip install pocketsphinx

Script 1 - delayed speech-to-text:
import speech_recognition as sr

# obtain audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Please wait. Calibrating microphone...")
    # listen for 5 seconds and create the ambient noise energy level
    r.adjust_for_ambient_noise(source, duration=5)
    print("Say something!")
    audio = r.listen(source)

# recognize speech using Sphinx
try:
    print("Sphinx thinks you said '" + r.recognize_sphinx(audio) + "'")
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))

Script 2 - immediate speech-to-text, though inaccurate:
import os
from pocketsphinx import LiveSpeech, get_model_path

model_path = get_model_path()
speech = LiveSpeech(
    verbose=False,
    sampling_rate=16000,
    buffer_size=2048,
    no_search=False,
    full_utt=False,
    hmm=os.path.join(model_path, 'en-us'),
    lm=os.path.join(model_path, 'en-us.lm.bin'),
    dic=os.path.join(model_path, 'cmudict-en-us.dict')
)
for phrase in speech:
    print(phrase)

Posted on 2020-01-25 03:49:45
If you happen to have a CUDA-capable GPU, you can try Mozilla's DeepSpeech GPU library. They also have a CPU version in case you don't have a CUDA-capable GPU. On CPU, DeepSpeech transcribes an audio file in about 1.3x its duration, while on GPU it takes about 0.3x, i.e. it transcribes a 1-second audio file in roughly 0.33 seconds. Quick start:
# Create and activate a virtualenv
virtualenv -p python3 $HOME/tmp/deepspeech-gpu-venv/
source $HOME/tmp/deepspeech-gpu-venv/bin/activate
# Install DeepSpeech CUDA enabled package
pip3 install deepspeech-gpu
# Transcribe an audio file.
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/2830-3980-0043.wav

A few important notes - deepspeech-gpu has dependencies such as TensorFlow, CUDA, cuDNN, etc., so check their GitHub repo for more details: https://github.com/mozilla/DeepSpeech
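DeepSpeech also exposes a streaming API from Python, which is the piece that matters for the original question: you can ask for an intermediate transcript while audio is still being fed in, instead of waiting for the whole utterance. A rough sketch against the 0.6-era API (the beam width, half-second frame size, and frames helper are my own assumptions for illustration; the streaming API changed in later releases, so check the version you installed):

```python
def frames(samples, frame_len):
    # split a flat sequence of int16 samples into fixed-size frames
    for i in range(0, len(samples), frame_len):
        yield samples[i:i + frame_len]

def transcribe_stream(wav_path):
    # imported lazily so the frame helper above is usable without deepspeech
    import wave
    import numpy as np
    from deepspeech import Model

    # model path follows the 0.6.1 quick start above; 500 is an assumed beam width
    ds = Model('deepspeech-0.6.1-models/output_graph.pbmm', 500)
    stream = ds.createStream()

    with wave.open(wav_path, 'rb') as w:
        audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

    last = ''
    for frame in frames(audio, 16000 // 2):  # ~0.5 s of 16 kHz audio per frame
        ds.feedAudioContent(stream, frame)
        partial = ds.intermediateDecode(stream)
        if partial != last:  # print only when the hypothesis changes
            print(partial)
            last = partial
    print('final:', ds.finishStream(stream))

if __name__ == '__main__':
    transcribe_stream('audio/2830-3980-0043.wav')
```

In a live setup you would feed microphone frames (e.g. from PyAudio) instead of a WAV file, which gets you near-word-level updates as speech arrives.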
https://stackoverflow.com/questions/47004955