Q: Speech detection with a voice activity detector (VAD)

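Stack Overflow user

Asked 2021-04-30 11:13:52

2 answers, 348 views, 0 followers, 0 votes

I am able to read the audio, but when I pass it to the VAD (voice activity detector) I get an error message. I think the error is related to the frame being in bytes. When I feed it to vad.is_speech(frame, sample_rate), should the frame be bytes? The code is below: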

frame_duration_ms=10
duration_in_ms = (frame_duration_ms / 1000) #duration in 10ms
frame_size = int(sample_rate * duration_in_ms) #frame size of 160
frame_bytes = frame_size * 2

def frame_generator(buffer, frame_bytes):
    # repeatedly store 320 length array to the frame_stored when the frame_bytes is less than the size of the buffer
    while offset+frame_bytes < len(buffer):
        frame_stored = buffer[offset : offset+frame_bytes]
        offset = offset + frame_bytes
    return frame_stored
num_padding_frames = int(padding_duration_ms / frame_duration_ms)
# use deque for the sliding window
ring_buffer = deque(maxlen=num_padding_frames)
# we have two states TRIGGERED and NOTTRIGGERED state
triggered = True #NOTTRIGGERED state

frames = frame_generator(buffer, frame_bytes)

speech_frame = []
for frame in frames:
    is_speech = vad.is_speech(frame, sample_rate)

Here is the error message:

TypeError                                 Traceback (most recent call last)
     16 speech_frame = []
     17 for frame in frames:
---> 18     is_speech = vad.is_speech(frame, sample_rate)
     19     #print(frame)

C:\Program \Python38\lib\site-packages\webrtcvad.py in is_speech(self, buf, sample_rate, length)
     20
     21     def is_speech(self, buf, sample_rate, length=None):
---> 22         length = length or int(len(buf) / 2)
     23         if length * 2 > len(buf):
     24             raise IndexError(

TypeError: object of type 'int' has no len()

python

pyaudio

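2 Answers

Stack Overflow user

Accepted answer

Posted 2021-05-02 10:13:07

I solved it. vad.is_speech(frame, sample_rate) takes buf and computes its length, but an integer has no len() in Python. In the original code, frame_generator returned a single bytes object, so the for loop iterated over its individual bytes, each of which is an int. Calling len() on an int raises this error, for example: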

num = 1
print(len(num))

Use a sequence instead:

data = [1,2,3,4]
print(len(data))
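
The integer comes from how Python 3 iterates over bytes: each element of a bytes value is an int, so looping over one frame hands is_speech a single byte value at a time. A minimal sketch with made-up data (the names frame and frames, and the byte values, are only illustrative):

```python
# A single 4-byte "frame"; the contents are arbitrary illustration data.
frame = b"\x01\x02\x03\x04"

# Iterating over a bytes object yields ints, which have no len().
items_from_frame = [type(item).__name__ for item in frame]

# Iterating over a list of frames yields bytes objects, which do have len().
frames = [frame[0:2], frame[2:4]]
items_from_list = [type(item).__name__ for item in frames]

print(items_from_frame)
print(items_from_list)
```

This is why returning a list of frames, rather than one frame, makes the loop pass bytes to the VAD.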

Here is the corrected code:

from collections import deque

# sample_rate, padding_duration_ms, buffer and vad are assumed to be defined earlier
frame_duration_ms = 10
duration_in_s = frame_duration_ms / 1000  # 10 ms expressed in seconds
frame_size = int(sample_rate * duration_in_s)  # 160 samples at 16 kHz
frame_bytes = frame_size * 2  # 16-bit PCM: 2 bytes per sample

def frame_generator(buffer, frame_bytes):
    # Split the buffer into consecutive frames of frame_bytes bytes (320 here)
    # and return them as a list, so that iterating yields bytes objects
    values = []
    offset = 0
    while offset + frame_bytes < len(buffer):
        frame_stored = buffer[offset : offset + frame_bytes]
        offset = offset + frame_bytes
        values.append(frame_stored)
    return values

num_padding_frames = int(padding_duration_ms / frame_duration_ms)
# use deque for the sliding window
ring_buffer = deque(maxlen=num_padding_frames)
# we have two states: TRIGGERED and NOTTRIGGERED
triggered = False  # start in the NOTTRIGGERED state

frames = frame_generator(buffer, frame_bytes)

for frame in frames:
    is_speech = vad.is_speech(frame, sample_rate)
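
The code above sets up ring_buffer and triggered but does not yet use them. One way they could be used is sketched below, as an illustration of the sliding-window idea rather than the author's actual method. Here collect_voiced_segments and is_speech_fn are hypothetical names (is_speech_fn stands in for a call such as lambda f: vad.is_speech(f, sample_rate)), and the 0.9 ratio is an assumed threshold.

```python
from collections import deque


def collect_voiced_segments(frames, is_speech_fn, num_padding_frames, ratio=0.9):
    """Group frames into voiced segments using a sliding window.

    is_speech_fn is a hypothetical predicate standing in for the VAD call;
    ratio is an assumed threshold, not a value taken from the original post.
    """
    ring_buffer = deque(maxlen=num_padding_frames)
    triggered = False  # NOTTRIGGERED state
    voiced = []
    segments = []
    for frame in frames:
        speech = is_speech_fn(frame)
        ring_buffer.append((frame, speech))
        window_full = len(ring_buffer) == ring_buffer.maxlen
        if not triggered:
            num_voiced = sum(1 for _, s in ring_buffer if s)
            # Enter TRIGGERED once the window is full and mostly voiced.
            if window_full and num_voiced >= ratio * ring_buffer.maxlen:
                triggered = True
                voiced.extend(f for f, _ in ring_buffer)
                ring_buffer.clear()
        else:
            voiced.append(frame)
            num_unvoiced = sum(1 for _, s in ring_buffer if not s)
            # Leave TRIGGERED once the window is full and mostly unvoiced.
            if window_full and num_unvoiced >= ratio * ring_buffer.maxlen:
                triggered = False
                segments.append(b"".join(voiced))
                voiced = []
                ring_buffer.clear()
    # Flush a segment that is still open when the input ends.
    if voiced:
        segments.append(b"".join(voiced))
    return segments
```

Because the predicate is passed in, the state machine can be exercised with fake frames before wiring in the real VAD.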

0 votes

Stack Overflow user

Posted 2022-08-16 18:44:27

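import wave
import webrtcvad

# Read 800 sample frames from the WAV file (assumed to be 16-bit mono PCM at 16 kHz)
audioFile = wave.open('ENG_M.wav')
framesAudio = audioFile.readframes(800)
audioFile.close()
#print(framesAudio)

# Initialize a vad object
vad = webrtcvad.Vad()
sample_rate = 16000
frame_duration = 10  # in ms
# Bytes per VAD frame: samples per frame * 2 bytes per 16-bit sample
frame_bytes = int(sample_rate * frame_duration / 1000) * 2
# Run the VAD on each complete 10 ms slice of the audio.
# Iterating over framesAudio directly would yield single ints, not frames.
for start in range(0, len(framesAudio) - frame_bytes + 1, frame_bytes):
    # Detecting speech
    final_frame = framesAudio[start:start + frame_bytes]
    print(f'Contains speech: {vad.is_speech(final_frame, sample_rate)}')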
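
The webrtcvad README states that the VAD only accepts 16-bit mono PCM audio sampled at 8000, 16000, 32000 or 48000 Hz, in frames of 10, 20 or 30 ms. A hedged sketch of a pre-check built on the standard wave module follows; frame_bytes_for is a hypothetical helper name, not part of webrtcvad.

```python
import wave

SUPPORTED_RATES = (8000, 16000, 32000, 48000)
SUPPORTED_FRAME_MS = (10, 20, 30)


def frame_bytes_for(path, frame_duration_ms):
    """Return the byte length of one VAD frame for the WAV file at path.

    Hypothetical helper: raises ValueError if the file or the frame duration
    does not meet the input requirements described in the webrtcvad README.
    """
    if frame_duration_ms not in SUPPORTED_FRAME_MS:
        raise ValueError("frame duration must be 10, 20 or 30 ms")
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("audio must be mono")
        if wav.getsampwidth() != 2:
            raise ValueError("audio must be 16-bit PCM")
        sample_rate = wav.getframerate()
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError("unsupported sample rate: %d" % sample_rate)
    # Samples per frame times 2 bytes per 16-bit sample.
    return int(sample_rate * frame_duration_ms / 1000) * 2
```

For a 16 kHz file and 10 ms frames this gives 320 bytes, matching the frame size used in the question.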

0 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/67332920
