Python: find timestamps of a specific sound in a .wav file

Stack Overflow user
Asked on 2021-05-10 10:04:13
1 answer, 347 views, 0 followers, 0 votes

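I have a .wav file in which I recorded my own voice, speaking for several minutes. Say I want to find the exact timestamps at which I said "Mike" in the audio. I looked into speech recognition and ran some tests with the Google Speech API, but the timestamps I got back were far from accurate.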

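As an alternative, I recorded a very short .wav file in which I say only "Mike". I am trying to compare the two .wav files and find every timestamp at which "Mike" is said in the longer one. I came across SleuthEye's amazing answer.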

That code works perfectly well for finding a single timestamp, but I could not figure out how to find multiple start/end times:

import numpy as np
import sys
from scipy.io import wavfile
from scipy import signal

snippet = sys.argv[1]
source  = sys.argv[2]

# read the sample to look for
rate_snippet, snippet = wavfile.read(snippet)
snippet = np.array(snippet, dtype='float')

# read the source
rate, source = wavfile.read(source)
source = np.array(source, dtype='float')

# resample such that both signals are at the same sampling rate (if required)
if rate != rate_snippet:
  num = int(np.round(rate*len(snippet)/rate_snippet))
  snippet = signal.resample(snippet, num)

# compute the cross-correlation
z = signal.correlate(source, snippet)

peak = np.argmax(np.abs(z))
start = (peak-len(snippet)+1)/rate
end   = peak/rate

print("start {} end {}".format(start, end))

I am very new to audio and signal-related programming and would appreciate any advice. Thanks!


1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-05-10 18:45:25

You were almost there. You can use find_peaks. For example:

import numpy as np
from scipy.io import wavfile
from scipy import signal
import matplotlib.pyplot as plt

snippet = 'snippet.wav'
source  = 'source.wav'

# read the sample to look for
# ([:, 0] keeps the first channel and assumes a stereo file)
rate_snippet, snippet = wavfile.read(snippet)
snippet = np.array(snippet[:, 0], dtype='float')

# read the source (first channel only, same stereo assumption)
rate, source = wavfile.read(source)
source = np.array(source[:, 0], dtype='float')

# resample such that both signals are at the same sampling rate (if required)
if rate != rate_snippet:
    num = int(np.round(rate*len(snippet)/rate_snippet))
    snippet = signal.resample(snippet, num)
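
Note that indexing with `[:, 0]` only works for multi-channel files: for a mono recording `wavfile.read` returns a 1-D array, and the indexing raises an `IndexError`. A minimal sketch of a step that accepts both cases follows; the helper name `to_mono_float` is hypothetical and not part of the original answer, and keeping only the first channel is an assumption (averaging the channels would be another option).

```python
import numpy as np

def to_mono_float(samples):
    """Return a 1-D float array from data as returned by scipy.io.wavfile.read."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 2:
        # multi-channel data has shape (n_samples, n_channels);
        # keep only the first channel, as the answer does
        samples = samples[:, 0]
    return samples

# small synthetic arrays standing in for wavfile.read output
mono = to_mono_float(np.array([1, 2, 3], dtype=np.int16))
stereo = to_mono_float(np.array([[1, 9], [2, 9], [3, 9]], dtype=np.int16))
print(mono.shape, stereo.shape)
```

With this in place, `snippet = to_mono_float(snippet)` and `source = to_mono_float(source)` could replace the two `np.array(...[:, 0], ...)` lines.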

My source and snippet:

x_snippet = np.arange(0, snippet.size) / rate_snippet

plt.plot(x_snippet, snippet)
plt.xlabel('seconds')
plt.title('snippet')

x_source = np.arange(0, source.size) / rate

plt.plot(x_source, source)
plt.xlabel('seconds')
plt.title('source')

Now we compute the correlation:

# compute the cross-correlation
z = signal.correlate(source, snippet, mode='same')

I used mode='same' so that source and z have the same length:

source.size == z.size
True
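
To see what mode='same' does to the position of a peak, here is a small sketch on synthetic data (the random signal, the sizes, and the variable `true_start` are made up for illustration). It embeds the snippet at a known offset in an otherwise silent source and compares mode='full' with mode='same':

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
snippet = rng.standard_normal(100)

# silent source with one copy of the snippet starting at a known sample
source = np.zeros(1000)
true_start = 300
source[true_start:true_start + snippet.size] = snippet

z_full = signal.correlate(source, snippet, mode='full')
z_same = signal.correlate(source, snippet, mode='same')

peak_full = int(np.argmax(z_full))
peak_same = int(np.argmax(z_same))

print("sizes:", z_full.size, z_same.size)
print("full peak minus start:", peak_full - true_start)
print("same peak minus start:", peak_same - true_start)
```

In 'full' mode the peak sits at the last sample of the match (start + len(snippet) - 1), which is why the question's code subtracts `len(snippet) - 1`. In 'same' mode the output is the centered part of the 'full' result, so the peak lands about half a snippet after the start, that is, near the middle of the match.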

Now we can define a minimum peak height, for example:

x_z = np.arange(0, z.size) / rate

plt.plot(x_z, z)
plt.axhline(2e20, color='r')
plt.title('correlation')

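and find the peaks that are separated by at least a minimum distance (you may need to choose your own height and distance depending on your samples):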

peaks = signal.find_peaks(
    z,
    height=2e20,
    distance=50000
)

peaks
(array([ 117390,  225754,  334405,  449319,  512001,  593854,  750686,
         873026,  942586, 1064083]),
 {'peak_heights': array([8.73666562e+20, 9.32871542e+20, 7.23883305e+20, 9.30772354e+20,
         4.32924341e+20, 9.18323020e+20, 1.12473608e+21, 1.07752019e+21,
         1.12455724e+21, 1.05061734e+21])})
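
The values 2e20 and 50000 are specific to these recordings. One way to avoid hard-coding them is to derive both from the data. The sketch below is an assumption rather than part of the original answer: it uses half of the largest correlation value as the height and the snippet length as the distance, on synthetic data with three known occurrences.

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
snippet = rng.standard_normal(200)

# low-level noise with the snippet added at three known positions
source = 0.1 * rng.standard_normal(5000)
true_starts = [500, 2000, 3500]
for s in true_starts:
    source[s:s + snippet.size] += snippet

z = signal.correlate(source, snippet, mode='same')

peaks_idxs, props = signal.find_peaks(
    z,
    height=0.5 * z.max(),   # assumed: keep peaks at least half as high as the best match
    distance=snippet.size,  # assumed: two matches cannot overlap
)
print(peaks_idxs)
```

The 0.5 factor is a starting point only. On real speech, repetitions of the same word differ in loudness and timing, so the threshold usually needs tuning against a plot of z.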

We take the peak indices:

peaks_idxs = peaks[0]

plt.plot(x_z, z)
plt.plot(x_z[peaks_idxs], z[peaks_idxs], 'or')

Since each peak falls "almost" at the middle of the matched snippet, we can do this:

fig, ax = plt.subplots(figsize=(12, 5))
plt.plot(x_source, source)
plt.xlabel('seconds')
plt.title('source signal and correlation')
for i, peak_idx in enumerate(peaks_idxs):
    start = (peak_idx-snippet.size/2) / rate
    center = (peak_idx) / rate
    end   = (peak_idx+snippet.size/2) / rate
    plt.axvline(start,  color='g')
    plt.axvline(center, color='y')
    plt.axvline(end,    color='r')
    print(f"peak {i}: start {start:.2f} end {end:.2f}")

peak 0: start 2.34 end 2.98
peak 1: start 4.80 end 5.44
peak 2: start 7.27 end 7.90
peak 3: start 9.87 end 10.51
peak 4: start 11.29 end 11.93
peak 5: start 13.15 end 13.78
peak 6: start 16.71 end 17.34
peak 7: start 19.48 end 20.11
peak 8: start 21.06 end 21.69
peak 9: start 23.81 end 24.45

But there may be a better way to define the start and end more precisely.
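
One possible refinement, sketched here as an assumption rather than as the answer's method, is to correlate with mode='valid'. In that mode index k of the result corresponds to the snippet being aligned with source[k:k + len(snippet)], so each peak index is itself the start sample and no half-snippet offset is needed. The function name `find_occurrences` and the parameter `rel_height` are hypothetical.

```python
import numpy as np
from scipy import signal

def find_occurrences(source, snippet, rate, rel_height=0.5):
    """Return (start, end) times in seconds for each match of snippet in source."""
    # in 'valid' mode, index k means the snippet is aligned at source[k:k+len(snippet)]
    z = signal.correlate(source, snippet, mode='valid')
    starts, _ = signal.find_peaks(
        z,
        height=rel_height * z.max(),  # assumed threshold, tune per recording
        distance=snippet.size,        # matches are assumed not to overlap
    )
    return [(s / rate, (s + snippet.size) / rate) for s in starts]

# synthetic check: three copies of the snippet at known positions
rate = 1000
rng = np.random.default_rng(2)
snippet = rng.standard_normal(200)
source = 0.1 * rng.standard_normal(5000)
for s in (500, 2000, 3500):
    source[s:s + snippet.size] += snippet

spans = find_occurrences(source, snippet, rate)
for i, (start, end) in enumerate(spans):
    print(f"peak {i}: start {start:.2f} end {end:.2f}")
```

Two caveats: find_peaks does not report a maximum at the very first or last sample, so a match that begins exactly at sample 0 would be missed, and the end time is still just the start plus the snippet length, which is only as accurate as the assumption that every utterance lasts as long as the recorded snippet.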

1 vote
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/67463919
