声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

虚幻大学 xuhss 315℃ 0评论

? 优质资源分享 ?

学习路线指引(点击解锁) 知识定位 人群定位
? Python实战微信订餐小程序 ? 进阶级 本课程是python flask+微信小程序的完美结合,从项目搭建到腾讯云部署上线,打造一个全栈订餐系统。
?Python量化交易实战? 入门级 手把手带你打造一个易扩展、更安全、效率更高的量化交易系统

目录

梅尔刻度mel 滤波器组mel 滤波器组特征巴克刻度Bark 滤波器组Bark频谱等效矩阵带宽线性滤波器组Gammatone 滤波器组参考

梅尔刻度

  梅尔刻度(Mel scale)是一种由听众判断不同频率 音高(pitch)彼此相等的感知刻度,表示人耳对等距音高(pitch)变化的感知。mel 刻度和正常频率(Hz)之间的参考点是将1 kHz,且高于人耳听阈值40分贝以上的基音,定为1000 mel。在大约500 Hz以上,听者判断越来越大的音程(interval)产生相等的pitch增量,人耳每感觉到等量的音高变化,所需要的频率变化随频率增加而愈来愈大。

  将频率fff (Hz)转换为梅尔mmm的公式是

m=2595log10(1+f700)m=2595log10⁡(1+f700)m=2595\log_{10}(1+\frac{f}{700})

def hz2mel(hz):
 """ Hz to Mels """
    return 2595 * np.log10(1 + hz / 700.0)

1b5d9a4c53671bb41745c7bd010b32cb - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

mel与f(Hz)的对应关系

6a4ff6ba61a3db0666ade25d019b917e - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)e8945612f3c53e3374e19586db13fc67 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def hz2mel(hz):
 """ Hz to Mels """
    return 2595 * np.log10(1 + hz / 700.0)

if \_\_name\_\_ == "\_\_main\_\_":
 fs = 16000
 hz = np.linspace(0, 8000, 8000)
 mel = hz2mel(hz)

 fig = plt.figure()
 ax = plt.plot(hz, mel, color="r")

 plt.xlabel("Hertz scale (Hz)", fontsize=12)  # x轴的名字
    plt.ylabel("mel scale", fontsize=12)

 plt.xticks(fontsize=10)  # x轴的刻度
    plt.yticks(fontsize=10)

 plt.xlim(0, 8000)  # 坐标轴的范围
 plt.ylim(0)

 def formatnum(x, pos):
 return '$%.1f$' % (x / 1000)

 formatter = FuncFormatter(formatnum)
 # plt.gca().xaxis.set\_major\_formatter(formatter)
    # plt.gca().yaxis.set\_major\_formatter(formatter)
    plt.grid(linestyle='--')
 plt.tight\_layout()
 plt.show()

画图代码
将梅尔mmm转换为频率fff (Hz)的公式是:

f=700em2595−1f=700em2595−1f=700e^{\frac{m}{2595}-1}

def mel2hz(mel):
 """ Mels to HZ """
    return 700 * (10 ** (mel / 2595.0) - 1)

mel 滤波器组

def mel\_filterbanks(nfilt=20, nfft=512, samplerate=16000, lowfreq=0, highfreq=None):
 """计算一个Mel-filterbank (M,F)
 :param nfilt: filterbank中的滤波器数量
 :param nfft: FFT size
 :param samplerate: 采样率
 :param lowfreq: Mel-filter的最低频带边缘
 :param highfreq: Mel-filter的最高频带边缘,默认samplerate/2
 """
 highfreq = highfreq or samplerate / 2

    # 按梅尔均匀间隔计算 点
    lowmel = hz2mel(lowfreq)
 highmel = hz2mel(highfreq)
 melpoints = np.linspace(lowmel, highmel, nfilt + 2)
 hz\_points = mel2hz(melpoints)  # 将mel频率再转到hz频率
    # bin = samplerate/2 / NFFT/2=sample\_rate/NFFT # 每个频点的频率数
    # bins = hz\_points/bin=hz\_points*NFFT/ sample\_rate # hz\_points对应第几个fft频点
    bin = np.floor((nfft + 1) * hz\_points / samplerate)

 fbank = np.zeros([nfilt, int(nfft / 2 + 1)])  # (m,f)
    for i in range(0, nfilt):
 for j in range(int(bin[i]), int(bin[i + 1])):
 fbank[i, j] = (j - bin[i]) / (bin[i + 1] - bin[i])
 for j in range(int(bin[i + 1]), int(bin[i + 2])):
 fbank[i, j] = (bin[i + 2] - j) / (bin[i + 2] - bin[i + 1])

 # fbank -= (np.mean(fbank, axis=0) + 1e-8)
    return fbank

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-VfmyAZLG-1653745623141)(https://img2022.cnblogs.com/blog/1433301/202205/1433301-20220525205649388-677582683.png)]

mel 滤波器组特征

cc7bcf034b228e104d501b942da62892 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

2a1b7a472c645d32820e7456041ddd51 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)fb6301a9bb2ab7f9b838a64d1600ad63 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

# -*- coding:utf-8 -*-
# Author:凌逆战 | Never
# Date: 2022/5/19
"""
1、提取Mel filterBank
2、提取mel spectrum
"""
import librosa
import numpy as np
import matplotlib.pyplot as plt
import librosa.display
from matplotlib.ticker import FuncFormatter
plt.rcParams['font.sans-serif'] = ['SimHei']  # 用来正常显示中文标签
plt.rcParams['axes.unicode\_minus'] = False  # 用来正常显示符号

def hz2mel(hz):
 """ Hz to Mels """
    return 2595 * np.log10(1 + hz / 700.0)

def mel2hz(mel):
 """ Mels to HZ """
    return 700 * (10 ** (mel / 2595.0) - 1)

def mel\_filterbanks(nfilt=20, nfft=512, samplerate=16000, lowfreq=0, highfreq=None):
 """计算一个Mel-filterbank (M,F)
 :param nfilt: filterbank中的滤波器数量
 :param nfft: FFT size
 :param samplerate: 采样率
 :param lowfreq: Mel-filter的最低频带边缘
 :param highfreq: Mel-filter的最高频带边缘,默认samplerate/2
 """
 highfreq = highfreq or samplerate / 2

    # 按梅尔均匀间隔计算 点
    lowmel = hz2mel(lowfreq)
 highmel = hz2mel(highfreq)
 melpoints = np.linspace(lowmel, highmel, nfilt + 2)
 hz\_points = mel2hz(melpoints)  # 将mel频率再转到hz频率
    # bin = samplerate/2 / NFFT/2=sample\_rate/NFFT # 每个频点的频率数
    # bins = hz\_points/bin=hz\_points*NFFT/ sample\_rate # hz\_points对应第几个fft频点
    bin = np.floor((nfft + 1) * hz\_points / samplerate)

 fbank = np.zeros([nfilt, int(nfft / 2 + 1)])  # (m,f)
    for i in range(0, nfilt):
 for j in range(int(bin[i]), int(bin[i + 1])):
 fbank[i, j] = (j - bin[i]) / (bin[i + 1] - bin[i])
 for j in range(int(bin[i + 1]), int(bin[i + 2])):
 fbank[i, j] = (bin[i + 2] - j) / (bin[i + 2] - bin[i + 1])

 # fbank -= (np.mean(fbank, axis=0) + 1e-8)
    return fbank

wav\_path = "./p225\_001.wav"
fs = 16000
NFFT = 512
win\_length = 512
num\_filter = 22
low\_freq\_mel = 0
high\_freq\_mel = hz2mel(fs // 2)  # 求最高hz频率对应的mel频率
mel\_points = np.linspace(low\_freq\_mel, high\_freq\_mel, num\_filter + 2)  # 在mel频率上均分成42个点
hz\_points = mel2hz(mel\_points)  # 将mel频率再转到hz频率
print(hz\_points)

# bin = sample\_rate/2 / NFFT/2=sample\_rate/NFFT # 每个频点的频率数
# bins = hz\_points/bin=hz\_points*NFFT/ sample\_rate # hz\_points对应第几个fft频点
bins = np.floor((NFFT + 1) * hz\_points / fs)
print(bins)
# [ 0. 2. 5. 8. 12. 16. 20. 25. 31. 37. 44. 52. 61. 70.
# 81. 93. 107. 122. 138. 157. 178. 201. 227. 256.]

wav = librosa.load(wav\_path, sr=fs)[0]
S = librosa.stft(wav, n\_fft=NFFT, hop\_length=NFFT // 2, win\_length=win\_length, window="hann", center=False)
mag = np.abs(S)  # 幅度谱 (257, 127) librosa.magphase()

filterbanks = mel\_filterbanks(nfilt=num\_filter, nfft=NFFT, samplerate=fs, lowfreq=0, highfreq=fs // 2)

# ================ 画三角滤波器 ===========================
FFT\_len = NFFT // 2 + 1
fs\_bin = fs // 2 / (NFFT // 2)  # 一个频点多少Hz
x = np.linspace(0, FFT\_len, FFT\_len)

plt.plot(x * fs\_bin, filterbanks.T)

plt.xlim(0) # 坐标轴的范围
plt.ylim(0, 1)
plt.tight\_layout()
plt.grid(linestyle='--')
plt.show()

filter\_banks = np.dot(filterbanks, mag)  # (M,F)*(F,T)=(M,T)
filter\_banks = 20 * np.log10(filter\_banks)  # dB

# ================ 绘制语谱图 ==========================
# 绘制 频谱图 方法1
plt.imshow(filter\_banks, cmap="jet", aspect='auto')
ax = plt.gca()  # 获取其中某个坐标系
ax.invert\_yaxis()  # 将y轴反转
plt.tight\_layout()
plt.show()

# 绘制 频谱图 方法2
plt.figure()
librosa.display.specshow(filter\_banks, sr=fs, x\_axis='time', y\_axis='linear', cmap="jet")
plt.xlabel('时间/s', fontsize=14)
plt.ylabel('频率/kHz', fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
def formatnum(x, pos):
 return '$%d$' % (x / 1000)

formatter = FuncFormatter(formatnum)
plt.gca().yaxis.set\_major\_formatter(formatter)
plt.tight\_layout()
plt.show()

画图代码
另外Librosa写好了完整的提取mel频谱和MFCC的API:

mel\_spec = librosa.feature.melspectrogram(y=y, sr=sr, n\_mels=128, fmax=8000)
mfccs = librosa.feature.mfcc(y=y, sr=sr, n\_mfcc=40)

巴克刻度

  巴克刻度(Bark scale)是于1961年由德国声学家Eberhard Zwicker提出的一种心理声学的尺度。它以Heinrich Barkhausen的名字命名,他提出了响度的第一个主观测量。[1]该术语的一个定义是“……等距离对应于感知上等距离的频率刻度”。高于约 500 Hz 时,此刻度或多或少等于对数频率轴。低于 500 Hz 时,Bark 标度变为越来越线性”。bark 刻度的范围是从1到24,并且它们与听觉的临界频带相对应。

频率f (Hz) 转换为 Bark:

Bark =13arctan(0.00076f)+3.5arctan((f7500)2) Bark =13arctan⁡(0.00076f)+3.5arctan⁡((f7500)2)\text { Bark }=13 \arctan (0.00076 f)+3.5 \arctan ((\frac{f}{7500})^{2})
Traunmüller, 1990 提出的新的Bark scale公式:

Bark=26.81f1960+f−0.53Bark=26.81f1960+f−0.53\operatorname{Bark}=\frac{26.81f}{1960+f}-0.53
反转:f=1960((Bark+0.53)−1)26.81f=1960((Bark+0.53)−1)26.81f=\frac{1960((\operatorname{Bark}+0.53)-1)}{26.81}

临界带宽(Hz):Bc=52548Bark2−52.56Bark+690.39Bc=52548Bark2−52.56Bark+690.39B_c=\frac{52548}{\operatorname{Bark}^2-52.56\operatorname{Bark}+690.39}

Wang, Sekey & Gersho, 1992 提出了新的Bark scale公式:

Bark =6sinh−1(f600) Bark =6sinh−1⁡(f600)\text { Bark }=6 \sinh ^{-1}(\frac{f}{600})

def hz2bark\_1961(Hz):
 return 13.0 * np.arctan(0.00076 * Hz) + 3.5 * np.arctan((Hz / 7500.0) ** 2)

def hz2bark\_1990(Hz):
 bark\_scale = (26.81 * Hz) / (1960 + Hz) - 0.5
    return bark\_scale

def hz2bark\_1992(Hz):
 return 6 * np.arcsinh(Hz / 600)

3f0b3203f577c544a4fe5ff75545cb4f - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

8ff57bf603b9284fae9717aed7e32f8d - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)0ff4ad2c1598e19c309eaece85b7dfa3 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def hz2bark\_1961(Hz):
 return 13.0 * np.arctan(0.00076 * Hz) + 3.5 * np.arctan((Hz / 7500.0) ** 2)

def hz2bark\_1990(Hz):
 bark\_scale = (26.81 * Hz) / (1960 + Hz) - 0.5
    return bark\_scale

def hz2bark\_1992(Hz):
 return 6 * np.arcsinh(Hz / 600)

if \_\_name\_\_ == "\_\_main\_\_":
 fs = 16000
 hz = np.linspace(0, fs // 2, fs // 2)
 bark\_1961 = hz2bark\_1961(hz)
 bark\_1990 = hz2bark\_1990(hz)
 bark\_1992 = hz2bark\_1992(hz)

 plt.plot(hz, bark\_1961, label="bark\_1961")
 plt.plot(hz, bark\_1990, label="bark\_1990")
 plt.plot(hz, bark\_1992, label="bark\_1992")
 plt.legend() # 显示图例
    plt.xlabel("Hertz scale (Hz)", fontsize=12)  # x轴的名字
    plt.ylabel("Bark scale", fontsize=12)

 plt.xticks(fontsize=10)  # x轴的刻度
    plt.yticks(fontsize=10)

 plt.xlim(0, fs // 2)  # 坐标轴的范围
 plt.ylim(0)

 def formatnum(x, pos):
 return '$%.1f$' % (x / 1000)

 formatter = FuncFormatter(formatnum)
 # plt.gca().xaxis.set\_major\_formatter(formatter)
    # plt.gca().yaxis.set\_major\_formatter(formatter)
    plt.grid(linestyle='--')
 plt.tight\_layout()
 plt.show()

画图代码

Bark 滤波器组

10494dfc441a0695c08c1155d01e1493 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

Bark频谱

bb7c79ca0b3f72634861258d28ec5984 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

4e22c35fac063702dd63d4f4bbc03dfa - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)b66da4141d81ad9fd4f48884e66aefb9 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
plt.rcParams['font.sans-serif'] = ['SimHei']  # 用来正常显示中文标签
plt.rcParams['axes.unicode\_minus'] = False  # 用来正常显示符号

def hz2bark(f):
 """ Hz to bark频率 (Wang, Sekey & Gersho, 1992.) """
    return 6. * np.arcsinh(f / 600.)

def bark2hz(fb):
 """ Bark频率 to Hz """
    return 600. * np.sinh(fb / 6.)

def fft2hz(fft, fs=16000, nfft=512):
 """ FFT频点 to Hz """
    return (fft * fs) / (nfft + 1)

def hz2fft(fb, fs=16000, nfft=512):
 """ Bark频率 to FFT频点 """
    return (nfft + 1) * fb / fs

def fft2bark(fft, fs=16000, nfft=512):
 """ FFT频点 to Bark频率 """
    return hz2bark((fft * fs) / (nfft + 1))

def bark2fft(fb, fs=16000, nfft=512):
 """ Bark频率 to FFT频点 """
    # bin = sample\_rate/2 / nfft/2=sample\_rate/nfft # 每个频点的频率数
    # bins = hz\_points/bin=hz\_points*nfft/ sample\_rate # hz\_points对应第几个fft频点
    return (nfft + 1) * bark2hz(fb) / fs

def Fm(fb, fc):
 """ 计算一个特定的中心频率的Bark filter
 :param fb: frequency in Bark.
 :param fc: center frequency in Bark.
 :return: 相关的Bark filter 值/幅度
 """
    if fc - 2.5 <= fb <= fc - 0.5:
 return 10 ** (2.5 * (fb - fc + 0.5))
 elif fc - 0.5 < fb < fc + 0.5:
 return 1
    elif fc + 0.5 <= fb <= fc + 1.3:
 return 10 ** (-2.5 * (fb - fc - 0.5))
 else:
 return 0

def bark\_filter\_banks(nfilts=20, nfft=512, fs=16000, low\_freq=0, high\_freq=None, scale="constant"):
 """ 计算Bark-filterbanks,(B,F)
 :param nfilts: 滤波器组中滤波器的数量 (Default 20)
 :param nfft: FFT size.(Default is 512)
 :param fs: 采样率,(Default 16000 Hz)
 :param low\_freq: MEL滤波器的最低带边。(Default 0 Hz)
 :param high\_freq: MEL滤波器的最高带边。(Default samplerate/2)
 :param scale (str): 选择Max bins 幅度 "ascend"(上升),"descend"(下降)或 "constant"(恒定)(=1)。默认是"constant"
 :return:一个大小为(nfilts, nfft/2 + 1)的numpy数组,包含滤波器组。
 """
    # init freqs
    high\_freq = high\_freq or fs / 2
 low\_freq = low\_freq or 0

 # 按Bark scale 均匀间隔计算点数(点数以Bark为单位)
    low\_bark = hz2bark(low\_freq)
 high\_bark = hz2bark(high\_freq)
 bark\_points = np.linspace(low\_bark, high\_bark, nfilts + 4)

 bins = np.floor(bark2fft(bark\_points))  # Bark Scale等分布对应的 FFT bin number
    # [ 0. 2. 5. 7. 10. 13. 16. 20. 24. 28. 33. 38. 44. 51.
    # 59. 67. 77. 88. 101. 115. 132. 151. 172. 197. 224. 256.]
    fbank = np.zeros([nfilts, nfft // 2 + 1])

 # init scaler
    if scale == "descendant" or scale == "constant":
 c = 1
    else:
 c = 0

 for i in range(0, nfilts):      # --> B
        # compute scaler
        if scale == "descendant":
 c -= 1 / nfilts
 c = c * (c > 0) + 0 * (c < 0)
 elif scale == "ascendant":
 c += 1 / nfilts
 c = c * (c < 1) + 1 * (c > 1)

 for j in range(int(bins[i]), int(bins[i + 4])):     # --> F
            fc = bark\_points[i+2]   # 中心频率
            fb = fft2bark(j)        # Bark 频率
            fbank[i, j] = c * Fm(fb, fc)
 return np.abs(fbank)

if \_\_name\_\_ == "\_\_main\_\_":
 nfilts = 22
 NFFT = 512
 fs = 16000
 wav = librosa.load("p225\_001.wav",sr=fs)[0]
 S = librosa.stft(wav, n\_fft=NFFT, hop\_length=NFFT // 2, win\_length=NFFT, window="hann", center=False)
 mag = np.abs(S)  # 幅度谱 (257, 127) librosa.magphase()
    filterbanks = bark\_filter\_banks(nfilts=nfilts, nfft=NFFT, fs=fs, low\_freq=0, high\_freq=None, scale="constant")
 # ================ 画三角滤波器 ===========================
    FFT\_len = NFFT // 2 + 1
 fs\_bin = fs // 2 / (NFFT // 2)  # 一个频点多少Hz
    x = np.linspace(0, FFT\_len, FFT\_len)

 plt.plot(x * fs\_bin, filterbanks.T)

 # plt.xlim(0) # 坐标轴的范围
    # plt.ylim(0, 1)
 plt.tight\_layout()
 plt.grid(linestyle='--')
 plt.show()

 filter\_banks = np.dot(filterbanks, mag)  # (M,F)*(F,T)=(M,T)
    filter\_banks = 20 * np.log10(filter\_banks)  # dB
    # ================ 绘制语谱图 ==========================
 plt.figure()
 librosa.display.specshow(filter\_banks, sr=fs, x\_axis='time', y\_axis='linear', cmap="jet")
 plt.xlabel('时间/s', fontsize=14)
 plt.ylabel('频率/kHz', fontsize=14)
 plt.xticks(fontsize=14)
 plt.yticks(fontsize=14)
 def formatnum(x, pos):
 return '$%d$' % (x / 1000)

 formatter = FuncFormatter(formatnum)
 plt.gca().yaxis.set\_major\_formatter(formatter)
 plt.tight\_layout()
 plt.show()

代码

等效矩阵带宽

  等效矩形带宽(Equivalent Rectangular Bandwidth,ERB)是用于心理声学(研究人对声音(包括言语和音乐)的生理和心理反应的科学)的一种量度方法,它给出了一个近似于 人耳听觉的对带宽的过滤方法,使用不现实但方便的简化方法将滤波器建模为矩形带通滤波器或带阻滤波器。

  Moore 和 Glasberg在1983 年,对于中等的声强和年轻的听者,人的听觉滤波器的带宽可以通过以下的多项式方程式近似:

公式1:ERB(f)=6.23⋅f2+93.39⋅f+28.52公式1:ERB⁡(f)=6.23⋅f2+93.39⋅f+28.52公式1:\operatorname{ERB}(f)=6.23 \cdot f^{2}+93.39 \cdot f+28.52
其中fff为滤波器的中心频率(kHz),ERB(f)ERB(f)ERB(f)为滤波器的带宽(Hz)。这个近似值是基于一些出版的同时掩蔽(Simultaneous masking)实验的结果。这个近似对于从0.1到6.5 kHz的范围是有效的

  它们也在1990年发表了另一(线性)近似

公式2:ERB(f)=24.7⋅(4.37⋅f+1)公式2:ERB⁡(f)=24.7⋅(4.37⋅f+1)公式2:\operatorname{ERB}(f)=24.7 \cdot(4.37 \cdot f+1)
其中fff的单位是 kHz,ERB(f)ERB(f)ERB(f)的单位是 Hz。这个近似值适用于中等声级和0.1 到 10 kHz 之间的频率值。

def hz2erb\_1983(f):
 """ 中心频率f(Hz) f to ERB(Hz) """
 f = f / 1000.0
    return (6.23 * (f ** 2)) + (93.39 * f) + 28.52

def hz2erb\_1990(f):
 """ 中心频率f(Hz) f to ERB(Hz) """
 f = f / 1000.0
    return 24.7 * (4.37 * f + 1.0)

def hz2erb\_1998(f):
 """ 中心频率f(Hz) f to ERB(Hz)
 hz2erb\_1990 和 hz2erb\_1990\_2 的曲线几乎一模一样
 M. Slaney, Auditory Toolbox, Version 2, Technical Report No: 1998-010, Internal Research Corporation, 1998
 http://cobweb.ecn.purdue.edu/~malcolm/interval/1998-010/
 """
    return 24.7 + (f / 9.26449)

e301bb0e2869acbd2990cdb60b8327e9 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

  给定频率fff可以求得 等效矩形带宽的数量(Number of ERB, ERBs)。ERBs可以通过求解以下微分方程组来构建量表:

公式3:{ERBs(0)=0dfdERBs(f)=ERB(f)公式3:{ERBs⁡(0)=0dfdERBs⁡(f)=ERB⁡(f)公式3:\left{\begin{array}{l} \operatorname{ERBs}(0)=0 \ \frac{d f}{d \operatorname{ERBs}(f)}=\operatorname{ERB}(f) \end{array}\right.
  使用ERB(f)公式1可求得ERBs:

ERBs(f)=11.17⋅ln(f+0.312f+14.675)+43.0ERBs⁡(f)=11.17⋅ln⁡(f+0.312f+14.675)+43.0 \operatorname{ERBs}(f)=11.17 \cdot \ln \left(\frac{f+0.312}{f+14.675}\right)+43.0
其中fff以 kHz 为单位。

用于MATLAB的 VOICEBOX 语音处理工具箱将转换及其逆实现为:

ERBs(f)=11.17268⋅ln(1+46.06538⋅ff+14678.49)ERBs⁡(f)=11.17268⋅ln⁡(1+46.06538⋅ff+14678.49)\operatorname{ERBs}(f)=11.17268 \cdot \ln \left(1+\frac{46.06538 \cdot f}{f+14678.49}\right)
f=676170.447.06538−e0.08950404⋅ERBs(f)−14678.49f=676170.447.06538−e0.08950404⋅ERBs⁡(f)−14678.49f=\frac{676170.4}{47.06538-e^{0.08950404 \cdot \operatorname{ERBs}(f)}}-14678.49
其中fff以Hz为单位。

def erb\_matlab\_voicebox(f: float) -> float:
 cuotient = (46.06538 * f) / float(f + 14678.49)
 return 11.17268 * math.log(1 + cuotient)

def ierb\_matlab\_voicebox(erbf: float) -> float:
 denom = 47.06538 - math.exp(0.08950404 - erbf)
 return (676170.4 / float(denom)) - 14678.49

  使用ERB(f)公式2可得:

ERBS(f)=21.4⋅log10(1+4.37⋅f1000)ERBS⁡(f)=21.4⋅log10⁡(1+4.37⋅f1000)\operatorname{ERBS}(f)=21.4 \cdot \log _{10}(1+ \frac{4.37\cdot f}{1000})
其中fff以赫兹为单位。

线性滤波器组

使用ERB的线性滤波器组

50a141732823f0348acf0d981a8118cd - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)17bc011c263f3afa534aacaa62b10a7a - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

# -*- coding:utf-8 -*-
# Author:凌逆战 | Never.Ling
# Date: 2022/5/28
"""
基于Josh McDermott的Matlab滤波器组代码:
https://github.com/wil-j-wil/py\_bank
https://github.com/flavioeverardo/erb\_bands
"""
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
plt.rcParams['font.sans-serif'] = ['SimHei']  # 用来正常显示中文标签
plt.rcParams['axes.unicode\_minus'] = False  # 用来正常显示符号

class EquivalentRectangularBandwidth():
 def \_\_init\_\_(self, nfreqs, sample\_rate, total\_erb\_bands, low\_freq, max\_freq):
 if low\_freq == None:
 low\_freq = 20
        if max\_freq == None:
 max\_freq = sample\_rate // 2
 freqs = np.linspace(0, max\_freq, nfreqs)  # 每个STFT频点对应多少Hz
        self.EarQ = 9.265  # \_ERB\_Q
        self.minBW = 24.7  # minBW
        # 在ERB刻度上建立均匀间隔
        erb\_low = self.freq2erb(low\_freq)  # 最低 截止频率
        erb\_high = self.freq2erb(max\_freq)  # 最高 截止频率
        # 在ERB频率上均分为(total\_erb\_bands + )2个 频带
        erb\_lims = np.linspace(erb\_low, erb\_high, total\_erb\_bands + 2)
 cutoffs = self.erb2freq(erb\_lims)  # 将 ERB频率再转到 hz频率, 在线性频率Hz上找到ERB截止频率对应的频率
        # self.nfreqs F
        # self.freqs # 每个STFT频点对应多少Hz
        self.filters = self.get\_bands(total\_erb\_bands, nfreqs, freqs, cutoffs)

 def freq2erb(self, frequency):
 """ [Hohmann2002] Equation 16"""
        return self.EarQ * np.log(1 + frequency / (self.minBW * self.EarQ))

 def erb2freq(self, erb):
 """ [Hohmann2002] Equation 17"""
        return (np.exp(erb / self.EarQ) - 1) * self.minBW * self.EarQ

 def get\_bands(self, total\_erb\_bands, nfreqs, freqs, cutoffs):
 """
 获取erb bands、索引、带宽和滤波器形状
 :param erb\_bands\_num: ERB 频带数
 :param nfreqs: 频点数 F
 :param freqs: 每个STFT频点对应多少Hz
 :param cutoffs: 中心频率 Hz
 :param erb\_points: ERB频带界限 列表
 :return:
 """
 cos\_filts = np.zeros([nfreqs, total\_erb\_bands])  # (F, ERB)
        for i in range(total\_erb\_bands):
 lower\_cutoff = cutoffs[i]  # 上限截止频率 Hz
            higher\_cutoff = cutoffs[i + 2]  # 下限截止频率 Hz, 相邻filters重叠50%

 lower\_index = np.min(np.where(freqs > lower\_cutoff))  # 下限截止频率对应的Hz索引 Hz。np.where 返回满足条件的索引
            higher\_index = np.max(np.where(freqs < higher\_cutoff))  # 上限截止频率对应的Hz索引
            avg = (self.freq2erb(lower\_cutoff) + self.freq2erb(higher\_cutoff)) / 2
 rnge = self.freq2erb(higher\_cutoff) - self.freq2erb(lower\_cutoff)
 cos\_filts[lower\_index:higher\_index + 1, i] = np.cos(
 (self.freq2erb(freqs[lower\_index:higher\_index + 1]) - avg) / rnge * np.pi)  # 减均值,除方差

        # 加入低通和高通,得到完美的重构
        filters = np.zeros([nfreqs, total\_erb\_bands + 2])  # (F, ERB)
        filters[:, 1:total\_erb\_bands + 1] = cos\_filts
 # 低通滤波器上升到第一个余cos filter的峰值
        higher\_index = np.max(np.where(freqs < cutoffs[1]))  # 上限截止频率对应的Hz索引
        filters[:higher\_index + 1, 0] = np.sqrt(1 - np.power(filters[:higher\_index + 1, 1], 2))
 # 高通滤波器下降到最后一个cos filter的峰值
        lower\_index = np.min(np.where(freqs > cutoffs[total\_erb\_bands]))
 filters[lower\_index:nfreqs, total\_erb\_bands + 1] = np.sqrt(
 1 - np.power(filters[lower\_index:nfreqs, total\_erb\_bands], 2))
 return cos\_filts

if \_\_name\_\_ == "\_\_main\_\_":
 fs = 16000
 NFFT = 512  # 信号长度
    ERB\_num = 20
 low\_lim = 20  # 最低滤波器中心频率
    high\_lim = fs / 2  # 最高滤波器中心频率

 freq\_num = NFFT // 2 + 1
 fs\_bin = fs // 2 / (NFFT // 2)  # 一个频点多少Hz
    x = np.linspace(0, freq\_num, freq\_num)

 # ================ 画三角滤波器 ===========================
    ERB = EquivalentRectangularBandwidth(freq\_num, fs, ERB\_num, low\_lim, high\_lim)
 filterbanks = ERB.filters.T  # (257, 20)
    plt.plot(x * fs\_bin, filterbanks.T)

 # plt.xlim(0) # 坐标轴的范围
    # plt.ylim(0, 1)
 plt.tight\_layout()
 plt.grid(linestyle='--')
 plt.show()

 # ================ 绘制语谱图 ==========================
    wav = librosa.load("p225\_001.wav", sr=fs)[0]
 S = librosa.stft(wav, n\_fft=NFFT, hop\_length=NFFT // 2, win\_length=NFFT, window="hann", center=False)
 mag = np.abs(S)  # 幅度谱 (257, 127) librosa.magphase()

 filter\_banks = np.dot(filterbanks, mag)  # (M,F)*(F,T)=(M,T)
    filter\_banks = 20 * np.log10(filter\_banks)  # dB

 plt.figure()
 librosa.display.specshow(filter\_banks, sr=fs, x\_axis='time', y\_axis='linear', cmap="jet")
 plt.xlabel('时间/s', fontsize=14)
 plt.ylabel('频率/kHz', fontsize=14)
 plt.xticks(fontsize=14)
 plt.yticks(fontsize=14)

 def formatnum(x, pos):
 return '$%d$' % (x / 1000)

 formatter = FuncFormatter(formatnum)
 plt.gca().yaxis.set\_major\_formatter(formatter)
 plt.tight\_layout()
 plt.show()

View Code

13cb991c22d92f8a326f2c6631b544cd - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)ce05786ab956601917e39a170266a7c2 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

Gammatone 滤波器组

  外界语音信号进入耳蜗的基底膜后,将依据频率进行分解并产生行波震动,从而刺激听觉感受细胞。GammaTone 滤波器是一组用来模拟耳蜗频率分解特点的滤波器模型,由脉冲响应描述的线性滤波器,脉冲响应是gamma 分布正弦(sin)音调的乘积。它是听觉系统中一种广泛使用的听觉滤波器模型。

历史

  一般认为外周听觉系统的频率分析方式可以通过一组带通滤波器来进行一定程度的模拟,人们为此也提出了各种各样的滤波器组,如 roex 滤波器(Patterson and Moore 1986)。

在神经科学上有一种叫做反向相关性 “reverse correlation”(de Boer and Kuyper 1968)的计算方式,通过计算初级听觉神经纤维对于白噪声刺激的响应以及相关程度,即听觉神经元发放动作电位前的平均叠加信号,从而直接从生理状态上估计听觉滤波器的形状。这个滤波器是在外周听觉神经发放动作电位前生效的,因此得名为“revcor function”,可以作为一定限度下对外周听觉滤波器冲激响应的估计,也就是耳蜗等对音频信号的前置带通滤波。

  1972年Johannesma提出了 gammatone 滤波器用来逼近recvor function:

时域表达式:g(t)=atn−1e−2πbtcos(2πfct+ϕ0)时域表达式:g(t)=atn−1e−2πbtcos⁡(2πfct+ϕ0)时域表达式:g(t)=a t^{n-1} e^{-2 \pi b t} \cos (2 \pi f_c t+\phi_0)
其中fc(Hz)fc(Hz)f_c(Hz)是中心频率(center frequency),ϕ0ϕ0\phi_0是初始相位(phase),aaa是幅度(amplitude),nnn是滤波器的阶数(order),越大则偏度越低,滤波器越“瘦高”,b(Hz)b(Hz)b(Hz)是滤波器的3dB 带宽(bandwidth),t(s)t(s)t(s)是时间。

b8bd6d391e1f11f6281913e9cb3143df - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

这个时域脉冲响应是一个正弦曲线(pure tone),其幅度包络是一个缩放的gamma分布函数。

我们可以通过时域表达式生成一组gammatone滤波器组gammatone滤波器组特征

66bdf92570e534e5d0715cfca3ef891b - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)84c349b13aecff671a551545291ebdff - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

# -*- coding:utf-8 -*-
# Author:凌逆战 | Never.Ling
# Date: 2022/5/24
"""
时域滤波器组 FFT 转频域滤波器组 与语音频谱相乘
参考:https://github.com/TAriasVergara/Acoustic\_features
"""
import librosa
import librosa.display
import numpy as np
from scipy.fftpack import dct
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

plt.rcParams['font.sans-serif'] = ['SimHei']  # 用来正常显示中文标签
plt.rcParams['axes.unicode\_minus'] = False  # 用来正常显示符号

def erb\_space(low\_freq=50, high\_freq=8000, n=64):
 """ 计算中心频率(ERB scale)
 :param min\_freq: 中心频率域的最小频率。
 :param max\_freq: 中心频率域的最大频率。
 :param nfilts: 滤波器的个数,即等于计算中心频率的个数。
 :return: 一组中心频率
 """
 ear\_q = 9.26449
 min\_bw = 24.7
 cf\_array = -(ear\_q * min\_bw) + np.exp(
 np.linspace(1, n, n) * (-np.log(high\_freq + ear\_q * min\_bw) + np.log(low\_freq + ear\_q * min\_bw)) / n) \
 * (high\_freq + ear\_q * min\_bw)
 return cf\_array

def gammatone\_impulse\_response(samplerate\_hz, length\_in\_samples, center\_freq\_hz, p):
 """ gammatone滤波器的时域公式
 :param samplerate\_hz: 采样率
 :param length\_in\_samples: 信号长度
 :param center\_freq\_hz: 中心频率
 :param p: 滤波器阶数
 :return: gammatone 脉冲响应
 """
    # 生成一个gammatone filter (1990 Glasberg&Moore parametrized)
    erb = 24.7 + (center\_freq\_hz / 9.26449)  # equivalent rectangular bandwidth.
    # 中心频率
    an = (np.pi * np.math.factorial(2 * p - 2) * np.power(2, float(-(2 * p - 2)))) / np.square(np.math.factorial(p - 1))
 b = erb / an  # 带宽

 a = 1  # 幅度(amplitude). 这在后面的归一化过程中有所不同。
    t = np.linspace(1. / samplerate\_hz, length\_in\_samples / samplerate\_hz, length\_in\_samples)
 gammatone\_ir = a * np.power(t, p - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * center\_freq\_hz * t)
 return gammatone\_ir

def generate\_filterbank(fs, fmax, L, N, p=4):
 """
 L: 在样本中测量的信号的大小
 N: 滤波器数量
 p: Gammatone脉冲响应的阶数
 """
    # 中心频率
    if fs == 8000:
 fmax = 4000
 center\_freqs = erb\_space(50, fmax, N)  # 中心频率列表
    center\_freqs = np.flip(center\_freqs)  # 反转数组
    n\_center\_freqs = len(center\_freqs)  # 中心频率的数量

 filterbank = np.zeros((N, L))

 # 为每个中心频率生成 滤波器
    for i in range(n\_center\_freqs):
 # aa = gammatone\_impulse\_response(fs, L, center\_freqs[i], p)
        filterbank[i, :] = gammatone\_impulse\_response(fs, L, center\_freqs[i], p)
 return filterbank

def gfcc(cochleagram, numcep=13):
 feat = dct(cochleagram, type=2, axis=1, norm='ortho')[:, :numcep]
 # feat-= (np.mean(feat, axis=0) + 1e-8)#Cepstral mean substration
    return feat

def cochleagram(sig\_spec, filterbank, nfft):
 """
 :param sig\_spec: 语音频谱
 :param filterbank: 时域滤波器组
 :param nfft: fft\_size
 :return:
 """
 filterbank = powerspec(filterbank, nfft)  # 时域滤波器组经过FFT变换
    filterbank /= np.max(filterbank, axis=-1)[:, None]  # Normalize filters
    cochlea\_spec = np.dot(sig\_spec, filterbank.T)  # 矩阵相乘
    cochlea\_spec = np.where(cochlea\_spec == 0.0, np.finfo(float).eps, cochlea\_spec)  # 把0变成一个很小的数
    # cochlea\_spec= np.log(cochlea\_spec)-np.mean(np.log(cochlea\_spec),axis=0)
    cochlea\_spec = np.log(cochlea\_spec)
 return cochlea\_spec, filterbank

def powerspec(X, nfft):
 # Fourier transform
    # Y = np.fft.rfft(X, n=n\_padded)
    Y = np.fft.fft(X, n=nfft)
 Y = np.absolute(Y)

 # non-redundant part
    m = int(nfft / 2) + 1
 Y = Y[:, :m]

 return np.abs(Y) ** 2

if \_\_name\_\_ == "\_\_main\_\_":
 nfilts = 22
 NFFT = 512
 fs = 16000
 Order = 4

 FFT\_len = NFFT // 2 + 1
 fs\_bin = fs // 2 / (NFFT // 2)  # 一个频点多少Hz
    x = np.linspace(0, FFT\_len, FFT\_len)
 # ================ 画三角滤波器 ===========================
    # gammatone\_impulse\_response = gammatone\_impulse\_response(fs/2, 512, 200, Order) # gammatone冲击响应
    generate\_filterbank = generate\_filterbank(fs, fs // 2, FFT\_len, nfilts, Order)
 filterbanks = powerspec(generate\_filterbank, NFFT)  # 时域滤波器组经过FFT变换
    filterbanks /= np.max(filterbanks, axis=-1)[:, None]  # Normalize filters
    print(generate\_filterbank.shape)    # (22, 257)
    # plt.plot(filterbanks.T)
    plt.plot(x * fs\_bin, filterbanks.T)
 # plt.xlim(0) # 坐标轴的范围
    # plt.ylim(0, 1)
 plt.tight\_layout()
 plt.grid(linestyle='--')
 plt.show()

 # ================ 绘制语谱图 ==========================
    wav = librosa.load("p225\_001.wav", sr=fs)[0]
 S = librosa.stft(wav, n\_fft=NFFT, hop\_length=NFFT // 2, win\_length=NFFT, window="hann", center=False)
 mag = np.abs(S)  # 幅度谱 (257, 127) librosa.magphase()

 filter\_banks = np.dot(filterbanks, mag)  # (M,F)*(F,T)=(M,T)
    filter\_banks = 20 * np.log10(filter\_banks)  # dB

 plt.figure()
 librosa.display.specshow(filter\_banks, sr=fs, x\_axis='time', y\_axis='linear', cmap="jet")
 plt.xlabel('时间/s', fontsize=14)
 plt.ylabel('频率/kHz', fontsize=14)
 plt.xticks(fontsize=14)
 plt.yticks(fontsize=14)

 def formatnum(x, pos):
 return '$%d$' % (x / 1000)

 formatter = FuncFormatter(formatnum)
 plt.gca().yaxis.set\_major\_formatter(formatter)
 plt.tight\_layout()
 plt.show()

View Code
20d7e82b4594ec3e844c6d36e937fb41 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)9fdde0b6981d48a60ca8a5ac603e9eb5 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

  可以看到低频段分得很细,高频段分得很粗,和人耳听觉特性较为符合。

频域表达式:H(f)==a[R(f)⊗S(f)]a2(n−1)!(2πb)−n{eiϕ0[1+i(f−fc)b]−n+e−iϕ0[1+i(f+fc)b]−n}频域表达式:H(f)=a[R(f)⊗S(f)]=a2(n−1)!(2πb)−n{eiϕ0[1+i(f−fc)b]−n+e−iϕ0[1+i(f+fc)b]−n}频域表达式:\begin{aligned} H(f)=& a[R(f) \otimes S(f)] \ =& \frac{a}{2}(n-1) !(2 \pi b)^{-n}\left{e^{i \phi_0}\left[1+\frac{i(f-f_c)}{b} \right]^{-n}+e^{-i \phi_0}\left[1+\frac{i(f+f_c)}{b}\right]^{-n}\right} \end{aligned}
频率表达式中R(f)R(f)R(f)是 指数+阶跃函数的傅里叶变换,阶跃函数用来区别 t>0 和 t<0。S(f)S(f)S(f)是频率为f0f0f_0的余弦的傅里叶变换。可以看到是一个中心频率在fcfcf_c、 在两侧按照e指数衰减的滤波器。通过上述表达式可以生成一组滤波器,求Gammatone滤波器组特征 只需要将Gammatone滤波器组与语音幅度谱相乘即可得到Gammatone滤波器组特征。

6928a650fbc9a5b0d64d7dc09235dd54 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)c7ac46de85ef8eebb9513dd4b98ca817 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

# -*- coding:utf-8 -*-
# Author:凌逆战 | Never
# Date: 2022/5/24
"""
Gammatone-filter-banks implementation
based on https://github.com/mcusi/gammatonegram/
"""
import librosa
import librosa.display
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.ticker import FuncFormatter

plt.rcParams['font.sans-serif'] = ['SimHei']  # 用来正常显示中文标签
plt.rcParams['axes.unicode\_minus'] = False  # 用来正常显示符号

# Slaney's ERB Filter constants
EarQ = 9.26449
minBW = 24.7

def generate\_center\_frequencies(min\_freq, max\_freq, filter\_nums):
 """ 计算中心频率(ERB scale)
 :param min\_freq: 中心频率域的最小频率。
 :param max\_freq: 中心频率域的最大频率。
 :param filter\_nums: 滤波器的个数,即等于计算中心频率的个数。
 :return: 一组中心频率
 """
    # init vars
    n = np.linspace(1, filter\_nums, filter\_nums)
 c = EarQ * minBW

 # 计算中心频率
    cfreqs = (max\_freq + c) * np.exp((n / filter\_nums) * np.log(
 (min\_freq + c) / (max\_freq + c))) - c

 return cfreqs

def compute\_gain(fcs, B, wT, T):
 """ 为了 阶数 计算增益和矩阵化计算
 :param fcs: 中心频率
 :param B: 滤波器的带宽
 :param wT: 对应于用于频域计算的 w * T = 2 * pi * freq * T
 :param T: 周期(单位秒s),1/fs
 :return:
 Gain: 表示filter gains 的2d numpy数组
 A: 用于最终计算的二维数组
 """
    # 为了简化 预先计算
    K = np.exp(B * T)
 Cos = np.cos(2 * fcs * np.pi * T)
 Sin = np.sin(2 * fcs * np.pi * T)
 Smax = np.sqrt(3 + 2 ** (3 / 2))
 Smin = np.sqrt(3 - 2 ** (3 / 2))

 # 定义A矩阵的行
    A11 = (Cos + Smax * Sin) / K
 A12 = (Cos - Smax * Sin) / K
 A13 = (Cos + Smin * Sin) / K
 A14 = (Cos - Smin * Sin) / K

 # 计算增益 (vectorized)
    A = np.array([A11, A12, A13, A14])
 Kj = np.exp(1j * wT)
 Kjmat = np.array([Kj, Kj, Kj, Kj]).T
 G = 2 * T * Kjmat * (A.T - Kjmat)
 Coe = -2 / K ** 2 - 2 * Kj ** 2 + 2 * (1 + Kj ** 2) / K
 Gain = np.abs(G[:, 0] * G[:, 1] * G[:, 2] * G[:, 3] * Coe ** -4)
 return A, Gain

def gammatone\_filter\_banks(nfilts=22, nfft=512, fs=16000, low\_freq=None, high\_freq=None, scale="contsant", order=4):
 """ 计算Gammatone-filterbanks, (G,F)
 :param nfilts: filterbank中滤波器的数量 (Default 22)
 :param nfft: FFT size (Default is 512)
 :param fs: 采样率 (Default 16000 Hz)
 :param low\_freq: 最低频带 (Default 0 Hz)
 :param high\_freq: 最高频带 (Default samplerate/2)
 :param scale: 选择Max bins 幅度 "ascend"(上升),"descend"(下降)或 "constant"(恒定)(=1)。默认是"constant"
 :param order: 滤波器阶数
 :return: 一个大小为(nfilts, nfft/2 + 1)的numpy数组,包含滤波器组。
 """
    # init freqs
    high\_freq = high\_freq or fs / 2
 low\_freq = low\_freq or 0

 # define custom difference func
    def Dif(u, a):
 return u - a.reshape(nfilts, 1)

 # init vars
    fbank = np.zeros([nfilts, nfft])
 width = 1.0
 maxlen = nfft // 2 + 1
 T = 1 / fs
 n = 4
 u = np.exp(1j * 2 * np.pi * np.array(range(nfft // 2 + 1)) / nfft)
 idx = range(nfft // 2 + 1)

 fcs = generate\_center\_frequencies(low\_freq, high\_freq, nfilts)  # 计算中心频率,转换到ERB scale
    ERB = width * ((fcs / EarQ) ** order + minBW ** order) ** (1 / order)  # 计算带宽
    B = 1.019 * 2 * np.pi * ERB

 # compute input vars
    wT = 2 * fcs * np.pi * T
 pole = np.exp(1j * wT) / np.exp(B * T)

 # compute gain and A matrix
    A, Gain = compute\_gain(fcs, B, wT, T)

 # compute fbank
    fbank[:, idx] = (
 (T ** 4 / Gain.reshape(nfilts, 1)) *
 np.abs(Dif(u, A[0]) * Dif(u, A[1]) * Dif(u, A[2]) * Dif(u, A[3])) *
 np.abs(Dif(u, pole) * Dif(u, pole.conj())) ** (-n))

 # 确保所有filters的最大值为1.0
    try:
 fbs = np.array([f / np.max(f) for f in fbank[:, range(maxlen)]])
 except BaseException:
 fbs = fbank[:, idx]

 # compute scaler
    if scale == "ascendant":
 c = [
 0,
 ]
 for i in range(1, nfilts):
 x = c[i - 1] + 1 / nfilts
 c.append(x * (x < 1) + 1 * (x > 1))
 elif scale == "descendant":
 c = [
 1,
 ]
 for i in range(1, nfilts):
 x = c[i - 1] - 1 / nfilts
 c.append(x * (x > 0) + 0 * (x < 0))
 else:
 c = [1 for i in range(nfilts)]

 # apply scaler
    c = np.array(c).reshape(nfilts, 1)
 fbs = c * np.abs(fbs)
 return fbs

if \_\_name\_\_ == "\_\_main\_\_":
 nfilts = 22
 NFFT = 512
 fs = 16000

 FFT\_len = NFFT // 2 + 1
 fs\_bin = fs // 2 / (NFFT // 2)  # 一个频点多少Hz
    x = np.linspace(0, FFT\_len, FFT\_len)

 # ================ 画三角滤波器 ===========================
    filterbanks = gammatone\_filter\_banks(nfilts=22, nfft=512, fs=16000,
 low\_freq=None, high\_freq=None,
 scale="contsant", order=4)
 print(filterbanks.shape)    # (22, 257)
    plt.plot(x * fs\_bin, filterbanks.T)

 # plt.xlim(0) # 坐标轴的范围
    # plt.ylim(0, 1)
 plt.tight\_layout()
 plt.grid(linestyle='--')
 plt.show()

 # ================ 绘制语谱图 ==========================
    wav = librosa.load("p225\_001.wav", sr=fs)[0]
 S = librosa.stft(wav, n\_fft=NFFT, hop\_length=NFFT // 2, win\_length=NFFT, window="hann", center=False)
 mag = np.abs(S)  # 幅度谱 (257, 127) librosa.magphase()

 filter\_banks = np.dot(filterbanks, mag)  # (M,F)*(F,T)=(M,T)
    filter\_banks = 20 * np.log10(filter\_banks)  # dB

 plt.figure()
 librosa.display.specshow(filter\_banks, sr=fs, x\_axis='time', y\_axis='linear', cmap="jet")
 plt.xlabel('时间/s', fontsize=14)
 plt.ylabel('频率/kHz', fontsize=14)
 plt.xticks(fontsize=14)
 plt.yticks(fontsize=14)

 def formatnum(x, pos):
 return '$%d$' % (x / 1000)

 formatter = FuncFormatter(formatnum)
 plt.gca().yaxis.set\_major\_formatter(formatter)
 plt.tight\_layout()
 plt.show()

View Code

75dc57ce33fb0ca370dc29b32838b687 - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)6ff9c44ec7be7d1c78fcd3ad7464149e - 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

  1988年Holdsworth 等人进一步阐明了GTF的各种特性,而且提供了一个数字IIR滤波器设计方案。这个技术使得GTF能够比FIR更加容易且高效地实现,为后续出现一些重要的实际应用做了铺垫。听觉滤波的gammatone模型的变化和改进包括复数gammatone滤波器、gammachirp滤波器、全极点(all-pole)和一零(one-zero) gammatone滤波器、双边(two-sided)gammatone滤波器和滤波器级联(filter-cascade)模型,以及各种level相关和这些的动态非线性版本。

参考

【博客】Auditory scales of frequency representation

【百度百科】心理声学

【维基百科】Bark scale

【维基百科】Mel scale

【维基百科】Equivalent rectangular bandwidth

【维基百科】Gammatone filter(包含了C \ C++ \ mathematica \ matlab的代码实现)

【博客】Equivalent Rectangular Bandwidth

【CSDN】GammaTone 滤波器详解

【python库】PyFilterbank

【代码】Brookes, Mike (22 December 2012). "frq2erb". VOICEBOX: Speech Processing Toolbox for MATLAB. Department of Electrical & Electronic Engineering, Imperial College, UK. Retrieved 20 January 2013.

【代码】Brookes, Mike (22 December 2012). "erb2frq". VOICEBOX: Speech Processing Toolbox for MATLAB. Department of Electrical & Electronic Engineering, Imperial College, UK. Retrieved 20 January 2013.

【论文】Smith, Julius O.; Abel, Jonathan S. (10 May 2007). "Equivalent Rectangular Bandwidth". Bark and ERB Bilinear Transforms. Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, USA. Retrieved 20 January 2013.

作者:凌逆战欢迎任何形式的转载,但请务必注明出处。限于本人水平,如果文章和代码有表述不当之处,还请不吝赐教。本文章不做任何商业用途,仅作为自学所用,文章后面会有参考链接,我可能会复制原作者的话,如果介意,我会修改或者删除。

转载请注明:xuhss » 声学感知刻度(mel scale、Bark scale、ERB)与声学特征提取(MFCC、BFCC、GFCC)

喜欢 (0)

您必须 登录 才能发表评论!