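Following the MFCC computation procedure, write the code for an MFCC computation function and use it to compute the MFCC coefficients of a speech segment. The function is defined as: ccc = Nmfcc(x, fs, p, frameSize, inc); where x is the input speech sequence, p is the number of Mel filters, fs is the sampling frequency, frameSize is both the frame length and the number of FFT points, inc is the frame shift, and ccc holds the MFCC parameters.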
Below is example Python code for an MFCC computation function:
import numpy as np
from scipy.fftpack import dct


def preemphasis(signal, coeff=0.95):
    """
    Pre-emphasis filter applied to the signal.
    """
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])


def framesig(signal, framesize, frameinc):
    """
    Split the signal into overlapping frames with a certain size and shift.
    """
    # Size and shift may arrive as floats (seconds * fs); use whole samples.
    framesize = int(round(framesize))
    frameinc = int(round(frameinc))
    # Always produce at least one frame.
    nframes = max(1, int(np.ceil(float(len(signal) - framesize + frameinc) / frameinc)))
    # Zero-pad the signal so that the last frame is complete.
    padlen = (nframes - 1) * frameinc + framesize
    padded = np.append(signal, np.zeros(max(0, padlen - len(signal))))
    frames = np.zeros((nframes, framesize))
    for i in range(nframes):
        frames[i] = padded[i * frameinc:i * frameinc + framesize]
    return frames


def melfb(p, n, fs):
    """
    Create a Mel filterbank matrix of shape (p, n // 2 + 1).
    """
    f0 = 700.0 / fs
    # Spacing of the band edges on the (natural-log) Mel scale, from 0 Hz to fs / 2.
    lr = np.log(1 + 0.5 / f0) / (p + 1)
    # p + 2 band edges in Hz: lower edge, p centre frequencies, upper edge.
    edges = fs * f0 * (np.exp(np.arange(p + 2) * lr) - 1)
    # Map the band edges to FFT bin indices.
    bins = np.floor((n + 1) * edges / fs).astype(int)
    H = np.zeros((p, n // 2 + 1))
    for i in range(p):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        # Rising slope of the triangular filter.
        for j in range(lo, mid):
            H[i, j] = (j - lo) / (mid - lo)
        # Falling slope, with a peak of 1 at the centre bin.
        for j in range(mid, hi + 1):
            H[i, j] = (hi - j) / (hi - mid) if hi > mid else 1.0
    return H


def mfcc(signal, fs, p=20, framesize=0.025, frameinc=0.01, cep_lifter=22, numcep=13):
    """
    Calculate the Mel-frequency cepstral coefficients (MFCCs) from a signal.
    framesize and frameinc are given in seconds; numcep is the number of
    coefficients kept per frame.
    """
    signal = preemphasis(signal)
    framelen = int(round(framesize * fs))
    frameshift = int(round(frameinc * fs))
    frames = framesig(signal, framelen, frameshift)
    # Taper each frame with a Hamming window to reduce spectral leakage.
    frames = frames * np.hamming(framelen)
    nfft = int(2 ** np.ceil(np.log2(framelen)))
    # Power spectrum of each frame (frames are zero-padded to nfft points).
    pspec = np.abs(np.fft.rfft(frames, nfft, axis=1)) ** 2 / nfft
    H = melfb(p, nfft, fs)
    # Energy in each Mel band, then its logarithm.
    mspec = np.dot(pspec, H.T)
    mspec[mspec == 0] = np.finfo(float).eps
    mspec = np.log(mspec)
    ceps = dct(mspec, type=2, axis=1, norm='ortho')[:, :numcep]
    ceps = lifter(ceps, cep_lifter)
    return ceps


def lifter(ceps, L=22):
    """
    Apply a cepstral lifter to the matrix of cepstra.
    """
    nframes, ncoeff = np.shape(ceps)
    n = np.arange(ncoeff)
    lift = 1 + (L / 2.0) * np.sin(np.pi * n / L)
    return lift * ceps
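The melfb helper spaces its filters evenly on the Mel scale rather than in Hz. As a minimal sketch of that mapping, assuming the same 700 Hz constant and natural-log scale that melfb uses (the helper names hz_to_mel, mel_to_hz and mel_centers are hypothetical and are not part of the code above), the centre frequencies could be computed like this:

```python
import numpy as np


def hz_to_mel(f_hz):
    """Map frequency in Hz to a natural-log Mel-style scale (assumed 700 Hz constant)."""
    return np.log(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (np.exp(np.asarray(m, dtype=float)) - 1.0)


def mel_centers(p, fs):
    """Centre frequencies (Hz) of p filters spaced evenly on the Mel scale up to fs / 2."""
    step = hz_to_mel(fs / 2.0) / (p + 1)
    return mel_to_hz(step * np.arange(1, p + 1))


# 20 filters for a 16 kHz signal: the spacing in Hz widens as frequency increases.
centers = mel_centers(20, 16000)
print(np.round(centers, 1))
```

Because the filters are equally spaced on the Mel scale, they are dense at low frequencies and sparse at high frequencies, which roughly mirrors the frequency resolution of human hearing.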
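The mfcc function relies on several helper functions: preemphasis, framesig, melfb and lifter. Calling mfcc from the main program computes the MFCC coefficients of the input speech sequence, for example: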
import scipy.io.wavfile as wav
fs, x = wav.read("example.wav")
mfccs = mfcc(x, fs)
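Here "example.wav" is the file name of the input speech and can be changed as needed. The output mfccs is a two-dimensional array in which each row holds the MFCC coefficients of one frame. Note that framesize and frameinc are given in seconds in this code, whereas frameSize and inc in the requested signature are counted in samples.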
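To see how many rows to expect in the result, it helps to count the frames for a given signal length. The sketch below assumes the same counting rule as framesig, where a trailing partial frame still counts as a frame; the helper name num_frames is hypothetical:

```python
import math


def num_frames(n_samples, fs, framesize=0.025, frameinc=0.01):
    """Number of rows expected in the MFCC matrix for n_samples samples."""
    frame_len = int(round(framesize * fs))    # frame length in samples
    frame_shift = int(round(frameinc * fs))   # frame shift in samples
    # Same counting rule as framesig: a trailing partial frame still counts.
    return max(1, math.ceil((n_samples - frame_len + frame_shift) / frame_shift))


# One second of audio at 16 kHz with 25 ms frames and a 10 ms shift.
print(num_frames(16000, 16000))  # prints 99
```

With these defaults a frame is 400 samples long and consecutive frames start 160 samples apart, so one second of 16 kHz audio yields 99 rows.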
Original source: https://www.cveoy.top/t/topic/b99d. Copyright belongs to the author. Do not reproduce or scrape.