Python audio signal recovery library: reconstructing audio from a spectrogram
import decimal

import numpy
import numpy as np  # both names are used below: np in the recovery helpers, numpy in deframesig

def recover_wav(pd_abs_x, gt_x, n_overlap, winfunc, wav_len=None):
    """Recover wave from spectrogram.

    If you are using scipy.signal.spectrogram, you may need to multiply the
    recovered audio by a scaler after using this function. For example,
    recover_scaler = np.sqrt((ham_win**2).sum())

    Args:
      pd_abs_x: 2d array, (n_time, n_freq)
      gt_x: 2d complex array, (n_time, n_freq)
      n_overlap: integer.
      winfunc: func, the analysis window to apply to each frame.
      wav_len: integer. Pad or trunc to wav_len with zero.

    Returns:
      1d array.
    """
    x = real_to_complex(pd_abs_x, gt_x)
    x = half_to_whole(x)
    frames = ifft_to_wav(x)
    (n_frames, n_window) = frames.shape
    s = deframesig(frames=frames, siglen=0, frame_len=n_window,
                   frame_step=n_window-n_overlap, winfunc=winfunc)
    if wav_len:
        s = pad_or_trunc(s, wav_len)
    return s

def real_to_complex(pd_abs_x, gt_x):
    """Recover pred spectrogram's phase from ground truth's phase.

    Args:
      pd_abs_x: 2d array, (n_time, n_freq)
      gt_x: 2d complex array, (n_time, n_freq)

    Returns:
      2d complex array, (n_time, n_freq)
    """
    theta = np.angle(gt_x)
    cmplx = pd_abs_x * np.exp(1j * theta)
    return cmplx

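def half_to_whole(x):
    """Recover whole spectrogram from half spectrogram.
    """
    # Mirror the bins between the first and last columns as complex conjugates
    # and append them on the right.
    return np.concatenate((x, np.fliplr(np.conj(x[:, 1:-1]))), axis=1)
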
def ifft_to_wav(x):
    """Recover wav from whole spectrogram"""
    return np.real(np.fft.ifft(x))

def pad_or_trunc(s, wav_len):
    if len(s) >= wav_len:
        s = s[0 : wav_len]
    else:
        s = np.concatenate((s, np.zeros(wav_len - len(s))))
    return s

def recover_gt_wav(x, n_overlap, winfunc, wav_len=None):
    """Recover ground truth wav.
    """
    x = half_to_whole(x)
    frames = ifft_to_wav(x)
    (n_frames, n_window) = frames.shape
    s = deframesig(frames=frames, siglen=0, frame_len=n_window,
                   frame_step=n_window-n_overlap, winfunc=winfunc)
    if wav_len:
        s = pad_or_trunc(s, wav_len)
    return s

def deframesig(frames, siglen, frame_len, frame_step, winfunc=lambda x: numpy.ones((x,))):
    """Does overlap-add procedure to undo the action of framesig.

    Ref: From https://github.com/jameslyons/python_speech_features

    :param frames: the array of frames.
    :param siglen: the length of the desired signal, use 0 if unknown. Output will be truncated to siglen samples.
    :param frame_len: length of each frame measured in samples.
    :param frame_step: number of samples after the start of the previous frame that the next frame should begin.
    :param winfunc: the analysis window to apply to each frame. By default no window is applied.
    :returns: a 1-D signal.
    """
    frame_len = round_half_up(frame_len)
    frame_step = round_half_up(frame_step)
    numframes = numpy.shape(frames)[0]
    assert numpy.shape(frames)[1] == frame_len, '"frames" matrix is wrong size, 2nd dim is not equal to frame_len'

    # indices[i, :] holds the output sample positions covered by frame i
    indices = numpy.tile(numpy.arange(0, frame_len), (numframes, 1)) + numpy.tile(
        numpy.arange(0, numframes * frame_step, frame_step), (frame_len, 1)).T
    indices = numpy.array(indices, dtype=numpy.int32)
    padlen = (numframes - 1) * frame_step + frame_len

    if siglen <= 0:
        siglen = padlen

    rec_signal = numpy.zeros((padlen,))
    window_correction = numpy.zeros((padlen,))
    win = winfunc(frame_len)

    for i in range(0, numframes):
        # add a little bit so it is never zero
        window_correction[indices[i, :]] = window_correction[indices[i, :]] + win + 1e-15
        rec_signal[indices[i, :]] = rec_signal[indices[i, :]] + frames[i, :]

    rec_signal = rec_signal / window_correction
    return rec_signal[0:siglen]

def round_half_up(number):
    return int(decimal.Decimal(number).quantize(decimal.Decimal('1'), rounding=decimal.ROUND_HALF_UP))

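Detailed explanation of the code: this code recovers an audio signal from a spectrogram. Given a predicted magnitude spectrogram and a ground-truth complex spectrogram, plus parameters such as the overlap length and the window function, it reconstructs the audio waveform.
The implementation consists of the following functions: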
1. recover_wav(pd_abs_x, gt_x, n_overlap, winfunc, wav_len=None)
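This function takes four required arguments: the predicted magnitude spectrogram pd_abs_x, the ground-truth complex spectrogram gt_x, the overlap length n_overlap, and the window function winfunc, plus an optional wav_len. It first attaches the ground-truth phase to the predicted magnitudes, then expands the half spectrogram into a whole spectrogram, applies the inverse Fourier transform to obtain a sequence of audio frames, and finally merges the frames into one signal by overlap-add. If wav_len is given, the signal is truncated or zero-padded to that length.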
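The steps above can be checked with a round trip. The following is a minimal sketch, not the listing itself: stft_frames is a hypothetical analysis helper (Hamming-windowed frames followed by np.fft.rfft), and recover_wav_sketch is a compact stand-in that uses np.fft.irfft in place of half_to_whole plus ifft_to_wav, which should be equivalent for an even frame length. It assumes the window is passed as an array rather than as a function.

```python
import numpy as np

def stft_frames(signal, n_window, n_overlap, win):
    """Hypothetical analysis step: windowed frames -> half spectrogram (n_time, n_freq)."""
    step = n_window - n_overlap
    n_frames = 1 + (len(signal) - n_window) // step
    frames = np.stack([signal[i * step : i * step + n_window] for i in range(n_frames)])
    return np.fft.rfft(frames * win, axis=1)

def recover_wav_sketch(pd_abs_x, gt_x, n_overlap, win):
    """Compact stand-in for recover_wav (assumption: even frame length)."""
    n_window = len(win)
    step = n_window - n_overlap
    # 1) predicted magnitude combined with ground-truth phase
    x = pd_abs_x * np.exp(1j * np.angle(gt_x))
    # 2) and 3) half spectrum -> real time-domain frames
    frames = np.fft.irfft(x, n=n_window, axis=1)
    # 4) overlap-add, normalised by the accumulated window
    out_len = (len(frames) - 1) * step + n_window
    rec = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i, frame in enumerate(frames):
        rec[i * step : i * step + n_window] += frame
        norm[i * step : i * step + n_window] += win + 1e-15
    return rec / norm

rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)
n_window, n_overlap = 256, 192
win = np.hamming(n_window)

gt_x = stft_frames(signal, n_window, n_overlap, win)
# Using the true magnitude as the "prediction" should give back the input.
recovered = recover_wav_sketch(np.abs(gt_x), gt_x, n_overlap, win)
max_err = np.max(np.abs(recovered - signal[: len(recovered)]))
```

With a real model, pd_abs_x would be the network's magnitude estimate, and the reconstruction error would then reflect the magnitude error rather than rounding noise.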
2. real_to_complex(pd_abs_x, gt_x)
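This function takes the predicted magnitude spectrogram pd_abs_x and the ground-truth complex spectrogram gt_x, and returns a complex spectrogram whose magnitude is pd_abs_x and whose phase is taken from gt_x.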
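A small check of that property, using made-up toy values: the magnitude of the result should match the prediction and its phase should match the ground truth.

```python
import numpy as np

def real_to_complex(pd_abs_x, gt_x):
    theta = np.angle(gt_x)
    return pd_abs_x * np.exp(1j * theta)

gt_x = np.array([[1 + 1j, -2j, 3.0]])    # toy "ground truth" spectrum, shape (1, 3)
pd_abs_x = np.array([[2.0, 0.5, 1.0]])   # toy predicted magnitudes

cmplx = real_to_complex(pd_abs_x, gt_x)
# Magnitude comes from the prediction, phase from the ground truth.
mag_ok = np.allclose(np.abs(cmplx), pd_abs_x)
phase_ok = np.allclose(np.angle(cmplx), np.angle(gt_x))
```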
3. half_to_whole(x)
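This function takes a half spectrogram x and returns the whole spectrogram. It takes x without its first and last columns (the DC and Nyquist bins), conjugates it, flips it left to right, and appends the result to the right of x. This reproduces the conjugate symmetry of the spectrum of a real signal and assumes an even frame length.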
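One way to see what this produces is to compare it with NumPy's own transforms. In the sketch below, which uses random test frames, the half spectrum from np.fft.rfft is expanded and compared with the full np.fft.fft output.

```python
import numpy as np

def half_to_whole(x):
    return np.concatenate((x, np.fliplr(np.conj(x[:, 1:-1]))), axis=1)

rng = np.random.default_rng(0)
frames = rng.standard_normal((5, 8))     # 5 real frames of even length 8
half = np.fft.rfft(frames, axis=1)       # shape (5, 5): bins 0..4
whole = half_to_whole(half)              # shape (5, 8): bins 0..7
matches_fft = np.allclose(whole, np.fft.fft(frames, axis=1))
```

The slice x[:, 1:-1] drops the last column because, for an even frame length, that column is the Nyquist bin and appears only once in the full spectrum. For an odd frame length there is no Nyquist bin, so this slicing would not apply as written.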
4. ifft_to_wav(x)
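This function takes a whole spectrogram x and returns a sequence of audio frames by applying the inverse Fourier transform along the frequency axis and keeping the real part.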
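Taking the real part is safe when the whole spectrum is conjugate-symmetric, because the imaginary part of the inverse transform is then only rounding noise. The sketch below illustrates this on random test frames and also shows np.fft.irfft as an assumed alternative that works on the half spectrum directly.

```python
import numpy as np

rng = np.random.default_rng(1)
n_window = 16
frames = rng.standard_normal((4, n_window))
whole = np.fft.fft(frames, axis=1)

time_frames = np.fft.ifft(whole)                # default axis=-1, as in ifft_to_wav
imag_peak = np.max(np.abs(time_frames.imag))    # rounding noise only
recovered = np.real(time_frames)

# Alternative (not used in the listing): invert the half spectrum in one call.
half = whole[:, : n_window // 2 + 1]
same_as_irfft = np.allclose(recovered, np.fft.irfft(half, n=n_window, axis=1))
```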
5. pad_or_trunc(s, wav_len)
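This function takes an audio signal s and a target length wav_len. If s has at least wav_len samples, it is truncated to the first wav_len samples; otherwise it is padded with trailing zeros.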
6. recover_gt_wav(x, n_overlap, winfunc, wav_len=None)
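This function is similar to recover_wav but takes no predicted spectrogram pd_abs_x. It recovers the waveform directly from the ground-truth complex half spectrogram, passed as x.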
7. deframesig(frames,siglen,frame_len,frame_step,winfunc=lambda x:numpy.ones((x,)))
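This function implements overlap-add, merging the frame sequence into one signal. It takes five arguments: the frames, the desired signal length siglen (0 if unknown), the frame length frame_len, the frame step frame_step, and the window function winfunc. It adds each frame into its position in the output, accumulates the window values at the same positions, divides the summed signal by the accumulated window to undo the weighting of the analysis window, and finally truncates the result to siglen samples.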
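The effect of the window correction is easiest to see with a rectangular window. The sketch below is a vectorised variant using np.add.at, not the loop from the listing, and overlap_add is a hypothetical name. It returns both the raw overlap sum and the normalised result.

```python
import numpy as np

def overlap_add(frames, frame_step, win):
    """Overlap-add with window normalisation (sketch of what deframesig computes)."""
    n_frames, frame_len = frames.shape
    # Row i holds the output sample positions covered by frame i.
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]
    out_len = (n_frames - 1) * frame_step + frame_len
    rec = np.zeros(out_len)
    norm = np.zeros(out_len)
    # np.add.at accumulates correctly even when the same index occurs several times.
    np.add.at(rec, idx, frames)
    np.add.at(norm, idx, np.broadcast_to(win + 1e-15, frames.shape))
    return rec / norm, rec

frames = np.ones((3, 4))   # three frames of length 4, 50% overlap
normalised, raw_sum = overlap_add(frames, frame_step=2, win=np.ones(4))
# raw_sum is [1, 1, 2, 2, 2, 2, 1, 1]: overlapped samples are counted twice.
# normalised is (up to rounding) all ones: the division undoes the double counting.
```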
8. round_half_up(number)
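This function rounds a number to the nearest integer, with ties rounded away from zero (2.5 becomes 3), unlike Python's built-in round, which rounds ties to the nearest even integer.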
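The difference from the built-in round only shows up on exact ties, as this short comparison illustrates:

```python
import decimal

def round_half_up(number):
    return int(decimal.Decimal(number).quantize(decimal.Decimal('1'), rounding=decimal.ROUND_HALF_UP))

ties = [0.5, 1.5, 2.5, -2.5]
builtin = [round(v) for v in ties]           # [0, 2, 2, -2]: ties go to the even neighbour
half_up = [round_half_up(v) for v in ties]   # [1, 2, 3, -3]: ties go away from zero
```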