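Audio event detection model: a few-shot template-matching approach
This code implements an audio event detection model based on few-shot template matching. The walkthrough below explains what each part of the code does: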
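1. List the subfolders of the dataset folder (each one holds audio files and their annotations):
folders = os.listdir(folder_path)
2. Prepare the header row of the output CSV file:
to_write = [['Audiofilename', 'Starttime', 'Endtime']]
3. Loop over each subfolder:
for folder in folders:
4. List all files in the current subfolder:
    files = os.listdir(os.path.join(folder_path, folder))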
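5. For each .wav audio file, compute the magnitude of its STFT (short-time Fourier transform):
for file in files:
    if file[-4:] == '.wav':
        audio = file
        annotation = file[:-4] + '.csv'
        # sr=None keeps the file's native sampling rate
        waveform, sr = librosa.load(os.path.join(folder_path, folder, audio), sr=None)
        nfft = int(sr / 10)       # analysis window of about 100 ms
        hop_len = int(nfft / 4)   # 75% overlap between consecutive windows
        stft = np.abs(librosa.stft(waveform, n_fft=nfft, hop_length=hop_len, window='hann', pad_mode='reflect'))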
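To make the window arithmetic concrete, the following minimal sketch shows how `nfft` and `hop_len` follow from the sampling rate. The helper name `stft_params` and the 22050 Hz rate are illustrative assumptions and do not appear in the original script.

```python
def stft_params(sr):
    """Return (nfft, hop_len, n_freq_bins) for a sampling rate, as in step 5."""
    nfft = int(sr / 10)          # window of about 100 ms
    hop_len = int(nfft / 4)      # hop of about 25 ms (75% overlap)
    n_freq_bins = 1 + nfft // 2  # rows of a one-sided STFT
    return nfft, hop_len, n_freq_bins


nfft, hop_len, n_freq_bins = stft_params(22050)
print(nfft, hop_len, n_freq_bins)  # 2205 551 1103
```

Because both values scale with `sr`, the window always covers roughly the same duration regardless of the recording's sampling rate.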
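6. Suppress background noise by subtracting medians:
# median over time for each frequency bin (removes stationary noise such as a constant hum)
stft_median = np.median(stft, axis=-1, keepdims=True)
# median over frequency for each time frame (removes broadband bursts)
stft_time_median = np.median(stft, axis=0, keepdims=True)
norm_stft = stft - stft_median
norm_stft = norm_stft - stft_time_median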
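The effect of the two median subtractions is easiest to see on a toy spectrogram. The sketch below uses made-up values (a constant hum, a broadband click, and one event) purely for illustration.

```python
import numpy as np

# Toy magnitude spectrogram: 5 frequency bins x 6 time frames.
stft = np.zeros((5, 6))
stft[0, :] += 3.0   # constant hum in frequency bin 0
stft[:, 4] += 2.0   # broadband click in time frame 4
stft[2, 1] += 7.0   # the event we want to keep

freq_median = np.median(stft, axis=-1, keepdims=True)  # shape (5, 1): one value per bin
time_median = np.median(stft, axis=0, keepdims=True)   # shape (1, 6): one value per frame
norm_stft = stft - freq_median - time_median

print(norm_stft[2, 1])              # 7.0
print(np.count_nonzero(norm_stft))  # 1
```

In this idealized case the hum and the click cancel exactly and only the event survives; on real recordings the subtraction only reduces such noise.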
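7. Take the first `shots` positive ('POS') events from the annotation file as templates for the matching that follows:
events = []
with open(os.path.join(folder_path, folder, annotation)) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        # keep a row only if it is labelled POS and fewer than `shots` templates were collected
        if row[-1] == 'POS' and len(events) < shots:
            events.append(row)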
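The selection logic can be tried without any audio files. This sketch reads a hypothetical annotation from a string; the helper name `first_pos_events` and the column layout (`Audiofilename,Starttime,Endtime,Q`) are assumptions made for illustration.

```python
import csv
import io


def first_pos_events(csv_text, shots):
    """Return the first `shots` rows whose last column is 'POS'."""
    events = []
    for row in csv.reader(io.StringIO(csv_text), delimiter=','):
        if row and row[-1] == 'POS' and len(events) < shots:
            events.append(row)
    return events


# Hypothetical annotation file; the column layout is an assumption.
sample = """Audiofilename,Starttime,Endtime,Q
a.wav,0.50,0.80,POS
a.wav,1.10,1.30,NEG
a.wav,2.00,2.40,POS
a.wav,3.00,3.20,POS
"""
events = first_pos_events(sample, shots=2)
print([e[1] for e in events])  # ['0.50', '2.00']
```

The header row needs no special handling here because its last cell is not 'POS'.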
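8. Select the STFT region to predict on (everything after the last template event ends) and match each template against it:
to_predict = norm_stft[:, int(np.ceil(float(events[-1][-2])*sr/hop_len + 1)):]
result = []
for event in events:
    starttime = float(event[1])
    endtime = float(event[2])
    # crop 2 frequency bins from each edge so the template can also shift slightly in frequency
    event_stft = norm_stft[2:-2, int(np.floor(starttime*sr/hop_len + 1)):int(np.ceil(endtime*sr/hop_len + 1))]
    result.append(match_template(to_predict, event_stft))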
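`match_template` (from scikit-image) scores each position by normalized cross-correlation. As a rough illustration of that idea, the sketch below slides a template along the time axis only and computes a Pearson correlation per window. The function `slide_ncc` is a hypothetical, simplified stand-in written for this walkthrough, not scikit-image's implementation, and it omits the small frequency shift that the cropped template allows in the original code.

```python
import numpy as np


def slide_ncc(image, template):
    """Pearson correlation of `template` with every same-height window of `image`.

    Simplified stand-in for template matching: the template only slides
    along the time axis (columns), not along frequency.
    """
    n_rows, width = template.shape
    assert image.shape[0] == n_rows, "this sketch requires equal heights"
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    scores = np.zeros(image.shape[1] - width + 1)
    for k in range(scores.size):
        w = image[:, k:k + width]
        w = w - w.mean()
        denom = t_norm * np.sqrt((w ** 2).sum())
        # windows with zero variance get a score of 0 instead of dividing by zero
        scores[k] = (t * w).sum() / denom if denom > 0 else 0.0
    return scores


rng = np.random.default_rng(0)
image = rng.random((6, 40))
template = image[:, 25:30].copy()   # cut a "template event" out of the image
scores = slide_ncc(image, template)
print(int(np.argmax(scores)))       # 25
```

The score peaks where the template was cut from, which is the property the later peak-detection step relies on.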
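9. Set the detection threshold from how well the templates match one another. Each template is matched against every other template that is at least as long, and the highest score found becomes the threshold:
mr = []
for i in range(len(events)):
    event = events[i]
    starttime = float(event[1])
    endtime = float(event[2])
    event_stft = norm_stft[2:-2, int(np.floor(starttime*sr/hop_len + 1)):int(np.ceil(endtime*sr/hop_len + 1))]
    r = []
    for j in range(len(events)):
        if j != i:
            inner_event = events[j]
            inner_starttime = float(inner_event[1])
            inner_endtime = float(inner_event[2])
            inner_event_stft = norm_stft[2:-2, int(np.floor(inner_starttime*sr/hop_len + 1)):int(np.ceil(inner_endtime*sr/hop_len + 1))]
            # the searched region must be at least as wide as the template
            if inner_event_stft.shape[1] >= event_stft.shape[1]:
                r.append(np.max(match_template(inner_event_stft, event_stft)))
    if r:
        mr.append(np.max(r))
# note: np.max raises ValueError on an empty list, i.e. when fewer than two templates were found
threshold = np.max(mr)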
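The reduction at the end of this step can be restated on plain numbers. In the sketch below, the helper `similarity_threshold` and the scores are hypothetical; the explicit error for the empty case is an addition of this sketch, not something the original code does.

```python
def similarity_threshold(cross_scores):
    """Pick the threshold from template-vs-template scores.

    cross_scores[i] holds the best match score of template i inside each
    other template that is at least as long; a list may be empty.
    """
    best_per_template = [max(r) for r in cross_scores if r]
    if not best_per_template:
        raise ValueError("need at least two templates to derive a threshold")
    return max(best_per_template)


# Hypothetical scores for three templates.
cross_scores = [[0.42, 0.55], [0.61], []]
print(similarity_threshold(cross_scores))  # 0.61
```

Taking the maximum makes the threshold fairly strict: a later detection has to score at least as high as the two most similar templates score against each other.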
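10. Detect peaks in each match result, setting peak positions to 1 and everything else to 0:
binary_result = []
for i in range(len(result)):
    # template length in frames, used as the minimum distance between peaks
    event_len = int(np.ceil(np.floor(float(events[i][2])*sr/hop_len + 1))) - int(np.floor(float(events[i][1])*sr/hop_len))
    rmax = np.zeros((result[i].shape[1], ))
    # take the best score over the frequency shifts, then keep peaks at or above the threshold
    peaks, _ = find_peaks(np.max(result[i], axis=0), height=threshold, distance=event_len)
    rmax[peaks] = 1
    binary_result.append(rmax)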
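A small example shows how the `height` and `distance` arguments of `scipy.signal.find_peaks` interact. The score curve, threshold, and event length below are made-up values.

```python
import numpy as np
from scipy.signal import find_peaks

score = np.array([0.0, 0.2, 0.9, 0.3, 0.1, 0.8, 0.2, 0.95, 0.1])
threshold = 0.5   # hypothetical value
event_len = 3     # minimum spacing between detections, in frames

peaks, _ = find_peaks(score, height=threshold, distance=event_len)
binary = np.zeros(score.shape[0])
binary[peaks] = 1

print(peaks.tolist())  # [2, 7]
```

The local maximum at index 5 clears the height threshold but lies only two frames from the higher peak at index 7, so the `distance` constraint discards it.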
11. Expand each peak to the length of its template, pad the result back to the full recording length, and merge the predictions of all templates:
for i in range(len(binary_result)):
    starttime = float(events[i][1])
    endtime = float(events[i][2])
    event_stft = norm_stft[2:-2, int(np.floor(starttime*sr/hop_len + 1)):int(np.ceil(endtime*sr/hop_len + 1))]
    event_len = int(event_stft.shape[1])
    lpad = int(np.floor(event_len/2))
    rpad = int(event_len-lpad-1)
    indices = np.where(binary_result[i]==1)[0]
    for index in indices:
        # clamp at 0 so a peak near the start does not turn into a negative slice index
        binary_result[i][max(int(index)-lpad, 0):int(index)+rpad] = 1
    # restore the frames lost to the template width, then the frames before the predicted region
    binary_result[i] = np.pad(binary_result[i], (lpad, rpad))
    binary_result[i] = np.pad(binary_result[i], (norm_stft.shape[1]-to_predict.shape[1], 0))
    binary_result[0] += binary_result[i]
final_result = binary_result[0]
final_result[np.where(final_result>0)] = 1
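The intent of this step can be restated more directly: each peak becomes a run of ones as long as its template, shifted into full-recording coordinates, and the per-template masks are combined with a logical OR. The sketch below is a simplified restatement with a hypothetical helper `peaks_to_mask`; it marks the full template width for each peak, which can differ by a frame from the slice arithmetic in the original code.

```python
import numpy as np


def peaks_to_mask(peak_positions, event_len, n_frames, offset=0):
    """Mark `event_len` frames starting at each peak in a full-length 0/1 mask.

    `offset` is the number of frames skipped before the predicted region.
    """
    mask = np.zeros(n_frames)
    for p in peak_positions:
        start = offset + int(p)
        mask[start:start + event_len] = 1
    return mask


n_frames = 20
# Two hypothetical templates with different lengths and their detected peaks.
mask_a = peaks_to_mask([2, 10], event_len=3, n_frames=n_frames, offset=4)
mask_b = peaks_to_mask([3], event_len=5, n_frames=n_frames, offset=4)

final = ((mask_a + mask_b) > 0).astype(int)
print(np.flatnonzero(final).tolist())  # [6, 7, 8, 9, 10, 11, 14, 15, 16]
```

Overlapping detections from different templates merge into a single longer event, as frames 6 to 11 do here.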
12. Convert frame indices to seconds and write the predictions to the CSV file:
# a 0 -> 1 step marks the start of an event, a 1 -> 0 step marks its end
startind = np.where(final_result[:-1] - final_result[1:] == -1)[0]
endind = np.where(final_result[:-1] - final_result[1:] == 1)[0]
for i in range(len(startind)):
    to_write.append([audio, startind[i]*hop_len/sr, endind[i]*hop_len/sr])
with open(output_file+'.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(to_write)
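The conversion from a frame mask to timestamps can be sketched on a tiny example. The helper `mask_to_segments` is hypothetical and deliberately differs from the original in one respect: it pads the mask with a zero on both sides, so every start has a matching end even when the mask begins or ends with 1, and each start index points at the first active frame rather than the frame before it.

```python
import numpy as np


def mask_to_segments(mask, sr, hop_len):
    """Turn a 0/1 frame mask into (start_s, end_s) pairs in seconds."""
    padded = np.concatenate(([0], np.asarray(mask, dtype=int), [0]))
    change = np.diff(padded)
    starts = np.flatnonzero(change == 1)    # first frame of each run
    ends = np.flatnonzero(change == -1)     # one past the last frame of each run
    return [(float(s * hop_len / sr), float(e * hop_len / sr)) for s, e in zip(starts, ends)]


segments = mask_to_segments([0, 1, 1, 0, 0, 1, 0], sr=1000, hop_len=250)
print(segments)  # [(0.25, 0.75), (1.25, 1.5)]
```

Multiplying a frame index by `hop_len / sr` gives its time in seconds, the same conversion used when the rows are appended to `to_write`.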
13. Run the program:
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-folder_path', type=str, help='path to Validation_Set folder')
    args = parser.parse_args()
    fewshot_match_template(folder_path=args.folder_path)
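The argument parsing can be checked in isolation by passing an explicit argument list in place of the command line. The wrapper `build_parser` and the example path are assumptions for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-folder_path', type=str, help='path to Validation_Set folder')
    return parser


# Passing an explicit list stands in for the command line here.
args = build_parser().parse_args(['-folder_path', 'Validation_Set/'])
print(args.folder_path)  # Validation_Set/
```

Since the option is not marked as required, omitting it leaves `args.folder_path` as `None`, so a caller may want to check for that before listing the folder.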
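The code also sets several parameters, such as the STFT window size (`nfft`), the hop length (`hop_len`), and the number of template events (`shots`); these can be tuned for the application at hand.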