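Python Code for Converting Audio Files to Spectrograms and MFCCs with FPN Fusion

This Python code converts the audio files in a dataset into mel spectrograms and MFCCs, then combines the two images with an FPN-style fusion step. It walks nested subdirectories, and the steps and sample code are given below.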

import os

import cv2  # OpenCV, needed for cv2.resize in fpn_fusion
import librosa
import numpy as np
from PIL import Image

# Dataset path
data_path = 'D:/论文代码/data'

# Output path for the converted images
save_path = 'D:/论文代码/converted_data'
# makedirs also creates missing parent directories and is a no-op if the path exists
os.makedirs(save_path, exist_ok=True)

# Parameters
n_mels = 128  # number of mel bands
n_fft = 2048  # FFT window size
hop_length = 512  # hop length between frames
n_mfcc = 20  # number of MFCC coefficients

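# Convert an audio file to a log-mel spectrogram (in dB)
def audio_to_mel(audio_path):
    # sr=None keeps the file's native sampling rate
    audio, sr = librosa.load(audio_path, sr=None)
    # Newer librosa releases accept the signal only as the keyword argument y
    mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    mel_spec = librosa.power_to_db(mel_spec, ref=np.max)
    return mel_spec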

# Convert a log-mel spectrogram to MFCCs
def mel_to_mfcc(mel_spec):
    # S must be a log-power mel spectrogram, which is what audio_to_mel returns
    mfcc = librosa.feature.mfcc(S=mel_spec, n_mfcc=n_mfcc)
    return mfcc

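# Scale a float array to 0-255 so it can be saved as an 8-bit grayscale PNG
# (PIL cannot write float arrays as PNG)
def to_uint8(arr):
    arr = arr - arr.min()
    peak = arr.max()
    if peak > 0:
        arr = arr / peak
    return (arr * 255).astype(np.uint8)

# Walk the dataset and convert every audio file to images.
# Outputs are written flat into save_path, so files that share a name
# in different subdirectories overwrite each other.
for root, dirs, files in os.walk(data_path):
    for file in files:
        if file.endswith('.wav'):
            audio_path = os.path.join(root, file)
            name = os.path.splitext(file)[0]
            # Convert to a mel spectrogram
            mel_spec = audio_to_mel(audio_path)
            # Save the mel spectrogram
            mel_img_path = os.path.join(save_path, name + '_mel.png')
            Image.fromarray(to_uint8(mel_spec)).save(mel_img_path)
            # Convert to MFCCs
            mfcc = mel_to_mfcc(mel_spec)
            # Save the MFCCs
            mfcc_img_path = os.path.join(save_path, name + '_mfcc.png')
            Image.fromarray(to_uint8(mfcc)).save(mfcc_img_path)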

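# Fuse images by repeatedly resizing and averaging them.
# This is a simplified multi-scale average, not a learned FPN.
def fpn_fusion(img_list):
    # Load each image as 8-bit grayscale
    img_list = [np.array(Image.open(img_path).convert('L')) for img_path in img_list]
    n_layers = len(img_list)
    fused_img = img_list[0].astype(np.float32)
    for i in range(1, n_layers):
        h, w = img_list[i].shape[:2]
        # PIL and OpenCV both take the target size as (width, height)
        scaled_img = np.array(Image.fromarray(img_list[i]).resize((w//2, h//2)))
        scaled_img = scaled_img.astype(np.float32)
        fused_img = cv2.resize(fused_img, (w//2, h//2), interpolation=cv2.INTER_CUBIC)
        fused_img = fused_img + scaled_img
    fused_img = fused_img / n_layers
    # Cubic interpolation can overshoot the 0-255 range, so clip before casting
    fused_img = np.clip(fused_img, 0, 255).astype(np.uint8)
    return fused_img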

# Walk the saved images and fuse each mel/MFCC pair
for root, dirs, files in os.walk(save_path):
    for file in files:
        if file.endswith('_mel.png'):
            mel_img_path = os.path.join(root, file)
            # Strip the '_mel.png' suffix to recover the shared base name
            base = file[:-len('_mel.png')]
            mfcc_img_path = os.path.join(root, base + '_mfcc.png')
            # Fuse the two images
            fused_img = fpn_fusion([mel_img_path, mfcc_img_path])
            # Save the fused image
            fused_img_path = os.path.join(root, base + '_fused.png')
            Image.fromarray(fused_img).save(fused_img_path)

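Code flow:

Define parameters: set the number of mel bands, the FFT window size, the hop length, and the number of MFCC coefficients.
Define functions:
- audio_to_mel: converts an audio file to a mel spectrogram.
- mel_to_mfcc: converts a mel spectrogram to MFCCs.
- fpn_fusion: fuses the images.
Walk the dataset:
- Use os.walk to find the audio files in the dataset, convert each one to a mel spectrogram and MFCCs, and save the results to the output path.
- Use os.walk again over the output path, fuse each mel spectrogram image with its MFCC image, and save the fused image.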
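
The parameters above determine the size of the saved images. As a rough sketch, assuming librosa's default centered framing (one frame per hop plus one), the expected array shapes can be computed without loading any audio. The helper names `n_frames` and `feature_shapes`, the 1-second duration, and the 22050 Hz sampling rate are illustrative assumptions, not part of the original script.

```python
def n_frames(n_samples, hop_length=512):
    """Frame count under centered framing: one frame per hop, plus one."""
    return 1 + n_samples // hop_length


def feature_shapes(duration_s, sr, n_mels=128, n_mfcc=20, hop_length=512):
    """Return the (rows, columns) shapes of the mel spectrogram and the MFCC array."""
    frames = n_frames(int(duration_s * sr), hop_length)
    return (n_mels, frames), (n_mfcc, frames)


# Hypothetical clip: 1 second of audio sampled at 22050 Hz
mel_shape, mfcc_shape = feature_shapes(duration_s=1.0, sr=22050)
print(mel_shape, mfcc_shape)  # (128, 44) (20, 44)
```

Both arrays share the same number of columns (time frames) and differ only in the number of rows, which is why the fusion step has to resize one of them.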

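Notes:

The code uses the librosa library for audio processing, so librosa must be installed.
The cv2.resize call requires the OpenCV library.
Change the dataset path and the output path to match your setup.
The FPN fusion step can be adapted to your needs.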
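
As one possible adjustment, the sketch below merges feature maps from coarse to fine by upsampling and adding, which is closer to the top-down pathway of an FPN than plain averaging at a reduced size. It is only an illustration under stated assumptions: it uses NumPy alone, nearest-neighbour upsampling, and no learned convolutions, and the names `upsample_to` and `top_down_fusion` are hypothetical rather than taken from the script above.

```python
import numpy as np


def upsample_to(img, shape):
    """Nearest-neighbour resize of a 2-D array to the target (rows, cols) shape."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[rows][:, cols]


def top_down_fusion(maps):
    """Merge 2-D maps ordered fine-to-coarse by upsample-and-add, then average."""
    fused = maps[-1].astype(np.float32)
    for finer in reversed(maps[:-1]):
        fused = finer.astype(np.float32) + upsample_to(fused, finer.shape)
    return fused / len(maps)


# Stand-ins for a mel image (128 rows) and an MFCC image (20 rows)
mel = np.full((128, 44), 100, dtype=np.uint8)
mfcc = np.full((20, 44), 50, dtype=np.uint8)
fused = top_down_fusion([mel, mfcc])
print(fused.shape)  # (128, 44)
```

Unlike the resize-and-average approach, this variant keeps the resolution of the finest input, so the fused image has as many rows as the mel spectrogram.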

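Example:

Suppose the dataset path is 'D:/论文代码/data' and contains several subdirectories, each holding several audio files. After the code runs, the corresponding mel spectrogram, MFCC, and fused images appear under 'D:/论文代码/converted_data'.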
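
Because every image is written directly into the output directory, two audio files with the same name in different subdirectories would map to the same image files. The sketch below, which uses only the standard library, lists the planned mel and MFCC output paths and flags such collisions before any conversion runs. The function names `planned_outputs` and `find_collisions` and the demo directory tree are made up for illustration.

```python
import os
import tempfile
from collections import defaultdict


def planned_outputs(data_path, save_path):
    """Map each .wav under data_path to the (mel, mfcc) image paths it would produce."""
    plan = {}
    for root, _dirs, files in os.walk(data_path):
        for name in files:
            if name.endswith('.wav'):
                stem = os.path.splitext(name)[0]
                plan[os.path.join(root, name)] = (
                    os.path.join(save_path, stem + '_mel.png'),
                    os.path.join(save_path, stem + '_mfcc.png'),
                )
    return plan


def find_collisions(plan):
    """Return mel output paths that more than one input file would write to."""
    writers = defaultdict(list)
    for src, (mel_path, _mfcc_path) in plan.items():
        writers[mel_path].append(src)
    return {out: srcs for out, srcs in writers.items() if len(srcs) > 1}


# Demo on a throwaway directory tree with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for rel in ('speaker1/a.wav', 'speaker2/a.wav', 'speaker2/b.wav'):
        path = os.path.join(tmp, *rel.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()
    plan = planned_outputs(tmp, 'converted_data')
    collisions = find_collisions(plan)
    print(len(plan), 'inputs,', len(collisions), 'colliding output name(s)')
```

If collisions are reported, one option is to include the subdirectory name in the output file name or to mirror the input directory structure under the output path.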