Age prediction in Python with a CNN trained on the UTKFace dataset, including data processing and accuracy-curve plotting

Data processing

First, download and extract the UTKFace dataset, which contains face images labeled with age. It can be downloaded from the following link:

https://drive.google.com/file/d/0BxYys69jI14kYVM3aVhKS1VhRUk/view

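After extraction you get a folder containing 23708 images. Each file name has the format 'age_gender_race_date_time.jpg', where age is the age, gender is the gender, race is the race, and date_time is the date and time. We only need the age label.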
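The file name format described above can be parsed with a small helper. This is a minimal sketch: the helper name parse_filename, the UTKFaceName tuple, and the sample file name are hypothetical and only follow the 'age_gender_race_date_time.jpg' pattern stated above. Real files may carry extra suffixes that would need additional handling.

```python
from typing import NamedTuple


class UTKFaceName(NamedTuple):
    """Fields encoded in a file name (hypothetical helper type)."""
    age: int
    gender: str
    race: str
    date_time: str


def parse_filename(filename: str) -> UTKFaceName:
    # Drop the extension, then split into at most 4 fields so that
    # any underscore inside date_time stays in the last field.
    stem = filename.rsplit('.', 1)[0]
    age, gender, race, date_time = stem.split('_', 3)
    return UTKFaceName(int(age), gender, race, date_time)


# Made-up file name that follows the documented pattern.
example = parse_filename('25_0_1_20170116_174525.jpg')
print(example.age, example.date_time)
```

A name with fewer than four fields raises ValueError, which makes malformed files easy to detect and skip.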

The data processing code is as follows:

import os
import cv2
import numpy as np

def load_data(data_dir):
    images = []
    labels = []
    for filename in os.listdir(data_dir):
        try:
            age = int(filename.split('_')[0])
        except ValueError:
            continue  # skip files whose name does not start with an age
        if age < 18 or age > 70:  # only use samples aged 18 to 70
            continue
        img = cv2.imread(os.path.join(data_dir, filename))
        if img is None:  # cv2.imread returns None when a file cannot be read
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (224, 224))
        images.append(img)
        labels.append(age - 18)  # shift ages so that labels start at 0
    images = np.array(images)
    labels = np.array(labels)
    return images, labels

data_dir = '/path/to/UTKFace'
images, labels = load_data(data_dir)

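In the code above, we first define a load_data function that loads the data. For each image, we parse the age from the file name and skip the image if the age falls outside the range we need (18 to 70). We then read the image with OpenCV, convert its color channel order from BGR to RGB, and resize it to 224x224. Finally, we collect the images and their labels into the images and labels arrays and return both.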
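The age filter and the label shift used by load_data can be written out on their own. The sketch below uses hypothetical names (AGE_MIN, AGE_MAX, age_to_label, label_to_age) that do not appear in the original code; it only illustrates how ages 18 to 70 map onto class labels that start at 0.

```python
AGE_MIN = 18
AGE_MAX = 70
# Number of distinct ages kept, both ends included.
NUM_CLASSES = AGE_MAX - AGE_MIN + 1


def age_to_label(age):
    """Return the class label for an age, or None if the age is filtered out."""
    if not AGE_MIN <= age <= AGE_MAX:
        return None
    return age - AGE_MIN


def label_to_age(label):
    """Inverse mapping: class label back to an age in years."""
    return label + AGE_MIN


ages = [5, 18, 42, 70, 85]
labels_demo = [age_to_label(a) for a in ages]
print(NUM_CLASSES)   # 53
print(labels_demo)   # [None, 0, 24, 52, None]
```

Keeping the two bounds in one place makes it harder for the loader, the output layer size, and the prediction offset to drift apart.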

Building the CNN

Next, we build a CNN to predict age from face images. In this example we use the Keras framework. The code for the CNN is as follows:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(53, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

model = build_model()

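In the code above, we import the required Keras classes and define a build_model function that builds the CNN. The network has 4 convolutional layers and 2 fully connected layers. Each convolutional layer uses 3x3 kernels with a ReLU activation and is followed by a 2x2 max pooling layer that downsamples the feature maps. After flattening, a fully connected layer with 512 neurons is followed by a Dropout layer to reduce overfitting. The output layer has 53 neurons, one for each age from 18 to 70, and uses softmax to produce a probability per age. The model is compiled with the Adam optimizer and the sparse categorical cross-entropy loss.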
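To see how the spatial size shrinks through the network, the layer arithmetic can be traced by hand. The sketch below is plain Python, not Keras; it assumes the Keras defaults used in build_model (3x3 convolutions with 'valid' padding and stride 1, 2x2 pooling with stride 2). The helper names are hypothetical.

```python
def conv_valid(size, kernel=3):
    # 'valid' padding with stride 1 trims kernel - 1 pixels per dimension.
    return size - kernel + 1


def pool(size, window=2):
    # Non-overlapping pooling; a leftover row or column is dropped.
    return size // window


def trace_shapes(input_size=224, conv_filters=(32, 64, 128, 128)):
    """Return per-layer output shapes and the flattened feature count."""
    size = input_size
    shapes = []
    for filters in conv_filters:
        size = conv_valid(size)
        shapes.append(('conv', size, size, filters))
        size = pool(size)
        shapes.append(('pool', size, size, filters))
    flat = size * size * conv_filters[-1]
    return shapes, flat


shapes, flat = trace_shapes()
for shape in shapes:
    print(shape)
print('flatten:', flat)                      # 18432
print('dense(512) params:', flat * 512 + 512)  # 9437696
```

Under these assumptions the first fully connected layer holds most of the model's weights, which is one reason the Dropout layer is placed right after it.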

Predicting age

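Next, we use the trained CNN model to predict the age for a face image. The model must be trained first; the fit call appears in the accuracy-curve section and, in the complete code, runs before the prediction. The prediction code is as follows: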

def predict_age(model, image):
    image = np.expand_dims(image, axis=0)  # add a batch dimension
    age = model.predict(image)[0]
    return np.argmax(age) + 18  # convert the 0-based label back to an actual age

image = images[0]  # take one image for prediction
predicted_age = predict_age(model, image)
print('Predicted age:', predicted_age)

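In the code above, we first define a predict_age function that predicts the age for a given image. The function adds a batch dimension to the input image so that it matches the model's input format, then runs the model on it. The prediction is a vector of 53 elements, each giving the probability of the corresponding age. We take the age with the highest probability as the result and convert the 0-based label back to an actual age. Finally, we call predict_age on one image and print the result.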
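The decoding step can be checked without a model by feeding it a hand-made probability vector. This is a minimal sketch in plain Python; decode_age and the toy vector are hypothetical and stand in for the np.argmax call on real model output.

```python
def decode_age(probabilities, age_min=18):
    """Pick the most probable class and shift it back to an age in years."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best_index + age_min


# Toy distribution over 53 classes: most mass on class 7, the rest on class 8.
toy = [0.0] * 53
toy[7] = 0.6
toy[8] = 0.4
print(decode_age(toy))  # 25
```

Like np.argmax, this returns the first index when several classes tie for the highest probability.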

Plotting the accuracy curve

To evaluate the model, we also plot its accuracy curve. The code is as follows:

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=0.2, random_state=42)

history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

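In the code above, we first use train_test_split to split the dataset into a training set (80%) and a test set (20%). We then call fit to train the model; the training and validation accuracy for each epoch are stored in the history variable. Finally, we plot the accuracy curves with matplotlib.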
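The accuracy reported by fit counts a prediction as correct only when the predicted class equals the true class exactly, so a prediction that is off by one year scores the same as one that is off by twenty. The sketch below shows that computation in plain Python, next to mean absolute error in years. The error metric is an assumption added here for comparison and is not part of the original code; the function names and sample labels are hypothetical.

```python
def exact_accuracy(y_true, y_pred):
    """Fraction of predictions whose class matches the label exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance between predicted and true labels, in years."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels and predictions (0-based classes, i.e. age - 18).
y_true = [0, 10, 20, 30]
y_pred = [0, 11, 20, 25]
print(exact_accuracy(y_true, y_pred))       # 0.5
print(mean_absolute_error(y_true, y_pred))  # 1.5
```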

The complete code is as follows:

import os
import cv2
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

def load_data(data_dir):
    images = []
    labels = []
    for filename in os.listdir(data_dir):
        try:
            age = int(filename.split('_')[0])
        except ValueError:
            continue  # skip files whose name does not start with an age
        if age < 18 or age > 70:  # only use samples aged 18 to 70
            continue
        img = cv2.imread(os.path.join(data_dir, filename))
        if img is None:  # cv2.imread returns None when a file cannot be read
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (224, 224))
        images.append(img)
        labels.append(age - 18)  # shift ages so that labels start at 0
    images = np.array(images)
    labels = np.array(labels)
    return images, labels

def build_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(53, activation='softmax'))
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

def predict_age(model, image):
    image = np.expand_dims(image, axis=0)
    age = model.predict(image)[0]
    return np.argmax(age) + 18  # convert the 0-based label back to an actual age

data_dir = '/path/to/UTKFace'
images, labels = load_data(data_dir)
model = build_model()

X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=0.2, random_state=42)

history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_data=(X_test, y_test))

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

image = images[0]  # take one image for prediction
predicted_age = predict_age(model, image)
print('Predicted age:', predicted_age)
