Training an SVM Model in Python on the MNIST Dataset Loaded from Local Files

This tutorial shows how to load the MNIST dataset from local files in Python and train a support vector machine (SVM) model to recognize handwritten digits.

1. Loading the MNIST dataset

This section assumes you have already downloaded the dataset files from the MNIST website (train-images-idx3-ubyte, train-labels-idx1-ubyte, t10k-images-idx3-ubyte, t10k-labels-idx1-ubyte) and stored them, uncompressed, in a local directory. The code below reads these files into NumPy arrays. Image files begin with a 16-byte header and label files with an 8-byte header, which is why the reads skip those offsets:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    # Paths to the dataset files
    train_images = 'train-images-idx3-ubyte'
    train_labels = 'train-labels-idx1-ubyte'
    test_images = 't10k-images-idx3-ubyte'
    test_labels = 't10k-labels-idx1-ubyte'

    # Load the training set (each 28x28 image becomes one 784-element row)
    with open(train_images, 'rb') as f:
        train_images_data = np.frombuffer(f.read(), dtype=np.uint8, offset=16).reshape(-1, 28*28)
    with open(train_labels, 'rb') as f:
        train_labels_data = np.frombuffer(f.read(), dtype=np.uint8, offset=8)

    # Load the test set
    with open(test_images, 'rb') as f:
        test_images_data = np.frombuffer(f.read(), dtype=np.uint8, offset=16).reshape(-1, 28*28)
    with open(test_labels, 'rb') as f:
        test_labels_data = np.frombuffer(f.read(), dtype=np.uint8, offset=8)

    # Scale pixel values to the 0-1 range
    train_images_data = train_images_data / 255.0
    test_images_data = test_images_data / 255.0
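
The loading code above hardcodes the header sizes (16 and 8 bytes). As a hedged alternative, the sketch below reads the IDX header itself, so the offset and shape come from the file rather than from constants. The helper name read_idx is hypothetical and not part of the original tutorial, and the sketch assumes uncompressed files that store unsigned bytes, as the MNIST files do.

```python
import struct

import numpy as np


def read_idx(path):
    """Read an uncompressed unsigned-byte IDX file into a NumPy array.

    Hypothetical helper for illustration; it validates the header
    instead of assuming a fixed offset.
    """
    with open(path, 'rb') as f:
        data = f.read()

    # Header: two zero bytes, one type-code byte, one byte with the
    # number of dimensions.
    zero, type_code, ndim = struct.unpack('>HBB', data[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError(f'{path}: not an unsigned-byte IDX file')

    # Each dimension size is a big-endian 32-bit unsigned integer.
    header_size = 4 + 4 * ndim
    shape = struct.unpack('>' + 'I' * ndim, data[4:header_size])

    # Everything after the header is raw pixel or label data.
    return np.frombuffer(data, dtype=np.uint8, offset=header_size).reshape(shape)
```

With the tutorial's files, read_idx(train_images) would return an array of shape (N, 28, 28); calling .reshape(-1, 28*28) on it gives the same flattened layout used above.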

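2. Preprocessing and splitting the data

Before training, the image data needs to be flattened into one-dimensional vectors and the pixel values scaled to the range 0 to 1; the loading code in step 1 already does both. Next, the train_test_split function divides the data into a training set and a test set. Note that the call below holds out 20% of the 60,000 training images for evaluation, so the t10k arrays loaded in step 1 are not used in the rest of this tutorial:

    # Split into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(train_images_data, train_labels_data, test_size=0.2, random_state=42)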
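
A plain random split can leave the digit classes slightly unbalanced between the two parts. One option, shown here as a small sketch rather than as part of the original tutorial, is the stratify argument of train_test_split, which keeps the class proportions the same in both parts. The data below is synthetic (an assumption made so the example runs without the MNIST files), and the names X_demo, y_demo, and the split outputs are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 1000 random "images" of 784
# features, with exactly 100 samples for each of the 10 digit labels.
rng = np.random.default_rng(0)
X_demo = rng.random((1000, 784))
y_demo = np.repeat(np.arange(10), 100)

# stratify=y_demo keeps the label proportions equal in both parts.
X_tr, X_val, y_tr, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Count how many samples of each digit landed in the held-out part.
print(np.bincount(y_val))
```

With the tutorial's arrays, the equivalent change would be adding stratify=train_labels_data to the existing call.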

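3. Training the SVM model

We can now train the SVM model on the training set. SVC() with no arguments uses scikit-learn's defaults, including the RBF kernel; fitting it on this many samples can take a while.

    # Create the SVM classifier and train it
    svm = SVC()
    svm.fit(X_train, y_train)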
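
Because SVC training time grows quickly with the number of samples, it can help to try the pipeline on a random subset first. The sketch below illustrates the idea under an assumption: it uses scikit-learn's small bundled digits dataset (8x8 images loaded with load_digits) as a stand-in for MNIST so that it runs quickly and without local files. The names subset, held_out, and svm_small are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not MNIST.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

# Draw a random subset of row indices without replacement.
rng = np.random.default_rng(42)
subset = rng.choice(len(X), size=500, replace=False)

# Fit the classifier on the subset only.
svm_small = SVC()
svm_small.fit(X[subset], y[subset])

# Score on the rows that were not used for fitting.
held_out = np.ones(len(X), dtype=bool)
held_out[subset] = False
subset_accuracy = svm_small.score(X[held_out], y[held_out])
print('Held-out accuracy:', subset_accuracy)
```

The same indexing pattern would apply to the tutorial's X_train and y_train, with a larger subset size chosen to fit the available time.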

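4. Evaluating model performance

Once training is complete, we can evaluate the model on the test set, and on the training set for comparison:

    # Predict on the training set
    y_pred_train = svm.predict(X_train)

    # Predict on the test set
    y_pred_test = svm.predict(X_test)

    # Compute accuracy on the training and test sets
    accuracy_train = accuracy_score(y_train, y_pred_train)
    accuracy_test = accuracy_score(y_test, y_pred_test)

    print('Training accuracy:', accuracy_train)
    print('Test accuracy:', accuracy_test)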
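
Accuracy is a single number and does not show which digits get confused with each other. A confusion matrix gives that breakdown. The sketch below uses a few made-up labels (an assumption, chosen so the result is easy to check by hand) rather than real model output; the names y_true_demo and y_pred_demo are hypothetical.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration (assumption): three classes, seven samples.
y_true_demo = [0, 1, 2, 2, 1, 0, 2]
y_pred_demo = [0, 2, 2, 2, 1, 0, 1]

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)

# The diagonal holds the correct predictions, so its sum divided by
# the total matches the accuracy score.
print(accuracy_score(y_true_demo, y_pred_demo))
```

With the tutorial's variables, confusion_matrix(y_test, y_pred_test) would produce a 10x10 matrix, one row and one column per digit.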

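Summary

This tutorial covered how to load the MNIST dataset from local files in Python and train an SVM model to recognize handwritten digits. You can adapt the code to your needs, for example by trying different SVM kernel functions or tuning the model's parameters to get better performance.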
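
As one possible way to try different kernels and parameters, the sketch below runs a small grid search with GridSearchCV. It is an illustration under assumptions, not a tuned recipe: the data is scikit-learn's bundled 8x8 digits dataset rather than MNIST (so it runs quickly), and the grid values are arbitrary examples.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not MNIST.
X_small, y_small = load_digits(return_X_y=True)
X_small = X_small / 16.0  # pixel values in this dataset range from 0 to 16

# Example grid (arbitrary values): two kernels and two values of C.
param_grid = {'kernel': ['linear', 'rbf'], 'C': [1, 10]}

# 3-fold cross-validation over every combination in the grid.
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X_small, y_small)

print('Best parameters:', search.best_params_)
print('Best cross-validated accuracy:', search.best_score_)
```

On the full MNIST training set each fit is far slower, so it is usually more practical to run a search like this on a subset first.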