Watermelon Dataset Classification: Implementing Naive Bayes in Python
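First, we count how many samples belong to each class. Dividing each count by the size of the dataset gives the class proportion, which is the prior probability P(c). The following code implements this step: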
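```python
# Assumes `dataset` is a list of samples, each a list of feature values
# followed by the class label as its last element.
class_counts = {}
for data in dataset:
    label = data[-1]
    if label not in class_counts:
        class_counts[label] = 0
    class_counts[label] += 1

# Prior probability P(label): the fraction of samples in each class
class_ratios = {}
for label, count in class_counts.items():
    class_ratios[label] = count / len(dataset)
```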
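The same class counting can be written more compactly with `collections.Counter` from the standard library. This is a minimal sketch: the function `class_priors` and the four toy rows (two features plus a label) are hypothetical illustrations, not the article's code or the real watermelon dataset.

```python
from collections import Counter

def class_priors(dataset):
    """Return {label: P(label)} for rows whose last element is the class label."""
    counts = Counter(row[-1] for row in dataset)
    total = len(dataset)
    return {label: n / total for label, n in counts.items()}

# Hypothetical toy rows: (color, root, label); not the real watermelon dataset
toy = [
    ['青绿', '蜷缩', '是'],
    ['乌黑', '蜷缩', '是'],
    ['浅白', '硬挺', '否'],
    ['浅白', '蜷缩', '否'],
]
print(class_priors(toy))  # {'是': 0.5, '否': 0.5}
```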
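Next, for every feature we count how often each of its values occurs together with each class, and turn those counts into the conditional probabilities that naive Bayes uses. The following code implements this step: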
```python
# Assumes `features` lists the feature names in the same column order as `dataset`.
# feature_counts[feature][value][label]: number of samples of class `label`
#   whose `feature` takes `value`
# feature_ratios[feature][value][label]: conditional probability P(value | label)
feature_counts = {}
feature_ratios = {}
for i, feature in enumerate(features):
    feature_counts[feature] = {}
    feature_ratios[feature] = {}
    for data in dataset:
        label = data[-1]
        value = data[i]
        if value not in feature_counts[feature]:
            feature_counts[feature][value] = {}
            feature_ratios[feature][value] = {}
        if label not in feature_counts[feature][value]:
            feature_counts[feature][value][label] = 0
        feature_counts[feature][value][label] += 1

for feature in features:
    for value in feature_counts[feature]:
        for label in feature_counts[feature][value]:
            # Divide by the class size, not by the value's total count:
            # naive Bayes needs P(value | label), not P(label | value).
            feature_ratios[feature][value][label] = (
                feature_counts[feature][value][label] / class_counts[label]
            )
```
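The nested dictionaries above can also be flattened into a single `Counter` keyed by (feature index, value, label) triples. The sketch below is a hypothetical alternative, not the article's code; the function `conditional_probs` and the toy rows are illustrative assumptions. It makes the key point explicit: each joint count is divided by the size of its class, so the probabilities for one feature and one class sum to 1.

```python
from collections import Counter

def conditional_probs(dataset):
    """Return {(i, value, label): P(feature i == value | label)} by relative frequency."""
    class_counts = Counter(row[-1] for row in dataset)
    # Count (feature index, value, label) triples across all rows
    joint = Counter(
        (i, value, row[-1])
        for row in dataset
        for i, value in enumerate(row[:-1])
    )
    # Divide by the class size: this gives P(value | label), not P(label | value)
    return {key: n / class_counts[key[2]] for key, n in joint.items()}

# Hypothetical toy rows: (color, root, label); not the real watermelon dataset
toy = [
    ['青绿', '蜷缩', '是'],
    ['乌黑', '蜷缩', '是'],
    ['浅白', '硬挺', '否'],
    ['浅白', '蜷缩', '否'],
]
probs = conditional_probs(toy)
print(probs[(0, '浅白', '否')])  # 1.0
print(probs[(1, '蜷缩', '否')])  # 0.5
```

Pairs that never occur (here, color 浅白 with class 是) are simply absent from the result, so a lookup should default to 0.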
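Now we can use these statistics to classify a new sample. Naive Bayes assumes the features are conditionally independent given the class, so the posterior is proportional to the prior times the product of the per-feature conditional probabilities, P(c | x) ∝ P(c) · ∏_i P(x_i | c). The class with the largest value is returned as the prediction. The following code implements this step: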
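```python
def predict(data):
    """Return the class with the largest (unnormalized) posterior for one sample."""
    probabilities = {}
    for label in class_counts:
        # Start from the prior P(label)
        probabilities[label] = class_ratios[label]
        for i, feature in enumerate(features):
            value = data[i]
            # Multiply by P(value | label); a value never seen with this label
            # (or never seen at all) contributes probability 0 instead of a KeyError.
            probabilities[label] *= feature_ratios[feature].get(value, {}).get(label, 0)
    return max(probabilities, key=probabilities.get)

# Test sample: one value per feature, in the same order as `features`
data = ['浅白', '稍蜷', '浊响', '模糊', '平坦', '硬滑']
prediction = predict(data)
print(prediction)
```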
Running the code gives the prediction '否' ("no", i.e. not a good melon).
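With unsmoothed frequency estimates, a feature value that never appears with a class in the training data zeroes out that class's entire product, no matter how strongly the other features support it. Multiplying many small probabilities can also underflow. A common refinement, sketched here under assumptions, is to apply Laplace (add-one) smoothing to both the prior and the conditional counts and to sum log-probabilities instead of multiplying. The functions `train` and `predict_log` and the four toy rows are hypothetical illustrations, not the article's implementation or the real watermelon dataset.

```python
import math
from collections import Counter

def train(dataset):
    """Collect the counts naive Bayes needs; each row's last element is its label."""
    n_features = len(dataset[0]) - 1
    class_counts = Counter(row[-1] for row in dataset)
    # joint[i][(value, label)]: rows where feature i == value and class == label
    joint = [Counter((row[i], row[-1]) for row in dataset) for i in range(n_features)]
    # Number of distinct values of each feature, used in the smoothing denominator
    n_values = [len({row[i] for row in dataset}) for i in range(n_features)]
    return class_counts, joint, n_values, len(dataset)

def predict_log(sample, model):
    """Return the label maximizing log P(c) + sum_i log P(x_i | c), add-one smoothed."""
    class_counts, joint, n_values, total = model
    scores = {}
    for label, n_label in class_counts.items():
        # Smoothed prior: (N_c + 1) / (N + number of classes)
        score = math.log((n_label + 1) / (total + len(class_counts)))
        for i, value in enumerate(sample):
            # Smoothed conditional: (N_{c,x_i} + 1) / (N_c + N_i); Counter returns 0 if unseen
            score += math.log((joint[i][(value, label)] + 1) / (n_label + n_values[i]))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy rows: (color, root, label); not the real watermelon dataset
toy = [
    ['青绿', '蜷缩', '是'],
    ['乌黑', '蜷缩', '是'],
    ['浅白', '硬挺', '否'],
    ['浅白', '蜷缩', '否'],
]
model = train(toy)
print(predict_log(['浅白', '硬挺'], model))  # 否
print(predict_log(['青绿', '稍蜷'], model))  # 是 ('稍蜷' never appears in the toy rows)
```

Because every smoothed factor is strictly positive, `math.log` never receives 0, and an unseen value such as 稍蜷 lowers every class's score without eliminating any class outright.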
Original article: https://www.cveoy.top/t/topic/pcWz. Copyright belongs to the author; please do not repost or scrape.