Using the given data, predict the value of PlayTennis for Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong with the naive Bayes algorithm, where the parameters are estimated by maximum likelihood.
The naive Bayes algorithm first requires the following probabilities:
- The prior probabilities P(PlayTennis=yes) and P(PlayTennis=no)
In the given data, PlayTennis is yes 9 times and no 5 times, so:
P(PlayTennis=yes) = 9/14
P(PlayTennis=no) = 5/14
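As a minimal sketch of this step, the maximum likelihood estimate of each prior is simply the relative frequency of the class. The variable names `class_counts` and `priors` are illustrative, and the counts are the ones stated above.

```python
from fractions import Fraction

# Class counts as stated in the text: 9 days with PlayTennis=yes, 5 with no.
class_counts = {"yes": 9, "no": 5}
total = sum(class_counts.values())

# Maximum likelihood estimate of each prior: relative frequency of the class.
priors = {label: Fraction(count, total) for label, count in class_counts.items()}

for label, p in priors.items():
    print(f"P(PlayTennis={label}) = {p} (about {float(p):.4f})")
```

Using `Fraction` keeps the estimates exact, which makes them easy to compare with the hand calculation.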
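- For each attribute, the conditional probability of each of its values given PlayTennis=yes and given PlayTennis=no
(1) Conditional probabilities of Outlook
Outlook takes three values: sunny, overcast, and rainy. Given PlayTennis=yes, sunny occurs 3 times, overcast 4 times, and rainy 2 times; given PlayTennis=no, sunny occurs 2 times, overcast 0 times, and rainy 3 times. Therefore: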
P(Outlook=sunny|PlayTennis=yes) = 3/9
P(Outlook=overcast|PlayTennis=yes) = 4/9
P(Outlook=rainy|PlayTennis=yes) = 2/9
P(Outlook=sunny|PlayTennis=no) = 2/5
P(Outlook=overcast|PlayTennis=no) = 0/5
P(Outlook=rainy|PlayTennis=no) = 3/5
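The same relative-frequency rule gives the conditional estimates. The sketch below applies it to the Outlook counts stated above; `mle_conditional` is a hypothetical helper name, not part of any library.

```python
from fractions import Fraction

def mle_conditional(value_counts):
    """Return the relative frequency of each attribute value within one class."""
    n = sum(value_counts.values())
    return {value: Fraction(count, n) for value, count in value_counts.items()}

# Outlook counts per class, as stated in the text.
outlook_counts = {
    "yes": {"sunny": 3, "overcast": 4, "rainy": 2},
    "no": {"sunny": 2, "overcast": 0, "rainy": 3},
}

outlook_cpt = {label: mle_conditional(c) for label, c in outlook_counts.items()}

for label, dist in outlook_cpt.items():
    for value, p in dist.items():
        print(f"P(Outlook={value}|PlayTennis={label}) = {p}")
```

Note that `Fraction` prints values in lowest terms, so 3/9 appears as 1/3. Within each class the estimates sum to 1, which is a quick check on the counts.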
(2) Conditional probabilities of Temperature
Temperature takes three values: hot, mild, and cool. Given PlayTennis=yes, hot occurs 2 times, mild 4 times, and cool 3 times; given PlayTennis=no, hot occurs 2 times, mild 2 times, and cool 1 time. Therefore:
P(Temperature=hot|PlayTennis=yes) = 2/9
P(Temperature=mild|PlayTennis=yes) = 4/9
P(Temperature=cool|PlayTennis=yes) = 3/9
P(Temperature=hot|PlayTennis=no) = 2/5
P(Temperature=mild|PlayTennis=no) = 2/5
P(Temperature=cool|PlayTennis=no) = 1/5
(3) Conditional probabilities of Humidity
Humidity takes two values: high and normal. Given PlayTennis=yes, high occurs 3 times and normal 6 times; given PlayTennis=no, high occurs 4 times and normal 1 time. Therefore:
P(Humidity=high|PlayTennis=yes) = 3/9
P(Humidity=normal|PlayTennis=yes) = 6/9
P(Humidity=high|PlayTennis=no) = 4/5
P(Humidity=normal|PlayTennis=no) = 1/5
(4) Conditional probabilities of Wind
Wind takes two values: strong and weak. Given PlayTennis=yes, strong occurs 3 times and weak 6 times; given PlayTennis=no, strong occurs 3 times and weak 2 times. Therefore:
P(Wind=strong|PlayTennis=yes) = 3/9
P(Wind=weak|PlayTennis=yes) = 6/9
P(Wind=strong|PlayTennis=no) = 3/5
P(Wind=weak|PlayTennis=no) = 2/5
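The four tables above can be collected in one structure, checked for consistency, and used to look up the factors needed for the query. This is only a sketch: the names `counts`, `check_counts`, and `factors` are illustrative, and the counts are copied from the text rather than derived from a data table.

```python
from fractions import Fraction

# Counts per class for every attribute, as stated in the text.
counts = {
    "Outlook": {"yes": {"sunny": 3, "overcast": 4, "rainy": 2},
                "no": {"sunny": 2, "overcast": 0, "rainy": 3}},
    "Temperature": {"yes": {"hot": 2, "mild": 4, "cool": 3},
                    "no": {"hot": 2, "mild": 2, "cool": 1}},
    "Humidity": {"yes": {"high": 3, "normal": 6},
                 "no": {"high": 4, "normal": 1}},
    "Wind": {"yes": {"strong": 3, "weak": 6},
             "no": {"strong": 3, "weak": 2}},
}
class_totals = {"yes": 9, "no": 5}

def check_counts(counts, class_totals):
    """Return (attribute, class) pairs whose counts do not sum to the class total."""
    return [
        (attribute, label)
        for attribute, by_class in counts.items()
        for label, value_counts in by_class.items()
        if sum(value_counts.values()) != class_totals[label]
    ]

problems = check_counts(counts, class_totals)

# Conditional probabilities needed for the query instance, one list per class.
query = {"Outlook": "sunny", "Temperature": "cool",
         "Humidity": "high", "Wind": "strong"}
factors = {
    label: [Fraction(counts[a][label][v], class_totals[label]) for a, v in query.items()]
    for label in class_totals
}

print("inconsistent tables:", problems)
print(factors)
```

An empty `problems` list means every attribute's counts add up to 9 for yes and 5 for no.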
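Next, by Bayes' theorem together with the assumption that the attributes are conditionally independent given the class:
P(PlayTennis=yes|Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
= P(Outlook=sunny|PlayTennis=yes) * P(Temperature=cool|PlayTennis=yes) * P(Humidity=high|PlayTennis=yes) * P(Wind=strong|PlayTennis=yes) * P(PlayTennis=yes) / P(Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
= (3/9) * (3/9) * (3/9) * (3/9) * (9/14) / P(Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
= (1/126) / P(Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
Similarly:
P(PlayTennis=no|Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
= P(Outlook=sunny|PlayTennis=no) * P(Temperature=cool|PlayTennis=no) * P(Humidity=high|PlayTennis=no) * P(Wind=strong|PlayTennis=no) * P(PlayTennis=no) / P(Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
= (2/5) * (1/5) * (4/5) * (3/5) * (5/14) / P(Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
= (12/875) / P(Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
The denominator is the same for both classes, so it does not affect which class has the larger posterior. Conditional independence given the class does not imply that the attributes are independent marginally, so the denominator cannot be written as P(Outlook=sunny) * P(Temperature=cool) * P(Humidity=high) * P(Wind=strong). Instead, it follows from the law of total probability as the sum of the two numerators:
P(Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
= 1/126 + 12/875
= 341/15750
≈ 0.0217
Substituting this into the expressions above gives:
P(PlayTennis=yes|Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
= (1/126) / (341/15750)
= 125/341
≈ 0.367
P(PlayTennis=no|Outlook=sunny,Temperature=cool,Humidity=high,Wind=strong)
= (12/875) / (341/15750)
= 216/341
≈ 0.633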
Therefore, the naive Bayes algorithm predicts PlayTennis=no for <Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong>.
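The comparison of the two classes can be reproduced with exact fractions. The sketch below takes the priors and conditional probabilities listed earlier, multiplies them into one unnormalised score per class, and normalises by the sum of the scores, which is one way to evaluate the shared denominator under the naive Bayes model. The names `scores`, `evidence`, and `posteriors` are illustrative.

```python
from fractions import Fraction
from math import prod

# Priors and the conditional probabilities needed for the query,
# copied from the estimates listed in the text.
priors = {"yes": Fraction(9, 14), "no": Fraction(5, 14)}
likelihoods = {
    # Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong
    "yes": [Fraction(3, 9), Fraction(3, 9), Fraction(3, 9), Fraction(3, 9)],
    "no": [Fraction(2, 5), Fraction(1, 5), Fraction(4, 5), Fraction(3, 5)],
}

# Unnormalised score per class: prior times the product of the four factors.
scores = {label: priors[label] * prod(likelihoods[label]) for label in priors}

# The evidence is the sum of the class scores (law of total probability).
evidence = sum(scores.values())
posteriors = {label: score / evidence for label, score in scores.items()}

# Predict the class with the largest posterior.
prediction = max(posteriors, key=posteriors.get)

for label in posteriors:
    print(f"score({label}) = {scores[label]}, posterior = {float(posteriors[label]):.3f}")
print("prediction:", prediction)
```

Because both scores are divided by the same evidence term, the predicted class is already determined by comparing the unnormalised scores; normalising only makes the two values sum to 1.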