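A hospital gift shop has completed 5 transactions. The purchase records are listed as 1 2 3 4 1 5 6 2 6 1 4 5 3 1 2 3. Use the Apriori algorithm to perform association rule analysis, with the minimum support set to 0.4.

def loadDataSet():
    # ASSUMPTION: the boundaries between the 5 transactions are not visible
    # in the flat list above. The split below is one possible grouping of
    # the 16 items; replace it with the real grouping if it is known.
    return [[1, 2, 3, 4], [1, 5, 6], [2, 6, 1], [4, 5, 3], [1, 2, 3]]

D = loadDataSet()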
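# Build the candidate 1-itemsets
def createC1(dataSet):
    C1 = []
    for transaction in dataSet:
        for item in transaction:
            if [item] not in C1:
                C1.append([item])
    C1.sort()
    # frozenset is hashable, so each itemset can later be used as a dict key
    return list(map(frozenset, C1))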
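As a quick illustration of what this step produces, here is a minimal sketch that builds the same kind of singleton candidates with a set comprehension. The function name `candidate_singletons` and the two-transaction input are hypothetical and are not part of the original answer.

```python
def candidate_singletons(transactions):
    """Return every distinct item as a one-element frozenset, in sorted order."""
    items = sorted({item for t in transactions for item in t})
    return [frozenset([item]) for item in items]


# Hypothetical toy input, used only for illustration
print(candidate_singletons([[3, 1], [2, 3]]))
# [frozenset({1}), frozenset({2}), frozenset({3})]
```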
# Filter out itemsets that do not meet the minimum support
def scanD(D, Ck, minSupport):
    # Count how many transactions contain each candidate itemset
    ssCnt = {}
    for tid in D:
        for can in Ck:
            if can.issubset(tid):
                ssCnt[can] = ssCnt.get(can, 0) + 1
    numItems = float(len(D))
    retList = []
    supportData = {}
    for key in ssCnt:
        support = ssCnt[key] / numItems
        if support >= minSupport:
            retList.insert(0, key)
        # Support is recorded for every counted candidate, frequent or not
        supportData[key] = support
    return retList, supportData
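The support of an itemset is the fraction of transactions that contain all of its items, which is what `scanD` compares against `minSupport`. The sketch below shows that calculation in isolation. The helper name `support` and the toy transactions are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
# Hypothetical toy transactions, used only for illustration
transactions = [{'a', 'b'}, {'a', 'c'}, {'a', 'b', 'c'}, {'b'}]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


print(support({'a'}, transactions))       # 0.75 (3 of 4 transactions)
print(support({'a', 'b'}, transactions))  # 0.5  (2 of 4 transactions)
```

With a minimum support of 0.4, both itemsets in this toy example would be kept.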
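# Generate candidate k-itemsets from the frequent (k-1)-itemsets
def aprioriGen(Lk, k):
    retList = []
    lenLk = len(Lk)
    for i in range(lenLk):
        for j in range(i + 1, lenLk):
            # Sort before slicing so the compared prefix does not depend on
            # the arbitrary iteration order of a frozenset
            L1 = sorted(Lk[i])[:k - 2]
            L2 = sorted(Lk[j])[:k - 2]
            if L1 == L2:
                # Two itemsets that share their first k-2 items merge into one k-itemset
                retList.append(Lk[i] | Lk[j])
    return retList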
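To see the join step on its own, the following sketch merges 2-itemsets into 3-itemset candidates using the same shared-prefix rule. The name `join_step` and the input list `L2` are hypothetical, and the sketch sorts each itemset before comparing prefixes.

```python
def join_step(prev_frequent, k):
    """Merge (k-1)-itemsets that agree on their first k-2 sorted items."""
    sets = [sorted(s) for s in prev_frequent]
    candidates = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if sets[i][:k - 2] == sets[j][:k - 2]:
                candidates.append(frozenset(sets[i]) | frozenset(sets[j]))
    return candidates


# Hypothetical frequent 2-itemsets, used only for illustration
L2 = [frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 2})]
print(join_step(L2, 3))  # [frozenset({0, 1, 2})]
```

Only the pair {0, 1} and {0, 2} shares the prefix [0], so a single candidate is produced instead of three copies of the same set.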
# The complete Apriori algorithm
def apriori(dataSet, minSupport=0.5):
    C1 = createC1(dataSet)
    D = list(map(set, dataSet))
    L1, supportData = scanD(D, C1, minSupport)
    L = [L1]
    k = 2
    # Keep growing itemsets until no frequent (k-1)-itemsets remain
    while len(L[k - 2]) > 0:
        Ck = aprioriGen(L[k - 2], k)
        Lk, supK = scanD(D, Ck, minSupport)
        supportData.update(supK)
        L.append(Lk)
        k += 1
    return L, supportData
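# Generate association rules
def generateRules(L, supportData, minConf=0.7):
    bigRuleList = []
    # L[0] holds 1-itemsets, which cannot form rules, so start at index 1
    for i in range(1, len(L)):
        for freqSet in L[i]:
            H1 = [frozenset([item]) for item in freqSet]
            # Test single-item consequents for every itemset, including
            # those with 3 or more items, so those rules are not skipped
            calcConf(freqSet, H1, supportData, bigRuleList, minConf)
            if i > 1:
                # Itemsets with 3+ items can also have larger consequents
                rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
    return bigRuleList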
# Compute confidence
def calcConf(freqSet, H, supportData, brl, minConf=0.7):
    prunedH = []
    for conseq in H:
        # conf(A -> B) = support(A and B together) / support(A)
        conf = supportData[freqSet] / supportData[freqSet - conseq]
        if conf >= minConf:
            print(freqSet - conseq, '-->', conseq, 'conf:', conf)
            brl.append((freqSet - conseq, conseq, conf))
            prunedH.append(conseq)
    return prunedH
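Confidence is not symmetric: the rule A -> B divides by the support of A, while B -> A divides by the support of B. The sketch below works through that arithmetic. The support values and the helper name `confidence` are hypothetical and were picked only so the division is exact.

```python
# Hypothetical support values, chosen only to illustrate the arithmetic
support_data = {
    frozenset({'x'}): 0.5,
    frozenset({'y'}): 0.25,
    frozenset({'x', 'y'}): 0.25,
}


def confidence(antecedent, consequent, support_data):
    """conf(A -> B) = support(A union B) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support_data[both] / support_data[a]


print(confidence({'x'}, {'y'}, support_data))  # 0.5
print(confidence({'y'}, {'x'}, support_data))  # 1.0
```

With a minimum confidence of 0.7, only the rule y -> x would be kept in this toy example.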
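# Generate the candidate rule set
def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7):
    m = len(H[0])
    # Only continue if the itemset is large enough to hold an (m+1)-item
    # consequent and still leave a non-empty antecedent
    if len(freqSet) > (m + 1):
        Hmp1 = aprioriGen(H, m + 1)
        Hmp1 = calcConf(freqSet, Hmp1, supportData, brl, minConf)
        if len(Hmp1) > 1:
            # At least two consequents survived, so try merging them further
            rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)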
# Run the association rule analysis
L, supportData = apriori(D, minSupport=0.4)
rules = generateRules(L, supportData, minConf=0.7)
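Each entry in `rules` is an (antecedent, consequent, confidence) tuple, so the results can be sorted and printed in a more readable form. The sketch below is one possible way to do that. The helper name `format_rules` and the two sample tuples are hypothetical and are not output from the analysis above.

```python
def format_rules(rules):
    """Return one readable line per rule, highest confidence first."""
    ordered = sorted(rules, key=lambda r: r[2], reverse=True)
    return [
        "{} -> {} (conf={:.2f})".format(sorted(a), sorted(c), conf)
        for a, c, conf in ordered
    ]


# Hypothetical rule tuples in the (antecedent, consequent, confidence) layout
sample_rules = [
    (frozenset({1}), frozenset({2}), 0.75),
    (frozenset({2}), frozenset({1}), 1.0),
]
for line in format_rules(sample_rules):
    print(line)
# [2] -> [1] (conf=1.00)
# [1] -> [2] (conf=0.75)
```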
Original source: https://www.cveoy.top/t/topic/ccjQ. Copyright belongs to the author. Do not repost or scrape.