Spam Detection: Feature Extraction and Model Training
Part 2
wordsDict = readDict(filepath='./results.pkl')
wordsDict = getDictTopk(dict_data=wordsDict, topk=4000)
saveDict(dict_data=wordsDict, savepath='./wordsDict.pkl')
Explanation:
- The code reads a dictionary from a file using the readDict() function.
- It keeps the top 4000 words from the dictionary using the getDictTopk() function.
- The resulting dictionary is saved to a file using the saveDict() function.
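The three helpers are not shown in this section. The following is only a minimal sketch of what they might look like, assuming the dictionary maps each word to its occurrence count and is stored with pickle. Both points are assumptions, and the original implementations may differ.

```python
import pickle


def readDict(filepath):
    """Load a pickled dictionary (assumed format: word -> count)."""
    with open(filepath, 'rb') as f:
        return pickle.load(f)


def getDictTopk(dict_data, topk):
    """Keep only the topk entries with the highest counts."""
    ranked = sorted(dict_data.items(), key=lambda kv: kv[1], reverse=True)
    # dict() preserves insertion order, so the result stays sorted by count
    return dict(ranked[:topk])


def saveDict(dict_data, savepath):
    """Write the dictionary to disk with pickle."""
    with open(savepath, 'wb') as f:
        pickle.dump(dict_data, f)
```

With these definitions, getDictTopk({'free': 5, 'hello': 2, 'win': 9}, 2) keeps only 'win' and 'free', in that order.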
Part 3
import os

# Directories holding the normal and spam emails
normal_path = 'D:/6wanDownload/新建文件夹/normal'
spam_path = 'D:/6wanDownload/新建文件夹/spam'
wordsDict = readDict(filepath='./wordsDict.pkl')
normals = getFilesList(filepath=normal_path)
spams = getFilesList(filepath=spam_path)
fvs = []
# Normal emails go first, so they occupy the first normal_len entries
for normal in normals:
    fv = extractFeatures(filepath=os.path.join(normal_path, normal), wordsDict=wordsDict, fv_len=4000)
    fvs.append(fv)
normal_len = len(fvs)
# Spam emails follow the normal ones
for spam in spams:
    fv = extractFeatures(filepath=os.path.join(spam_path, spam), wordsDict=wordsDict, fv_len=4000)
    fvs.append(fv)
spam_len = len(fvs) - normal_len
print('[INFO]: Normal-%d, Spam-%d' % (normal_len, spam_len))
# Merge the per-file vectors and save them; the filename records both counts
fvs = mergeFv(fvs)
saveNparray(np_array=fvs, savepath='./fvs_%d_%d.npy' % (normal_len, spam_len))
Explanation:
- The code defines the paths to the directories containing normal and spam files.
- It reads the dictionary from the file using the readDict() function.
- It retrieves the list of files in the normal and spam directories using the getFilesList() function.
- For each normal file, it extracts features using the extractFeatures() function and appends the feature vector to the list fvs.
- The number of normal feature vectors is stored in the variable normal_len.
- For each spam file, it extracts features in the same way and appends the feature vector to fvs.
- The number of spam feature vectors is calculated by subtracting normal_len from the total length of fvs.
- Both counts are printed.
- The feature vectors are merged using the mergeFv() function and stored back in the variable fvs.
- The merged feature vectors are saved to a file using the saveNparray() function; the filename includes both counts.
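The body of extractFeatures() is not shown here. As a rough illustration of the idea, the sketch below builds word-count vectors from in-memory strings rather than files. The function extract_features, the whitespace tokenizer, and the sample texts are all hypothetical; the original code may tokenize differently and presumably returns NumPy arrays, given the .npy output.

```python
def extract_features(text, words_dict, fv_len):
    """Count how often each dictionary word occurs in text."""
    # Assumption: column i corresponds to the i-th key of words_dict
    index = {word: i for i, word in enumerate(words_dict)}
    fv = [0] * fv_len
    for token in text.lower().split():
        i = index.get(token)
        if i is not None and i < fv_len:
            fv[i] += 1
    return fv


# Hypothetical three-word dictionary and sample emails
words_dict = {'win': 9, 'free': 5, 'meeting': 3}
normal_texts = ['meeting at noon', 'free for a meeting']
spam_texts = ['win a free prize win']

# Same ordering as Part 3: normal vectors first, then spam vectors
fvs = [extract_features(t, words_dict, fv_len=3) for t in normal_texts]
normal_len = len(fvs)
fvs += [extract_features(t, words_dict, fv_len=3) for t in spam_texts]
spam_len = len(fvs) - normal_len
```

Because the normal vectors are appended before the spam vectors, the two counts are enough to tell later which rows belong to which class.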
Part 4
fvs = readNparray(filepath='fvs_7063_7775.npy')
normal_len = 7063
spam_len = 7775
train(normal_len, spam_len, fvs)
Explanation:
- The feature vectors are read from the file using the readNparray() function.
- The numbers of normal and spam files are assigned to the variables normal_len and spam_len; they match the two counts in the filename.
- The train() function is called to train a model using the two counts and the feature vectors.
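The section does not show what train() does internally. One plausible reason it takes the two counts is that labels can be rebuilt from the row order: the first normal_len rows are normal and the remaining spam_len rows are spam. The sketch below illustrates that step together with a shuffled train/test split. The helpers make_labels and shuffled_split are hypothetical names, and the choice of classifier is left open.

```python
import random


def make_labels(normal_len, spam_len):
    """Label the first normal_len rows 0 (normal) and the rest 1 (spam)."""
    return [0] * normal_len + [1] * spam_len


def shuffled_split(fvs, labels, test_ratio=0.2, seed=0):
    """Shuffle rows and labels together, then hold out a test set."""
    order = list(range(len(fvs)))
    random.Random(seed).shuffle(order)
    n_test = int(len(order) * test_ratio)

    def pick(indices):
        # Select rows and their labels with the same indices
        return [fvs[i] for i in indices], [labels[i] for i in indices]

    return pick(order[n_test:]), pick(order[:n_test])
```

A call such as shuffled_split(fvs, make_labels(normal_len, spam_len)) returns a (train_x, train_y) pair and a (test_x, test_y) pair that could then be passed to any classifier.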