With the rapid advance of technology and informatization, the volume of military science and technology intelligence has grown sharply, and automatic classification of such texts has become an important research direction in this field. Drawing on intelligence data that an enterprise has accumulated over a long period and that has been manually screened and annotated, this paper extracts the required information and builds a classification dataset for military science and technology intelligence. To exploit the dataset's particular structure, a position weight is assigned to each word according to its frequency in class titles or class keywords, and this weight is used to adjust the TFIDF value, compensating for the fact that TFIDF ignores both word position and the distribution of words across classes. Combining the improved TFIDF weights with a high-precision machine learning model, the paper classifies military science and technology intelligence with an accuracy of 97.2%. The experimental results show that the proposed method significantly improves classification accuracy for military science and technology intelligence texts.
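The abstract does not give the exact weighting formula, so the following is only a minimal sketch of the general idea: compute an ordinary TFIDF value and scale it by a position weight that grows with the word's frequency in class titles or class keywords. The function names, the `alpha` parameter, and the form `1 + alpha * frequency` are all assumptions made for illustration, not the paper's actual definition.

```python
import math
from collections import Counter


def tfidf(docs):
    """Plain TF-IDF for tokenized documents (each a list of tokens)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        result.append({
            # Relative term frequency times a smoothed inverse document frequency.
            term: (count / len(doc)) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in tf.items()
        })
    return result


def position_weighted_tfidf(docs, class_term_freq, alpha=0.5):
    """Scale each TF-IDF value by a hypothetical position weight.

    class_term_freq maps a term to how often it occurs in class titles or
    class keywords. The form 1 + alpha * frequency is an assumption; the
    source does not specify the formula.
    """
    weighted = []
    for doc_weights in tfidf(docs):
        weighted.append({
            term: value * (1.0 + alpha * class_term_freq.get(term, 0))
            for term, value in doc_weights.items()
        })
    return weighted


if __name__ == "__main__":
    # Toy tokenized documents and a made-up class-keyword frequency table.
    docs = [["radar", "system", "test"], ["missile", "system", "launch"]]
    class_term_freq = Counter({"radar": 2, "missile": 1})
    for weights in position_weighted_tfidf(docs, class_term_freq):
        print(weights)
```

In this sketch, words that never appear in a class title or keyword list keep their plain TFIDF value, while words that do appear are boosted in proportion to that frequency. The resulting vectors could then be passed to any standard classifier; the abstract does not name the model it uses.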

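Research on Automatic Classification of Military Science and Technology Intelligence: A Machine Learning Approach Based on Improved TFIDF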

Original source: https://www.cveoy.top/t/topic/oWTR Copyright belongs to the author. Do not reproduce or scrape.
