[1] YU Wanguo, HE Yulin, and QIN Huilin. A new outlier detection algorithm based on observation-point mechanism[J]. Journal of Shenzhen University Science and Engineering, 2022, 39(3): 355-362. [doi:10.3724/SP.J.1249.2022.03355]

A new outlier detection algorithm based on observation-point mechanism

Journal of Shenzhen University Science and Engineering [ISSN: 1000-2618 / CN: 44-1401/N]

Volume:
39
Issue:
No. 3, 2022
Pages:
355-362
Section:
Electronics and Information Science
Publication date:
2022-05-16

Article Info

Title:
A new outlier detection algorithm based on observation-point mechanism
Article number:
202203015
Author(s):
YU Wanguo, HE Yulin, and QIN Huilin
1) College of Mathematics and Computer Science, Hebei Normal University for Nationalities, Chengde 067000, Hebei Province, P. R. China; 2) Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P. R. China; 3) National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, Guangdong Province, P. R. China
Keywords:
artificial intelligence; outlier detection; observation point; nearest neighbor; local outlier factor; probability density function; kernel density estimation; data mining
CLC number:
TP391.9;TP311.1
DOI:
10.3724/SP.J.1249.2022.03355
Document code:
A
Abstract:
Outlier detection is an important research branch of data mining, with wide applications in finance, telecommunications, and biology. Traditional nearest neighbor-based outlier detection (NNOD) and local outlier factor-based outlier detection (LOFOD) algorithms suffer from high computational complexity and a high false detection rate. This paper proposes an observation-point mechanism-based outlier detection (OPOD) algorithm with four core steps: ① placing random observation points in the original sample space; ② computing the distances between each observation point and all sample points, which converts the original data into distance data for that observation point, and estimating the probability density function of those distances; ③ calculating the probability of each distance value for that observation point; ④ identifying the outliers among the original samples by fusing the probabilities obtained from the different observation points. Experiments were run on the PyCharm platform with synthetic data sets generated by the make_blobs function in sklearn.datasets. They test how data set size and dimensionality affect OPOD and compare the running time, outlier recall, and false detection rate of OPOD, LOFOD, and NNOD. The results show that OPOD is able to detect outliers and that it converges as the number of observation points increases; with suitably chosen observation points, it achieves lower time complexity and better detection performance than NNOD and LOFOD.
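The four steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how observation points are placed, which density estimator settings are used, how the per-point probabilities are fused, or how the final outliers are selected. The sketch assumes uniform placement inside the data's bounding box, `scipy.stats.gaussian_kde` with its default bandwidth, fusion by simple averaging, and flagging the lowest-scoring fraction of samples. The names `opod_scores`, `detect_outliers`, `n_points`, and `contamination` are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.datasets import make_blobs


def opod_scores(X, n_points=20, seed=0):
    """Return one fused density score per sample; low scores suggest outliers."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Step 1: random observation points in the data's bounding box (assumption).
    obs = rng.uniform(lo, hi, size=(n_points, X.shape[1]))
    scores = np.zeros(len(X))
    for p in obs:
        # Step 2: distances to this observation point, then a 1-D KDE over them.
        d = np.linalg.norm(X - p, axis=1)
        kde = gaussian_kde(d)
        # Step 3: estimated density at each sample's own distance value.
        scores += kde(d)
    # Step 4: fuse by averaging over observation points (assumption).
    return scores / n_points


def detect_outliers(X, contamination=0.05, **kwargs):
    """Flag the lowest-scoring fraction of samples (assumed decision rule)."""
    scores = opod_scores(X, **kwargs)
    k = max(1, int(round(contamination * len(X))))
    return np.argsort(scores)[:k]


if __name__ == "__main__":
    # Synthetic data from make_blobs, as in the abstract, plus four distant points.
    X, _ = make_blobs(n_samples=200, centers=[[0.0, 0.0]],
                      cluster_std=1.0, random_state=0)
    far = np.array([[15.0, 15.0], [-15.0, 15.0], [15.0, -15.0], [-15.0, -15.0]])
    data = np.vstack([X, far])
    print(detect_outliers(data, contamination=0.05, n_points=50))
```

Each observation point reduces the problem to a one-dimensional density estimate over n distances, which is where a cost advantage over pairwise-distance methods would plausibly come from. A single observation point can be nearly equidistant from an outlier and the bulk of the data, which is why scores from several points are combined.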


Last Update: 2022-05-30