YU Wanguo,HE Yulin,and QIN Huilin,et al.A new outlier detection algorithm based on observation-point mechanism[J].Journal of Shenzhen University Science and Engineering,2022,39(3):355-362.[doi:10.3724/SP.J.1249.2022.03355]





A new outlier detection algorithm based on observation-point mechanism
于万国1 何玉林2 3 覃荟霖2
1)河北民族师范学院数学与计算机科学学院,河北承德 067055
2)深圳大学计算机与软件学院,广东深圳 518060
3)人工智能与数字经济广东省实验室(深圳),广东深圳 518107
YU Wanguo1 HE Yulin2 3 and QIN Huilin2
1) College of Mathematics and Computer Science, Hebei Normal University for Nationalities, Chengde 067055, Hebei Province, P. R. China
2) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P. R. China
3) Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen 518107, Guangdong Province, P. R. China
人工智能 异常点检测 观测点 近邻 局部异常因子 概率密度函数 核密度估计 数据挖掘
artificial intelligence outlier detection observation point nearest neighbor local outlier factor probability density function kernel density estimation data mining
异常点检测是数据挖掘领域的一个重要研究方向,传统的基于近邻和局部异常因子的异常点检测算法存在计算复杂度高和误检率高的缺陷.为解决该缺陷,提出一种基于观测点机制的异常点检测(observation-point mechanism-based outlier detection, OPOD)算法.首先在原始样本空间中随机放置若干观测点,然后计算观测点与样本点之间的距离,将原始数据转换为与观测点相对应的距离数据,再估计距离数据的概率密度函数,进而计算距离数据出现的概率值,最后通过对多个观测点距离数据概率值的融合最终确定原始样本点中的异常点.基于PyCharm平台,采用sklearn.datasets的make_blobs函数生成仿真数据集,分别测试不同规模和不同维度数据集对OPOD算法性能的影响,并对比了OPOD算法、基于局部异常因子的异常点检测(local outlier factor-based outlier detection, LOFOD)算法和基于近邻的异常点检测(nearest neighbor-based outlier detection, NNOD)算法的运行时间、异常点召回率和误检率.结果表明,OPOD算法具有对异常点进行检测的能力,且随着观测点数量的增加算法呈收敛趋势;观测点选取合适的条件下,具有比基于近邻和局部异常因子的异常点检测算法更低的时间复杂度和更好的异常点检测效果.
Outlier detection is an important branch of data mining research, and has wide applications in the fields of finance, telecommunications, and biology. The traditional nearest neighbor-based outlier detection (NNOD) and local outlier factor-based outlier detection (LOFOD) algorithms generally have high computational complexity and high false-detection rates. This paper proposes an observation-point mechanism-based outlier detection (OPOD) algorithm comprising four core steps: i) generating random observation points in the original data space; ii) estimating the probability density function of distance values between the given observation point and all data points; iii) calculating the probabilities of distance values for the given observation point; and iv) detecting outliers by combining the probabilities corresponding to the different observation points. Extensive experiments are conducted to demonstrate the feasibility, rationality, and effectiveness of the OPOD algorithm. The experimental results show that the OPOD algorithm converges as the number of observation points increases, and can attain better detection performance with lower computation complexity than the NNOD and LOFOD algorithms.


[1] HODGE V, AUSTIN J. A survey of outlier detection methodologies [J]. Artificial Intelligence Review, 2004, 22(2): 85-126.
[2] WANG H, BAH M J, HAMMAD M. Progress in outlier detection techniques: a survey [J]. IEEE Access, 2019, 7: 107964-108000.
[3] 王立英.异常点检测算法及在网络入侵检测中的应用研究[D].济南:山东师范大学,2020.
WANG Liying. Research on outlier detection algorithm and its application in network intrusion detection system [D]. Jinan: Shandong Normal University, 2020.(in Chinese)
[4] 陈溟.基于模糊局部离群因子(LOF)的信用卡欺诈检测研究[J].金融理论与实践,2016(10):54-57.
CHEN Ming. Research on credit card fraud detection based on fuzzy local outlier factor (LOF) [J]. Financial Theory and Practice, 2016(10): 54-57.(in Chinese)
[5] 郭丽娟,张玉波,尹立群,等.基于离群点检测的变电主设备异常辨识与规律分析[J].南方电网技术,2018,12(9):14-21.
GUO Lijuan, ZHANG Yubo, YIN Liqun, et al. Identification and analysis of main substation equipment abnormal data based on outlier detection method [J]. Southern Power System Technology, 2018, 12(9): 14-21.(in Chinese)
[6] 易江,孙国栋.基于小波变换的天然地震信号异常点检测[J].科技经济导刊,2017,25(1):33.
YI Jiang, SUN Guodong. Outlier detection of natural seismic signal based on wavelet transform [J]. Technology and Economic Guide, 2017, 25(1): 33.(in Chinese)
[7] WILKINSON L. Visualizing big data outliers through distributed aggregation [J]. IEEE Transactions on Visualization and Computer Graphics, 2017, 24(1): 256-266.
[8] CHEN Lin, HE Jing. A histogram-based outlier profile for atomic structures derived from cryo-electron microscopy [C] // Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Niagara Falls, USA: ACM, 2019: 586-591.
[9] SMITI A. A critical overview of outlier detection methods [J]. Computer Science Review, 2020, 38: 100306.
[10] GAN Guojun, NG M K. k-means clustering with outlier removal [J]. Pattern Recognition Letters, 2017, 90: 8-14.
[11] EMADI H S, MAZINANI S M. A novel anomaly detection algorithm using DBSCAN and SVM in wireless sensor networks [J]. Wireless Personal Communications, 2018, 98(2): 2025-2035.
[12] DING Feng, WANG Jian, GE Jiaqi, et al. Anomaly detection in large-scale trajectories using hybrid grid-based hierarchical clustering [J]. International Journal of Robotics & Automation, 2018, 33(5): 474-480.
[13] KNORR E M, NG R T. Algorithms for mining distance-based outliers in large datasets [C]// Proceedings of the 24th International Conference on Very Large Data Bases. San Francisco, USA: [s.n.], 1998, 98: 392- 403.
[14] 胡云,施珺,王崇骏,等.基于全局最近邻的离群点检测算法[J].计算机应用,2011,31(10):2778-2781.
HU Yun, SHI Jun, WANG Chongjun, et al. Outlier detection algorithm based on global nearest neighborhood [J]. Journal of Computer Applications, 2011, 31(10): 2778-2781.(in Chinese)
[15] HAUTAMAKI V, KARKKAINEN I, FRANTI P. Outlier detection using k-nearest neighbour graph [C]// Proceedings of the 17th International Conference on Pattern Recognition. Cambridge, UK: IEEE, 2004: 430-433.
[16] BREUNIG M M, KRIEGEL H P, NG R T, et al. LOF: identifying density-based local outliers [J]. ACM SIGMOD Record, 2000, 29(2): 93-104.
[17] PAPADIMITRIOU S, KITAGAWA H, GIBBONS P B, et al. LOCI: fast outlier detection using the local correlation integral [C]// Proceedings of the 19th International Conference on Data Engineering. Bangalore, India: IEEE, 2003: 315-326.
[18] KRIEGEL H P, PEER K, SCHUBERT E, et al. LoOP: local outlier probabilities [C]// Proceedings of the 18th ACM Conference on Information and Knowledge Management. New York, USA: ACM, 2009: 1649-1652.
[19] HE Yulin, YE Xuan, HUANG Defa, et al. Novel kernel density estimator based on ensemble unbiased cross-validation [J]. Information Sciences, 2021, 581: 327-344.
[20] GHOSH S. Kernel smoothing: principles, methods and applications [M]. Hoboken, USA: John Wiley & Sons, 2018.
[21] NIXON M, AGUADO A. Feature extraction and image processing for computer vision [M]. [S.l.]: Academic Press, 2019.
[22] SALLOUM S, HUANG J Z, HE Yulin. Random sample partition: a distributed data model for big data analysis [J]. IEEE Transactions on Industrial Informatics, 2019, 15(11): 5846-5854.


 PAN Chang-cheng,XU Chen,and LI Guo.Differential evolutionary strategies for global optimization[J].Journal of Shenzhen University Science and Engineering,2008,25(3):211.
 LUO Jian-ping and LI Xia.Improved shuffled frog leaping algorithm for solving TSP[J].Journal of Shenzhen University Science and Engineering,2010,27(3):173.
 CAI Liang-wei and LI Xia.Optimization of job shop scheduling based on shuffled frog leaping algorithm[J].Journal of Shenzhen University Science and Engineering,2010,27(3):391.
 ZHANG Zhong-yi,LIU Yan-bin,YU Fan-hua,et al.Research on the evolution model of CDA market environment[J].Journal of Shenzhen University Science and Engineering,2010,27(3):413.
 Jiang Jianguo,Zhou Jiawei,Zheng Yingchun,et al.A double flora bacteria foraging optimization algorithm[J].Journal of Shenzhen University Science and Engineering,2014,31(3):43.[doi:10.3724/SP.J.1249.2014.01043]
 Cai Liangwei,Liu Siqi,Li Xia,et al.Regular expression grouping algorithm based on ant colony optimization[J].Journal of Shenzhen University Science and Engineering,2014,31(3):279.[doi:10.3724/SP.J.1249.2014.03279]
 Ning Jianping,Wang Bing,Li Hongru,et al.Research on and application of diminishing step fruit fly optimization algorithm[J].Journal of Shenzhen University Science and Engineering,2014,31(3):367.[doi:10.3724/SP.J.1249.2014.04367]
 Liu Wanfeng,and Li Xia,A fast multi-neighborhood iterated local search algorithm for vehicle routing problem[J].Journal of Shenzhen University Science and Engineering,2015,32(3):196.[doi:10.3724/SP.J.1249.2015.02000]
 Cai Liangwei,Cheng Lu,Li Jun,et al.Regular expression grouping optimization based on genetic algorithm[J].Journal of Shenzhen University Science and Engineering,2015,32(3):281.[doi:10.3724/SP.J.1249.2015.03281]
 Wang Shoujue,Lu Huaxiang,Chen Xiangdong and Zeng Yujuan.On the Hardware for Artificial Neural Networks and Neurocomputer[J].Journal of Shenzhen University Science and Engineering,1997,14(3):8.


Received: 2021-05-04; Accepted: 2021-08-10; Online (CNKI): 2022-02-26
Foundation: Research Foundation of Higher Education Teaching Reform of National Ethnic Affairs Commission (19064); Shenzhen Basic Research Foundation (JCYJ20210324093609026); Scientific Research Foundation of Shenzhen University for Newly-introduced Teachers (2018060)
Corresponding author: Associate research fellow HE Yulin.E-mail: yulinhe@szu.edu.cn
Citation: YU Wanguo, HE Yulin, QIN Huilin. A new outlier detection algorithm based on observation-point mechanism [J]. Journal of Shenzhen University Science and Engineering, 2022, 39(3): 355-362.(in Chinese)
作者简介:于万国(1976—),河北民族师范学院副教授.研究方向:大数据计算与分析、数据挖掘和机器学习算法及应用.E-mail: cdwanguoyu@hotmail.com
引 文:引用格式:于万国,何玉林,覃荟霖.基于观测点机制的异常点检测算法[J].深圳大学学报理工版,2022,39(3):355-362.
更新日期/Last Update: 2022-05-30