
A new outlier detection algorithm based on observation-point mechanism

Journal of Shenzhen University Science and Engineering[ISSN:1000-2618/CN:44-1401/N]

Issue:
2022 Vol.39 No.3(237-362)
Page:
355-362
Research Field:
Electronics and Information Science

Info

Title:
A new outlier detection algorithm based on observation-point mechanism
Author(s):
YU Wanguo 1), HE Yulin 2) 3), and QIN Huilin 2)
1) College of Mathematics and Computer Science, Hebei Normal University for Nationalities, Chengde 067055, Hebei Province, P. R. China
2) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P. R. China
3) Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen 518107, Guangdong Province, P. R. China
Keywords:
artificial intelligence; outlier detection; observation point; nearest neighbor; local outlier factor; probability density function; kernel density estimation; data mining
CLC number:
TP391.9;TP311.1
DOI:
10.3724/SP.J.1249.2022.03355
Abstract:
Outlier detection is an important branch of data mining research with wide applications in finance, telecommunications, and biology. Traditional nearest neighbor-based outlier detection (NNOD) and local outlier factor-based outlier detection (LOFOD) algorithms generally suffer from high computational complexity and high false-detection rates. This paper proposes an observation-point mechanism-based outlier detection (OPOD) algorithm comprising four core steps: i) generating random observation points in the original data space; ii) estimating, for each observation point, the probability density function of the distances between that observation point and all data points; iii) calculating the probability of each distance value for that observation point; and iv) detecting outliers by combining the probabilities obtained from the different observation points. Extensive experiments demonstrate the feasibility, rationality, and effectiveness of the OPOD algorithm. The results show that OPOD converges as the number of observation points increases and attains better detection performance at lower computational complexity than the NNOD and LOFOD algorithms.
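
The four steps listed in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how observation points are sampled, which density estimator is used, or how probabilities are combined. The sketch assumes uniform sampling inside the data's bounding box, a Gaussian kernel with Silverman's rule-of-thumb bandwidth, density values used as a stand-in for probabilities, and a plain average as the combination rule. The names `opod_scores`, `gaussian_kde_1d`, and `silverman_bandwidth` are hypothetical.

```python
import math
import random


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth (an assumed choice, not taken from the paper)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return max(1.06 * std * n ** (-0.2), 1e-9)


def gaussian_kde_1d(samples, bandwidth):
    """Return a function that estimates the density of 1-D samples."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def opod_scores(data, n_points=20, seed=0):
    """Average distance-density per data point; low values suggest outliers."""
    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]
    totals = [0.0] * len(data)
    for _ in range(n_points):
        # Step i: draw a random observation point (assumed: uniform in the
        # bounding box of the data).
        obs = [rng.uniform(lows[d], highs[d]) for d in range(dims)]
        # Step ii: estimate the density of the distances from this
        # observation point to all data points.
        dists = [math.dist(obs, p) for p in data]
        density = gaussian_kde_1d(dists, silverman_bandwidth(dists))
        # Step iii: evaluate that density at each point's own distance
        # (assumed proxy for the "probability of the distance value").
        for i, dist in enumerate(dists):
            totals[i] += density(dist)
    # Step iv: combine over observation points (assumed: simple average).
    return [t / n_points for t in totals]


# Usage: a 5x5 grid of inliers in the unit square plus one distant point.
data = [(0.25 * i, 0.25 * j) for i in range(5) for j in range(5)]
data.append((10.0, 10.0))
scores = opod_scores(data, n_points=50)
suspect = min(range(len(data)), key=scores.__getitem__)
```

Each observation point reduces the data to a one-dimensional set of distances, so the per-point cost is a single 1-D density estimate rather than a neighbor search in the original space. A point whose distance is unusual from most observation points accumulates a low average score, which is why `suspect` is taken as the index of the minimum.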

