An unsupervised outlier detection algorithm for categorical matrix-object data

WU Xiaolin 1,2 and CAO Fuyuan 1,2

1) School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi Province, P.R. China; 2) Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi Province, P.R. China

artificial intelligence; outlier detection; categorical matrix-object data; coupling degree; cohesion degree; data mining

DOI: 10.3724/SP.J.1249.2019.01033

Outlier detection is an important branch of data mining that aims to find the objects in a data set whose behavior differs markedly from that of most other objects. For categorical matrix-object data, we define the cohesion degree of a matrix-object itself and the coupling degree between that matrix-object and the other matrix-objects, use the two to define the outlier factor of a matrix-object, and propose an outlier detection algorithm for categorical matrix-object data. Experimental results on three real data sets (Market basket, Microsoft web, and MovieLens) show that, compared with the common-neighbor-based (CNB) algorithm, the local outlier factor (LOF) algorithm, and the information entropy-based (IE-based) algorithm, the proposed algorithm can effectively detect outliers in categorical matrix-object data.
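The abstract names three quantities (cohesion degree, coupling degree, outlier factor) without giving their formulas. The following is a minimal sketch of how such a score might be assembled, not the authors' definitions. It assumes a matrix-object is a list of categorical rows (for example, all transactions of one customer), uses simple-matching similarity between rows, and combines the two degrees with a placeholder product. The function names `row_similarity`, `cohesion`, `coupling`, and `outlier_scores` are hypothetical.

```python
from itertools import combinations


def row_similarity(a, b):
    """Simple matching: fraction of attributes on which two rows agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohesion(obj):
    """Assumed cohesion: mean pairwise similarity among the rows of one object."""
    pairs = list(combinations(obj, 2))
    if not pairs:  # a single-row object is treated as fully cohesive
        return 1.0
    return sum(row_similarity(a, b) for a, b in pairs) / len(pairs)


def coupling(obj, others):
    """Assumed coupling: mean similarity between this object's rows
    and the rows of every other object."""
    sims = [row_similarity(a, b) for other in others for a in obj for b in other]
    return sum(sims) / len(sims) if sims else 0.0


def outlier_scores(objects):
    """Placeholder outlier factor (an assumption, not the paper's formula):
    an object scores high when it is internally consistent but has
    little in common with the remaining objects."""
    scores = []
    for i, obj in enumerate(objects):
        others = objects[:i] + objects[i + 1:]
        scores.append(cohesion(obj) * (1.0 - coupling(obj, others)))
    return scores


# Three toy matrix-objects, each a list of rows over two categorical attributes.
objects = [
    [("a", "x"), ("a", "x")],
    [("a", "x"), ("a", "y")],
    [("b", "z"), ("b", "z")],
]
scores = outlier_scores(objects)
print(scores)
```

In this toy data the third object shares no attribute value with the other two, so it receives the highest score under the placeholder formula. Any real use would replace the similarity measure and the combination rule with the definitions given in the body of the paper.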
