Adaptive sparse representation guided unsupervised dimensionality reduction
YUE Qin (1), WEI Wei (1, 2), FENG Kai (1, 2), and CUI Junbiao (1)

1) School of Computer and Information Technology, Shanxi University, Taiyuan 030006, Shanxi Province, P.R. China
2) Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, Shanxi Province, P.R. China

artificial intelligence; sparse representation; unsupervised learning; dimensionality reduction; machine learning; data mining

DOI: 10.3724/SP.J.1249.2020.04425

How to mine and preserve data distribution information is the core problem of unsupervised dimensionality reduction. Most traditional unsupervised dimensionality reduction methods consider only the local or only the global information of the data distribution, so this information is difficult to preserve in the low-dimensional space. To solve this problem, we propose an adaptive sparse representation guided unsupervised dimensionality reduction (ASR_UDR) method that considers the global and local information of the data distribution simultaneously. In this method, sparse representation is used to mine the global information of the high-dimensional data distribution, and the local information is mined by constraining the projected data to remain smooth on a graph represented by the sparse representation coefficient matrix. These two processes are integrated into a single framework so that they guide each other, achieving adaptive mining of data distribution information together with unsupervised dimensionality reduction. Experimental results on the WarpAR10P, USPS, MultiB, DLBCLA and DLBCLB data sets show that, compared with related unsupervised dimensionality reduction methods, the proposed method significantly reduces data dimensionality while better improving the performance of subsequent learning algorithms.
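
The abstract combines two ingredients: a sparse self-representation of the samples (global information) and a graph-smoothness constraint on the projected data, with the graph built from the sparse coefficients (local information). The following is a minimal, dependency-free sketch of those two ingredients under stated assumptions. It is NOT the authors' ASR_UDR, which couples both steps in one adaptive framework. Here the steps run once, in sequence. The lasso coefficients are assumed to come from ISTA, the graph weights are assumed to be W = (|S| + |S^T|)/2, and the projection P is assumed to satisfy P^T P = I. The names `sparse_codes`, `jacobi_eig`, `two_step_projection` and the value of `lam` are hypothetical choices made for illustration.

```python
import math


def soft_threshold(v, t):
    # Proximal map of t*|v|: shrink v toward zero by t.
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_codes(X, lam=0.05, iters=300):
    """ISTA self-representation (assumed solver): row i of S minimizes
    0.5*||x_i - sum_j S[i][j]*x_j||^2 + lam*||S[i]||_1 with S[i][i] = 0."""
    n = len(X)
    G = [[sum(a * b for a, b in zip(X[j], X[k])) for k in range(n)] for j in range(n)]
    # trace(G) = ||X||_F^2 bounds the gradient's Lipschitz constant from above.
    step = 1.0 / (sum(G[j][j] for j in range(n)) or 1.0)
    S = []
    for i in range(n):
        s = [0.0] * n
        for _ in range(iters):
            grad = [sum(G[j][k] * s[k] for k in range(n)) - G[j][i] for j in range(n)]
            s = [soft_threshold(s[j] - step * grad[j], step * lam) for j in range(n)]
            s[i] = 0.0  # forbid the trivial self-representation x_i = x_i
        S.append(s)
    return S


def jacobi_eig(A, sweeps=100, tol=1e-20):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix.
    Returns (eigenvalues, V) with eigenvectors in the columns of V."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def two_step_projection(X, k, lam=0.05):
    """Hypothetical two-step variant: build the sparse graph, then pick the
    orthonormal P (d x k) minimizing sum_ij W_ij ||P^T x_i - P^T x_j||^2."""
    n, d = len(X), len(X[0])
    S = sparse_codes(X, lam)
    # Symmetric, nonnegative graph weights from the coefficient magnitudes.
    W = [[(abs(S[i][j]) + abs(S[j][i])) / 2 for j in range(n)] for i in range(n)]
    lap = [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]
    LX = [[sum(lap[i][m] * X[m][c] for m in range(n)) for c in range(d)] for i in range(n)]
    M = [[sum(X[i][a] * LX[i][b] for i in range(n)) for b in range(d)] for a in range(d)]
    M = [[(M[a][b] + M[b][a]) / 2 for b in range(d)] for a in range(d)]  # drop rounding asymmetry
    vals, V = jacobi_eig(M)
    order = sorted(range(d), key=lambda j: vals[j])[:k]  # k smoothest directions
    P = [[V[r][j] for j in order] for r in range(d)]
    Y = [[sum(X[i][r] * P[r][c] for r in range(d)) for c in range(k)] for i in range(n)]
    return Y, P, W, [vals[j] for j in order]


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]
    Y, P, W, smooth = two_step_projection(X, k=2)
    print(len(Y), len(Y[0]))  # 8 2
```

The sketch relies on the identity sum_ij W_ij ||P^T x_i - P^T x_j||^2 = 2 tr(P^T X^T L X P), where L = D - W is the graph Laplacian. Because of this identity, minimizing smoothness under P^T P = I reduces to taking the eigenvectors of X^T L X with the smallest eigenvalues. That orthonormality constraint can favor near-constant directions, which is why graph-embedding methods often use a data-dependent constraint instead.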