Journal of Shenzhen University Science and Engineering

Spectral clustering can effectively learn manifold-shaped and non-convex data distributions, but it involves steps of high computational complexity, such as similarity-graph construction and eigendecomposition, which make it difficult to apply directly to large-scale clustering. This paper proposes a fast clustering algorithm based on a bipartite graph (fast clustering based on bipartite graph, FCBG). FCBG first samples the data to reduce the size of the original data structure, and then learns the relationship between the sampled data and the original data through a bipartite graph. By imposing a rank constraint on the Laplacian matrix of the bipartite graph, FCBG optimizes the edge weights of the bipartite graph while preserving its cluster structure, and it outputs the clustering result directly, without depending on the initial weight assigned to each edge when the graph is constructed. The computational complexity of the algorithm grows linearly with the data size. Experiments show that FCBG learns the bipartite-graph weights effectively and obtains high-quality clustering results with less time consumption.
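The abstract describes two preparatory steps, sampling the data and linking the samples to the original points through a bipartite graph, without saying how either is done. The sketch below is one common way to build such a graph and is an assumption, not the authors' procedure: anchors are drawn uniformly at random, each point is connected only to its `k` nearest anchors, and the initial weights are Gaussian in the squared distance with rows normalized to sum to 1. The names `sample_anchors` and `anchor_bipartite_graph`, the default `k=3` and the bandwidth heuristic are all hypothetical. FCBG then re-learns the edge weights, so these initial values only serve as a starting point.

```python
import numpy as np


def sample_anchors(X, m, seed=0):
    """Draw m distinct rows of X uniformly at random as anchors
    (hypothetical stand-in for the paper's unspecified sampling step)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return X[idx]


def anchor_bipartite_graph(X, anchors, k=3):
    """Build the n x m weight matrix B of a data-anchor bipartite graph.

    Each point is linked only to its k nearest anchors; weights are Gaussian
    in the squared distance and normalized so every row sums to 1.
    Cost is O(n m d) time and O(n m) memory: linear in n for fixed m."""
    # squared Euclidean distances via ||x||^2 - 2 x.a + ||a||^2, shape (n, m)
    d2 = ((X ** 2).sum(axis=1)[:, None]
          - 2.0 * X @ anchors.T
          + (anchors ** 2).sum(axis=1)[None, :])
    d2 = np.maximum(d2, 0.0)                 # clip tiny negative round-off
    n, m = d2.shape
    nn = np.argsort(d2, axis=1)[:, :k]       # indices of the k nearest anchors
    rows = np.arange(n)[:, None]
    dk = d2[rows, nn]                        # their squared distances, (n, k)
    sigma = dk.mean() + 1e-12                # bandwidth heuristic (assumption)
    w = np.exp(-dk / sigma)
    B = np.zeros((n, m))
    B[rows, nn] = w / w.sum(axis=1, keepdims=True)
    return B


# Usage: two well-separated 2-D blobs, 10 anchors, 3 anchors per point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(3.0, 0.3, (50, 2))])
B = anchor_bipartite_graph(X, sample_anchors(X, 10), k=3)
print(B.shape)                               # (100, 10)
print(np.allclose(B.sum(axis=1), 1.0))       # True
```

Because each point only touches the m anchors, both the distance computation and the stored graph grow linearly with n, which is the property the abstract relies on for its linear complexity claim.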

Introduction
1 Objective Function Design
2 Theoretical Analysis
3 Experimental Results and Analysis
4 Conclusion

Fig.1 Flowchart of the FCBG algorithm
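The abstract states that the rank constraint on the bipartite graph's Laplacian lets FCBG output the clustering directly. The sketch below is not the authors' optimization; it only illustrates the standard spectral fact such a constraint relies on. For the bipartite affinity S = [[0, B], [Bᵀ, 0]] over n data points and m anchors, the multiplicity of eigenvalue 0 of the Laplacian equals the number of connected components, so rank(L_S) = n + m − c forces exactly c components, and each component is one cluster. Using the normalized Laplacian (an assumption made here for convenience), those zero eigenvalues correspond to singular values equal to 1 of D_u^{-1/2} B D_v^{-1/2}, which an n × m SVD finds in O(nm²) time. The function names and the toy matrix are hypothetical.

```python
import numpy as np


def count_components_spectral(B, tol=1e-8):
    """Count connected components of S = [[0, B], [B.T, 0]] as the number of
    singular values equal to 1 of Du^{-1/2} B Dv^{-1/2}, i.e. the multiplicity
    of eigenvalue 0 of the normalized Laplacian. O(n m^2): linear in n."""
    du = B.sum(axis=1)                       # degrees of the n data points
    dv = B.sum(axis=0)                       # degrees of the m anchors
    Bn = B / np.sqrt(np.outer(du, dv))       # assumes no isolated vertex
    s = np.linalg.svd(Bn, compute_uv=False)  # all singular values lie in [0, 1]
    return int(np.sum(s > 1.0 - tol))


def component_labels(B):
    """Label data points and anchors by connected component (depth-first
    search). When the graph has exactly c components, these are the clusters."""
    n, m = B.shape
    labels = np.full(n + m, -1)              # nodes 0..n-1 data, n..n+m-1 anchors
    current = 0
    for start in range(n + m):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            v = stack.pop()
            if v < n:                        # data point: neighbours are anchors
                nbrs = n + np.flatnonzero(B[v] > 0)
            else:                            # anchor: neighbours are data points
                nbrs = np.flatnonzero(B[:, v - n] > 0)
            for u in nbrs:
                if labels[u] < 0:
                    labels[u] = current
                    stack.append(u)
        current += 1
    return labels[:n], labels[n:]


# Toy graph: 5 data points, 4 anchors, two blocks with no edges between them.
B_toy = np.array([[0.6, 0.4, 0.0, 0.0],
                  [0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 0.7, 0.3],
                  [0.0, 0.0, 0.2, 0.8],
                  [0.0, 0.0, 1.0, 0.0]])
print(count_components_spectral(B_toy))     # 2
print(component_labels(B_toy)[0])           # [0 0 1 1 1]
```

Once the learned graph satisfies the rank constraint, no k-means post-processing is needed in this picture: reading off the components already yields the c clusters, which matches the abstract's claim that the result is given directly.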

Table 1 Statistics of the three benchmark datasets

Table 2 Clustering accuracy on the three datasets (%)
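Table 2 reports clustering accuracy, which is not defined in this section. The usual definition in clustering papers, assumed here, is ACC = max over one-to-one maps π from cluster ids to class ids of (1/n)·Σᵢ 1[yᵢ = π(ĉᵢ)]. The sketch below computes it by brute force over permutations, which is only practical for a handful of clusters; for larger k the same maximum is found with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`). The function name is hypothetical.

```python
from itertools import permutations

import numpy as np


def clustering_accuracy(y_true, y_pred):
    """ACC: best fraction of correctly labelled points over all one-to-one
    mappings from cluster ids to class ids (brute force, small k only)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    k = max(len(classes), len(clusters))
    # counts[i, j] = number of points in cluster i whose true class is j
    # (zero-padded to k x k when the numbers of clusters and classes differ)
    counts = np.zeros((k, k), dtype=int)
    for i, c in enumerate(clusters):
        for j, t in enumerate(classes):
            counts[i, j] = np.sum((y_pred == c) & (y_true == t))
    best = max(sum(counts[i, perm[i]] for i in range(k))
               for perm in permutations(range(k)))
    return best / len(y_true)


print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))  # 1.0
print(clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 0, 2]))  # 5/6 ≈ 0.833
```

The relabelled prediction in the first call scores 1.0 because ACC ignores the arbitrary numbering of clusters; in the second call one point of class 2 is merged into the cluster mapped to class 1.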

Table 3 Normalized mutual information (NMI) on the three datasets (%)
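Table 3 reports normalized mutual information. Several normalizations exist and this section does not say which one is used; the sketch below assumes the geometric-mean form NMI = I(T; C) / √(H(T)·H(C)), with T the true classes and C the predicted clusters. The function name `nmi` is hypothetical, and the convention for a degenerate single-label input is also an assumption.

```python
import numpy as np


def nmi(y_true, y_pred):
    """NMI = I(T; C) / sqrt(H(T) H(C)), natural logarithms
    (geometric-mean normalization, assumed here)."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, c = np.unique(np.asarray(y_pred), return_inverse=True)
    t, c = t.ravel(), c.ravel()
    joint = np.zeros((c.max() + 1, t.max() + 1))
    np.add.at(joint, (c, t), 1.0)            # contingency counts
    joint /= joint.sum()                     # joint distribution p(c, t)
    pc = joint.sum(axis=1)                   # cluster marginals
    pt = joint.sum(axis=0)                   # class marginals
    nz = joint > 0                           # skip 0 * log 0 terms
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pc, pt)[nz]))
    h_c = -np.sum(pc * np.log(pc))
    h_t = -np.sum(pt * np.log(pt))
    if h_c == 0.0 or h_t == 0.0:             # a single class or a single cluster
        return 1.0 if h_c == h_t else 0.0
    return float(mi / np.sqrt(h_c * h_t))


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))       # ≈ 1.0: same partition, relabelled
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))       # 0.0: clusters independent of classes
```

Like ACC, NMI is invariant to how the clusters are numbered, but unlike ACC it needs no matching step, so it is cheap to compute for any number of clusters.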

Table 4 Clustering time on the three datasets (s)

