Fast Clustering Algorithm Based on Bipartite Graph

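School of Computer Science, Northwestern Polytechnical University; Center for Optical Imagery Analysis and Learning, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China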

Keywords: computer application technology; clustering; big data; spectral graph theory; bipartite graph; rank constraint

DOI: 10.3724/SP.J.1249.2019.01018


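Spectral clustering can effectively learn data with manifold and non-convex distributions, but it involves computationally expensive steps such as similarity-graph construction and eigendecomposition, which makes it hard to apply directly to large-scale clustering. This paper proposes a fast clustering algorithm based on a bipartite graph (fast clustering based on bipartite graph, FCBG). FCBG first samples the data to reduce the size of the original data structure, and then learns the relationship between the sampled points and the original data through a bipartite graph. By imposing a rank constraint on the Laplacian matrix of the bipartite graph, FCBG optimizes the edge weights of the bipartite graph while preserving its cluster structure, and finally outputs the clustering result directly, without relying on the initial weights assigned to the edges when the graph is constructed. The computational complexity of the algorithm is linear in the data size. Experiments show that FCBG effectively learns the bipartite-graph weights and obtains high-quality clustering results with low time consumption.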
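The rank constraint rests on a standard fact of spectral graph theory: for a graph with nonnegative edge weights, the multiplicity of the eigenvalue 0 of its Laplacian equals the number of connected components. For a bipartite graph with n samples and m anchors, requiring rank(L) = n + m - c therefore forces exactly c components, and each component can be read off as one cluster. The Python sketch below only checks this fact on a hand-made weight matrix B; it does not implement FCBG's weight optimization, and the helper names `bipartite_laplacian` and `components` are hypothetical.

```python
import numpy as np
from collections import deque


def bipartite_laplacian(B):
    """Return (L, A) for the bipartite graph whose n x m weight matrix is B.

    Hypothetical helper: nodes 0..n-1 are samples, nodes n..n+m-1 are anchors.
    """
    n, m = B.shape
    A = np.block([[np.zeros((n, n)), B],
                  [B.T, np.zeros((m, m))]])
    L = np.diag(A.sum(axis=1)) - A  # unnormalized Laplacian L = D - A
    return L, A


def components(A):
    """Label connected components of a weighted adjacency matrix by BFS."""
    N = A.shape[0]
    labels = -np.ones(N, dtype=int)
    c = 0
    for start in range(N):
        if labels[start] >= 0:
            continue
        labels[start] = c
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u] > 0):  # neighbors with positive weight
                if labels[v] < 0:
                    labels[v] = c
                    queue.append(v)
        c += 1
    return c, labels


# Toy weights: 4 samples, 2 anchors, forming two disconnected groups.
B = np.array([[0.9, 0.0],
              [0.7, 0.0],
              [0.0, 0.8],
              [0.0, 0.6]])
n, m = B.shape
L, A = bipartite_laplacian(B)
c, labels = components(A)
zero_eigs = int(np.sum(np.linalg.eigvalsh(L) < 1e-8))  # count (numerically) zero eigenvalues
rank = np.linalg.matrix_rank(L)
print(c, zero_eigs, rank == n + m - c)  # 2 2 True
print(labels[:n], labels[n:])           # sample labels, anchor labels
```

Setting, for example, `B[1, 1] = 0.5` links the two groups, and both the component count and the number of zero eigenvalues drop to 1.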

Spectral clustering can effectively learn manifold and non-convex data distributions. However, the spectral clustering process, which involves graph construction and eigendecomposition, has high computational complexity, so it is difficult to apply spectral clustering directly to large-scale data. The fast clustering based on bipartite graph (FCBG) algorithm reduces the size of the original data structure by sampling and learns the relationship between the sampled data and the original data. The algorithm optimizes the edge weights of the bipartite graph while maintaining its cluster structure. Its computational complexity grows linearly with the data size. Experimental analysis shows that the algorithm effectively learns the data relationships and obtains better clustering results with less time consumption.
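Sampling followed by a bipartite graph is what keeps the cost linear in n: with m anchors fixed, building the n x m graph and decomposing it costs O(nmd + nm^2), instead of an O(n^2) similarity graph plus an eigendecomposition of a full n x n matrix. The Python sketch below illustrates this with a common anchor-graph spectral clustering baseline, not FCBG itself: it fixes Gaussian weights on the k nearest anchors and then runs an SVD embedding plus k-means, whereas FCBG learns the weights under a rank constraint and reads the clusters off directly. Uniform random anchor sampling, the bandwidth heuristic, the defaults m=40 and k=5, and all function names are assumptions for illustration.

```python
import numpy as np


def anchor_bipartite_graph(X, m, k, rng):
    """Sample m anchors and link each point to its k nearest anchors.

    Returns an n x m weight matrix B whose rows sum to 1 (assumed weighting).
    """
    n = X.shape[0]
    anchors = X[rng.choice(n, size=m, replace=False)]  # uniform random sampling
    # n x m squared distances: O(n m d) time, linear in n for fixed m.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]                # k nearest anchors, ascending
    rows = np.arange(n)[:, None]
    dk = d2[rows, idx]
    sigma = dk.mean() + 1e-12                          # heuristic bandwidth (assumption)
    B = np.zeros((n, m))
    B[rows, idx] = np.exp(-(dk - dk[:, :1]) / sigma)   # nearest anchor gets weight 1
    return B / B.sum(axis=1, keepdims=True)


def bipartite_spectral_embedding(B, c):
    """Top-c left singular vectors of B D_c^{-1/2} (rows of B already sum to 1)."""
    col = B.sum(axis=0)                                # anchor degrees
    Z = B / np.sqrt(col + 1e-12)
    U, _, _ = np.linalg.svd(Z, full_matrices=False)    # thin SVD: O(n m^2)
    return U[:, :c]


def kmeans(E, c, iters=100):
    """Plain Lloyd iterations with farthest-point initialization."""
    centers = [E[0]]
    for _ in range(1, c):
        d = np.min([((E - ctr) ** 2).sum(axis=1) for ctr in centers], axis=0)
        centers.append(E[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([E[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(c)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def anchor_spectral_clustering(X, c, m=40, k=5, seed=0):
    rng = np.random.default_rng(seed)
    B = anchor_bipartite_graph(X, m, k, rng)
    return kmeans(bipartite_spectral_embedding(B, c), c)


# Demo: two well-separated Gaussian blobs of 100 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(100, 2)),
               rng.normal(10.0, 0.5, size=(100, 2))])
labels = anchor_spectral_clustering(X, c=2)
print(np.bincount(labels))
```

The only dense decomposition is the thin SVD of the n x m matrix Z, which is equivalent to eigendecomposing the small m x m matrix Z^T Z; that is where the saving over a full n x n spectral decomposition comes from.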
