[1] YU Wanguo, YUAN Zhenhao, CHEN Jiaqi, et al. Distributed random vector functional link network with subspace-based local connections[J]. Journal of Shenzhen University Science and Engineering, 2022, 39(6): 675-683. [doi:10.3724/SP.J.1249.2022.06675]

Distributed random vector functional link network with subspace-based local connections

Journal of Shenzhen University Science and Engineering [ISSN: 1000-2618 / CN: 44-1401/N]

Volume:
39
Issue:
No. 6, 2022
Pages:
675-683
Section:
Electronics and Information Science
Publication date:
2022-11-15

Article Info

Title:
Distributed random vector functional link network with subspace-based local connections
Article ID:
202206009
Author(s):
YU Wanguo, YUAN Zhenhao, CHEN Jiaqi, HE Yulin
1) College of Mathematics and Computer Science, Hebei Normal University for Nationalities, Chengde 067000, Hebei Province, P.R.China; 2) Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P.R.China; 3) Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, Guangdong Province, P.R.China
Keywords:
artificial intelligence; random vector functional link network; subspace-based local connection; random sample partition; Hadoop distributed file system
CLC number:
TP311
DOI:
10.3724/SP.J.1249.2022.06675
Document code:
A
Abstract:
To address the poor generalization and high computational complexity of the random vector functional link (RVFL) network on large-scale classification data, this paper designs and implements a distributed RVFL network with subspace-based local connections (DRVFL-SLC) on the Spark framework. First, to exploit the partition parallelism of the resilient distributed dataset (RDD), a random sample partition (RSP) operation is applied to the large-scale dataset stored in the Hadoop distributed file system (HDFS), so that each RSP data block corresponds to one partition of the RDD. An RSP data block is a data subset whose probability distribution is consistent with that of the whole dataset at a given significance level. Second, the mapPartitions transformation is called on the multi-partition RDD in the distributed environment to train the optimal RVFL-SLC network for each partition efficiently and in parallel. Third, the collect action gathers the optimal RVFL-SLC networks of all partitions, which are asymptotically fused into the DRVFL-SLC network to obtain an approximate solution to the big data classification problem. Finally, the feasibility and effectiveness of DRVFL-SLC are verified on a Spark cluster with 6 computing nodes using 8 large-scale datasets, each containing millions of records. The results show that DRVFL-SLC achieves good speedup, scalability, and scale growth (sizeup), and generalizes better than an RVFL-SLC network trained on the full data on a single machine.
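The partition, train, and fuse pipeline described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation. A nearest-centroid classifier stands in for the RVFL-SLC network, shuffling and dealing records into blocks stands in for the RSP operation, and majority voting stands in for the asymptotic fusion step, whose details the abstract does not give. All function names and the synthetic data are hypothetical.

```python
import random
from collections import Counter


def random_sample_partition(records, num_blocks, seed=0):
    """Shuffle the records and deal them round-robin into num_blocks blocks.

    Assumption: a plain random shuffle approximates an RSP, so every block
    roughly follows the distribution of the whole dataset.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def train_block_model(block):
    """Stand-in base learner (NOT RVFL-SLC): one centroid per class label."""
    sums, counts = {}, Counter()
    for features, label in block:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_one(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def fuse_predict(models, features):
    """Assumed fusion rule: majority vote over the per-block models."""
    votes = Counter(predict_one(model, features) for model in models)
    return votes.most_common(1)[0][0]


# Synthetic two-class data: class 0 around (0, 0), class 1 around (4, 4).
data_rng = random.Random(42)
records = []
for _ in range(300):
    records.append(([data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)], 0))
    records.append(([data_rng.gauss(4.0, 1.0), data_rng.gauss(4.0, 1.0)], 1))

# One block per hypothetical worker. On Spark, the per-block training would be
# passed to RDD.mapPartitions and the models gathered with RDD.collect.
blocks = random_sample_partition(records, num_blocks=6)
models = [train_block_model(block) for block in blocks]

print(fuse_predict(models, [0.2, -0.1]), fuse_predict(models, [3.8, 4.1]))
```

Because each block is trained independently, the per-block step needs no communication between workers, which is the property that makes it a natural fit for a per-partition transformation followed by a single gather.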

Last Update: 2022-11-30