[1]于万国,袁镇濠,陈佳琪,等.分布式子空间局部链接随机向量函数链接网络[J].深圳大学学报理工版,2022,39(6):675-683.[doi:10.3724/SP.J.1249.2022.06675]
 YU Wanguo,YUAN Zhenhao,CHEN Jiaqi,et al.Distributed random vector functional link network with subspace-based local connections[J].Journal of Shenzhen University Science and Engineering,2022,39(6):675-683.[doi:10.3724/SP.J.1249.2022.06675]

分布式子空间局部链接随机向量函数链接网络 (Distributed random vector functional link network with subspace-based local connections)

《深圳大学学报理工版》(Journal of Shenzhen University Science and Engineering) [ISSN: 1000-2618 / CN: 44-1401/N]

Volume:
Vol. 39
Issue:
No. 6, 2022
Pages:
675-683
Section:
Electronics and Information Science
Publication date:
2022-11-15

文章信息/Info

Title:
Distributed random vector functional link network with subspace-based local connections
Article ID:
202206009
作者:
于万国1, 袁镇濠2, 陈佳琪2, 何玉林3
1)河北民族师范学院数学与计算机科学学院,河北承德 067000
2)深圳大学计算机与软件学院,广东深圳 518060
3)人工智能与数字经济广东省实验室(深圳),广东深圳 518107
Author(s):
YU Wanguo1, YUAN Zhenhao2, CHEN Jiaqi2, and HE Yulin3
1) College of Mathematics and Computer Science, Hebei Normal University for Nationalities, Chengde 067000, Hebei Province, P.R.China
2) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P.R.China
3) Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen 518107, Guangdong Province, P.R.China
关键词:
人工智能; 随机向量函数链接网络; 子空间局部链接; 随机样本划分; Hadoop分布式文件系统
Keywords:
artificial intelligence; random vector functional link network; subspace-based local connection; random sample partition; Hadoop distributed file system
CLC number:
TP311
DOI:
10.3724/SP.J.1249.2022.06675
Document code:
A
摘要:
为解决随机向量函数链接(random vector functional link, RVFL)网络处理大规模数据分类时表现出的泛化能力差和计算复杂度高的问题,基于Spark框架设计与实现一种分布式子空间局部链接的RVFL(distributed RVFL with subspace-based local connections, DRVFL-SLC)网络.利用弹性分布式数据集(resilient distributed dataset, RDD)的分区并行性,对存于Hadoop分布式文件系统(Hadoop distributed file system, HDFS)的大规模数据集进行随机样本划分(random sample partition, RSP)操作,保证每个RSP数据块对应RDD的1个分区.其中,RSP数据块是在给定的显著性水平下与大数据保持概率分布一致性的数据子集.在分布式环境下对包含多个分区的RDD调用mapPartitions转换算子并行高效地训练对应的最优RVFL-SLC网络.利用collect执行算子将RDD每个分区对应的最优RVFL-SLC网络进行高效率地渐近融合获得DRVFL-SLC网络以实现对大数据分类问题的近似求解.在部署了6个计算节点的Spark集群上,基于8个百万条记录的大规模数据集对DRVFL-SLC网络的可行性和有效性进行了验证.结果表明,DRVFL-SLC网络拥有很好的加速比、可扩展性以及规模增长性,同时能够获得比在单机上利用全量数据训练的RVFL-SLC网络更好的泛化表现.
Abstract:
In order to solve the problems of poor generalization ability and high computational complexity of the random vector functional link (RVFL) network when dealing with large-scale data classification, we design and implement a distributed RVFL network with subspace-based local connections (DRVFL-SLC) based on the Spark framework. Firstly, to exploit the partition parallelism of the resilient distributed dataset (RDD), the large-scale dataset stored in the Hadoop distributed file system (HDFS) is divided into random sample partition (RSP) data blocks, and each RSP data block corresponds to one partition of the RDD, where an RSP data block is a subset of the data whose probability distribution is consistent with that of the big data at a given significance level. Then, the mapPartitions transformation is invoked on the RDD containing multiple partitions in the distributed environment, so that the optimal RVFL-SLC network for each partition is trained efficiently in parallel. Next, the collect action is used to efficiently fuse the optimal RVFL-SLC networks corresponding to the partitions of the RDD into the DRVFL-SLC network, which provides an approximate solution to the big data classification problem. Finally, the feasibility and effectiveness of DRVFL-SLC are verified on eight large-scale datasets with millions of records on a Spark cluster deployed with 6 computing nodes. The results show that DRVFL-SLC achieves good speedup, scalability and sizeup, and obtains better generalization performance than the RVFL-SLC network trained on a single machine with the full data.
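To make the training pipeline described in the abstract easier to follow, a minimal PySpark sketch of the three steps (RSP blocks as RDD partitions, per-partition training with mapPartitions, and fusion with collect) is given below. It is not the authors' implementation: the local learner here is a plain RVFL classifier with random hidden weights and a ridge-regression output layer, the subspace-based local connection structure and the per-partition model selection are omitted, and names such as train_local_rvfl, fuse_predict and the HDFS path are illustrative assumptions.

```python
import numpy as np
from pyspark.sql import SparkSession


def train_local_rvfl(rows, n_hidden=100, reg=1e-3, seed=0):
    """Train a simplified RVFL classifier on one RSP block (one RDD partition)."""
    rows = list(rows)
    if not rows:
        return iter([])
    X = np.array([r[:-1] for r in rows], dtype=float)   # feature columns
    y = np.array([r[-1] for r in rows], dtype=int)      # labels 0..K-1 (assumed present in every block)
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                               # random hidden-layer features
    A = np.hstack([X, H])                                # RVFL design matrix: direct links + hidden layer
    Y = np.eye(y.max() + 1)[y]                           # one-hot targets
    # Closed-form ridge-regression solution for the output weights.
    beta = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ Y)
    return iter([(W, b, beta)])


def fuse_predict(models, X):
    """Fuse the collected per-partition models by majority voting."""
    votes = []
    for W, b, beta in models:
        H = np.tanh(X @ W + b)
        votes.append((np.hstack([X, H]) @ beta).argmax(axis=1))
    votes = np.stack(votes)                              # shape: (n_models, n_samples)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)


if __name__ == "__main__":
    spark = SparkSession.builder.appName("drvfl-slc-sketch").getOrCreate()
    sc = spark.sparkContext
    # Hypothetical HDFS path; each line holds "f1,f2,...,fd,label".
    # The file is assumed to be stored as RSP data blocks, so every RDD
    # partition is distribution-consistent with the whole dataset.
    rdd = sc.textFile("hdfs:///data/bigdata.csv") \
            .map(lambda line: [float(v) for v in line.split(",")])
    local_models = rdd.mapPartitions(train_local_rvfl).collect()  # one model per RSP block
    X_new = np.random.rand(5, local_models[0][0].shape[0])        # toy query points with d features
    print(fuse_predict(local_models, X_new))
    spark.stop()
```

Majority voting is used here only as one simple fusion rule for the collected per-partition models; the point of the sketch is that each RSP block can be trained independently inside mapPartitions and the resulting models fused after a single collect, without further communication between nodes.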

参考文献/References:

[1] IGELNIK B, PAO Y H. Stochastic choice of basis functions in adaptive function approximation and the functional-link net [J]. IEEE Transactions on Neural Networks, 1995, 6(6): 1320-1329.
[2] REN Y, SUGANTHAN P N, SRIKANTH N, et al. Random vector functional link network for short-term electricity load demand forecasting [J]. Information Sciences, 2016, 367: 1078-1093.
[3] ZHANG Le, SUGANTHAN P N. A comprehensive evaluation of random vector functional link networks [J]. Information Sciences, 2016, 367: 1094-1105.
[4] SCHMIDT W F, KRAAIJVELD M A, DUIN R P. Feedforward neural networks with random weights [C]// The 11th IAPR International Conference on Pattern Recognition. The Hague, Netherlands: IEEE, 1992: 1-4.
[5] HUANG Guangbin, ZHU Qinyu, SIEW C K. Extreme learning machine: theory and applications [J]. Neurocomputing, 2006, 70(1/2/3): 489-501.
[6] HUANG Guangbin, ZHOU Hongming, DING Xiaojian, et al. Extreme learning machine for regression and multiclass classification [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2011, 42(2): 513-529.
[7] HUANG Gao, HUANG Guangbin, SONG Shiji, et al. Trends in extreme learning machines: a review [J]. Neural Networks, 2015, 61: 32-48.
[8] LI Feng, YANG Jie, YAO Mingchen, et al. Extreme learning machine with local connections [J]. Neurocomputing, 2019, 368: 146-152.
[9] HE Yulin, YUAN Zhenhao, HUANG Zhexue. Random vector functional link network with subspace-based local connections [J]. Applied Intelligence, 2022. doi: 10.1007/s10489-022-03404-8.
[10] 栾亚建,黄翀民,龚高晟,等.Hadoop平台的性能优化研究[J].计算机工程,2010,36(14):262-263,266.
LUAN Yajian, HUANG Chongmin, GONG Gaosheng, et al. Research on performance optimization of Hadoop platform [J]. Computer Engineering, 2010, 36(14): 262-263, 266.(in Chinese)
[11] SUN Yongjiao, YUAN Ye, WANG Guoren. An OS-ELM based distributed ensemble classification framework in P2P networks [J]. Neurocomputing, 2011, 74(16): 2438-2443.
[12] XIN Junchang, WANG Zhiqiong, CHEN Chen, et al. ELM*: distributed extreme learning machine with MapReduce [J]. World Wide Web, 2014, 17(5): 1189-1204.
[13] CHEN Jiaoyan, CHEN Huajun, WAN Xiangyi, et al. MR-ELM: a MapReduce-based framework for large-scale ELM training in big data era [J]. Neural Computing and Applications, 2016, 27(1): 101-110.
[14] 邓万宇,李力,牛慧娟. 基于Spark的并行极速神经网络[J]. 郑州大学学报工学版,2016,37(5):47-56.
DENG Wanyu, LI Li, NIU Huijuan. Parallel extremely fast neural network based on Spark [J]. Journal of Zhengzhou University Engineering Edition, 2016, 37(5): 47-56.(in Chinese)
[15] 杨敏,刘黎志,邓开巍,等. 基于Spark的自适应差分进化极限学习机研究[J]. 武汉工程大学学报, 2021,43(3):318-323.
YANG Min, LIU Lizhi, DENG Kaiwei, et al. Research on adaptive differential evolution extreme learning machine based on Spark [J]. Journal of Wuhan Institute of Technology, 2021, 43(3): 318-323.(in Chinese)
[16] SCARDAPANE S, WANG D, PANELLA M, et al. Distributed learning for random vector functional-link networks [J]. Information Sciences, 2015, 301: 271-284.
[17] SCARDAPANE S, PANELLA M, COMMINIELLO D, et al. Learning from distributed data sources using random vector functional-link networks [J]. Procedia Computer Science, 2015, 53: 468-477.
[18] ROSATO A, ALTILIO R, PANELLA M. On-line learning of RVFL neural networks on finite precision hardware [C]// The 2018 IEEE International Symposium on Circuits and Systems. Florence, Italy: IEEE, 2018: 1-5.
[19] 赵立杰,陈征,张立强,等.基于交替方向乘子法的球磨机负荷分布式随机权值神经网络模型[J].数据挖掘,2018,8(1):1-8.
ZHAO Lijie, CHEN Zheng, ZHANG Liqiang, et al. Distributed random weight neural network model for ball mill load based on alternating direction multiplier method [J]. Hans Journal of Data Mining, 2018, 8(1): 1-8.(in Chinese)
[20] XIE Jin, LIU Sanyang, DAI Hao, et al. Distributed semi-supervised learning algorithms for random vector functional-link networks with distributed data splitting across samples and features [J]. Knowledge-Based Systems, 2020, 195: 105577.
[21] 黄哲学,何玉林,魏丞昊,等.大数据随机样本划分模型及相关分析计算技术[J].数据采集与处理,2019,34(3):373-385.
HUANG Zhexue, HE Yulin, WEI Chenhao, et al. Big data random sample partition model and related analysis and calculation technology [J]. Journal of Data Acquisition and Processing, 2019, 34(3): 373-385.(in Chinese)
[22] HASAN B T, ABDULLAH D B. A survey of scheduling tasks in big data: Apache Spark [C]// The International Conference on Micro-Electronics and Telecommunication Engineering. Singapore: Springer, 2022: 405-414.
[23] SHAFER J, RIXNER S, COX A L. The Hadoop distributed filesystem: balancing portability and performance [C]// IEEE International Symposium on Performance Analysis of Systems & Software. New York, USA: IEEE, 2010: 122-133.
[24] OMAR H K, JUMAA A K. Distributed big data analysis using Spark parallel data processing [J]. Bulletin of Electrical Engineering and Informatics, 2022, 11(3): 1505-1515.


备注/Memo:
Received: 2022-04-15; Accepted: 2022-08-23; Online (CNKI): 2022-11-04
Foundation: Basic Research Foundation of Shenzhen (JCYJ20210324093609026)
Corresponding author: Research associate HE Yulin. E-mail: yulinhe@gml.ac.cn
Citation: YU Wanguo, YUAN Zhenhao, CHEN Jiaqi, et al. Distributed random vector functional link network with subspace-based local connections [J]. Journal of Shenzhen University Science and Engineering, 2022, 39(6): 675-683.(in Chinese)
基金项目:深圳市基础研究计划资助面上项目(JCYJ20210324093609026)
About the author: YU Wanguo (born 1976), associate professor at Hebei Normal University for Nationalities. His research interests include big data computing technology, data mining, and machine learning algorithms. E-mail: cdwanguoyu@hotmail.com
引文:于万国,袁镇濠,陈佳琪,等.分布式子空间局部链接随机向量函数链接网络[J].深圳大学学报理工版,2022,39(6):675-683.
更新日期/Last Update: 2022-11-30