[1] WEI Chenghao, HUANG Zhexue, and HE Yulin. Statistical aware based big data system computing framework[J]. Journal of Shenzhen University Science and Engineering, 2018, 35(5): 441-443. [doi:10.3724/SP.J.1249.2018.05441]

Statistical aware based big data system computing framework

Journal of Shenzhen University Science and Engineering [ISSN: 1000-2618 / CN: 44-1401/N]

Volume:
Vol. 35
Issue:
2018, No. 5 (pp. 441-550)
Pages:
441-443
Column:
Academic Letters
Publication date:
2018-09-25

Article Info

Title:
Statistical aware based big data system computing framework
Author(s):
WEI Chenghao (魏丞昊), HUANG Zhexue (黄哲学), and HE Yulin (何玉林)
Institute of Big Data Technology and Application, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P.R. China
Keywords:
big data; random sample partition; asymptotic ensemble learning; distributed parallel computing; computer perception; distributed processing system
CLC number:
TP 311
DOI:
10.3724/SP.J.1249.2018.05441
Document code:
A
Abstract:
To make big data computable under limited computing resources, this paper proposes Bigdata-α, a statistical-aware computing framework for Tbit-scale big data. The core of the framework consists of a random sample partition model and an asymptotic ensemble learning model. The former guarantees that the samples in each data block produced by the partition follow a probability distribution consistent with that of the whole data set. The latter provides an unbiased and convergent learning model by analyzing a number of random sample data blocks in place of the full Tbit-scale data. The effectiveness of the random sample partition model is verified on a 1 Tbit simulated dataset. On the HIGGS dataset, gradually increasing the number of random sample blocks improves the classification accuracy of the base classifiers, which shows that the method can overcome the computing-resource bottleneck in big data analysis.
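
The abstract does not spell out the algorithms, so the following is only a minimal Python sketch of the general idea, not the Bigdata-α implementation. It assumes that a single uniform shuffle followed by cutting the data into blocks is an acceptable stand-in for the random sample partition model, and it replaces the base classifiers of the ensemble model with a simple per-block mean whose results are averaged over a growing number of blocks. The names `random_sample_partition` and `ensemble_estimates` are hypothetical and chosen for illustration.

```python
import random
import statistics


def random_sample_partition(records, num_blocks, seed=0):
    """Shuffle the records once, then deal them into num_blocks blocks.

    Assumption: after a uniform shuffle each block is a simple random
    sample of the whole data set, so its empirical distribution
    approximates the global one.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    # Striding over the shuffled list yields blocks of near-equal size.
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


def ensemble_estimates(blocks, estimator):
    """Return the averaged estimate after using 1, 2, ..., k blocks."""
    per_block = [estimator(block) for block in blocks]
    return [statistics.fmean(per_block[: k + 1]) for k in range(len(per_block))]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for a large data set stored in a non-random order.
    data = sorted(rng.gauss(10.0, 2.0) for _ in range(100_000))
    blocks = random_sample_partition(data, num_blocks=100)
    # Analyze only the first 10 of the 100 blocks instead of the full data.
    running = ensemble_estimates(blocks[:10], statistics.fmean)
    print("mean of full data:", statistics.fmean(data))
    for k, estimate in enumerate(running, start=1):
        print(k, "block(s) ->", estimate)
```

In this toy setting the averaged estimate from a few blocks can be compared with the mean of the full data. A real use would train a base classifier on each block and combine their predictions, which this sketch does not attempt.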

Last Update: 2018-08-21