[1] WANG Xinyue and JING Liping. Stratified sampling based ensemble classification for imbalanced data[J]. Journal of Shenzhen University Science and Engineering, 2019, 36(1): 24-32. [doi:10.3724/SP.J.1249.2019.01024]

Stratified sampling based ensemble classification for imbalanced data

Journal of Shenzhen University Science and Engineering [ISSN: 1000-2618 / CN: 44-1401/N]

Volume:
36
Issue:
No. 1, 2019
Pages:
24-32
Section:
Electronics and Information Science
Publication date:
2019-01-20

Article Info

Title:
Stratified sampling based ensemble classification for imbalanced data
Author(s):
WANG Xinyue (王馨月) and JING Liping (景丽萍)
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, P.R. China
Keywords:
artificial intelligence; imbalance classification; stratified sampling; ensemble learning; clustering; data mining
CLC number:
TP 181
DOI:
10.3724/SP.J.1249.2019.01024
Abstract:
Imbalanced data sets are ubiquitous in real-world applications. Classifying them raises two key issues. The first is how to extract as much information as possible about the minority class when the majority class heavily outnumbers it. The second is how to build learners without losing too much majority-class information. A simple but effective strategy is to under-sample the majority class, but existing under-sampling methods often destroy the intrinsic structure of the original data or lose a substantial amount of information. In this paper, we propose an ensemble classification method for imbalanced data based on stratified sampling of the majority class (EC-SS). To mine the hidden structure of the majority class, an adaptive self-tuning clustering strategy partitions the majority-class samples into strata; stratified sampling over these strata then under-samples the majority class to generate the data components for ensemble learning. Each base learner therefore receives balanced input that preserves the structure of the original data, which improves the performance of the resulting ensemble. Experiments on the imbalanced benchmark data sets Musk1, Ecoli3, Glass2, and Yeast6 show that the proposed EC-SS outperforms the baselines and improves classification performance on all four.
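
The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the majority-class samples have already been assigned stratum labels by some clustering step (the paper's adaptive self-tuning clustering is not reproduced here), that the majority class is labelled 0 and the minority class 1, and that each stratum contributes in proportion to its size. The function names `stratified_undersample` and `build_balanced_members` are hypothetical.

```python
import numpy as np


def stratified_undersample(strata, n_draw, rng):
    """Draw roughly n_draw positions, proportionally from each stratum.

    `strata` holds one stratum label per majority-class sample.
    Proportional allocation is an assumption of this sketch.
    """
    strata = np.asarray(strata)
    labels, counts = np.unique(strata, return_counts=True)
    # Proportional quota per stratum, at least 1 and at most the stratum size.
    # Rounding means the total can differ slightly from n_draw.
    quota = np.round(n_draw * counts / counts.sum()).astype(int)
    quota = np.minimum(np.maximum(quota, 1), counts)
    picked = [
        rng.choice(np.flatnonzero(strata == lab), size=q, replace=False)
        for lab, q in zip(labels, quota)
    ]
    return np.concatenate(picked)


def build_balanced_members(y, strata, n_members, seed=0):
    """Return one index array per ensemble member.

    Each member contains every minority sample (y == 1) plus a fresh
    stratified sample of the majority class (y == 0) of similar size.
    """
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    members = []
    for _ in range(n_members):
        local = stratified_undersample(strata, len(minority), rng)
        # `local` indexes into the majority subset; map back to dataset rows.
        members.append(np.concatenate([minority, majority[local]]))
    return members
```

Each returned index array could then be used to train one base classifier, with the ensemble combining their predictions, for example by majority vote. The choice of base classifier and combination rule is left open here because the abstract does not specify them.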


Last Update: 2019-01-30