[1] WANG Yibin, WU Chen, CHENG Yusheng, et al. Multi-label feature selection algorithm with imbalance label otherness[J]. Journal of Shenzhen University Science and Engineering, 2020, 37(3): 234-242. [doi:10.3724/SP.J.1249.2020.03234]

Multi-label feature selection algorithm with imbalance label otherness

Journal of Shenzhen University Science and Engineering [ISSN: 1000-2618 / CN: 44-1401/N]

Volume:
37
Issue:
No. 3, 2020
Pages:
234-242
Section:
Electronics and Information Science
Publication date:
2020-05-20

Article Info

Title:
Multi-label feature selection algorithm with imbalance label otherness
Article ID:
202003004
Author(s):
WANG Yibin (1, 2), WU Chen (1), CHENG Yusheng (1, 2), and JIANG Jiansheng (1, 2)
1) School of Computer and Information, Anqing Normal University, Anqing 246133, Anhui Province, P.R. China
2) The University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing 246133, Anhui Province, P.R. China
Keywords:
artificial intelligence; multi-label learning; feature selection; imbalanced data; label correlation; information entropy; label otherness
CLC number:
TP311;TP181
DOI:
10.3724/SP.J.1249.2020.03234
Document code:
A
Abstract:
Most existing feature selection algorithms do not consider that different labels may describe a sample to different degrees. To address this, a multi-label feature selection algorithm with imbalance label otherness (MSIO) is proposed. The frequency distribution of positive and negative values under each label is used as that label's weight in the feature selection process, and the traditional information entropy calculation is modified accordingly, yielding a more effective feature ranking. With multi-label k-nearest neighbor (ML-kNN) as the base classifier, MSIO is compared on 11 multi-label benchmark datasets from the Mulan database against four algorithms: multi-label dimensionality reduction via dependence maximization (MDDM), the multivariate mutual information based multi-label feature selection algorithm PMU (pairwise multivariate mutual information), feature selection for multi-label naive Bayes classification (MLNB), and multi-label feature selection with label correlation (MUCO). Experimental results and statistical hypothesis tests indicate that MSIO is stable and achieves high classification accuracy, which supports its effectiveness and its advantage over the compared algorithms.
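
The weighting idea in the abstract can be illustrated with a small sketch. The abstract does not state the MSIO formulas, so everything below is an assumption made for illustration: the weight of a label is taken to be the binary entropy of its positive/negative frequency (normalized over labels), and a feature's score is the weighted sum of its mutual information with each label. The function names `label_weights` and `rank_features` are hypothetical, and features are assumed to be already discretized.

```python
import numpy as np


def entropy(values):
    """Shannon entropy (in bits) of a 1-D array of discrete values."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y):
    """I(x; y) = H(x) + H(y) - H(x, y) for two discrete 1-D arrays."""
    joint = np.stack([x, y], axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    h_joint = float(-(p * np.log2(p)).sum())
    return entropy(x) + entropy(y) - h_joint


def label_weights(Y):
    """Assumed weighting (not taken from the paper): entropy of each
    label's positive/negative frequency, normalized to sum to 1."""
    n_labels = Y.shape[1]
    w = np.array([entropy(Y[:, j]) for j in range(n_labels)])
    total = w.sum()
    if total > 0:
        return w / total
    # Every label is constant: fall back to uniform weights.
    return np.full(n_labels, 1.0 / n_labels)


def rank_features(X, Y):
    """Score each discrete feature by label-weighted mutual information
    and return (feature indices in descending score order, scores)."""
    w = label_weights(Y)
    scores = np.array([
        sum(w[j] * mutual_information(X[:, i], Y[:, j])
            for j in range(Y.shape[1]))
        for i in range(X.shape[1])
    ])
    order = np.argsort(-scores, kind="stable")
    return order, scores
```

Calling `rank_features(X, Y)` with an integer feature matrix `X` (samples by features) and a binary label matrix `Y` (samples by labels) returns a feature ordering that could then be passed to a classifier such as ML-kNN. Under this assumed weighting, a label that is entirely positive or entirely negative receives zero weight; the paper's actual weighting may behave differently.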

Memo:
Received: 2019-04-13; Accepted: 2019-06-03
Foundation: Natural Science Research Funds of Education Department of Anhui Province (KJ2017A352); Key Laboratory of Data Science and Intelligence Application, Fujian Province University (D1801); Anhui Province Key Laboratory of Affective Computing & Advanced Intelligent Machine (ACAIM160102)
Corresponding author: Professor WANG Yibin. E-mail: wangyb07@mail.ustc.edu.cn
Citation: WANG Yibin, WU Chen, CHENG Yusheng, et al. Multi-label feature selection algorithm with imbalance label otherness[J]. Journal of Shenzhen University Science and Engineering, 2020, 37(3): 234-242. (in Chinese)
About the author: WANG Yibin (born 1970), professor at Anqing Normal University. Research interests: multi-label learning, machine learning, and software security. E-mail: wangyb07@mail.ustc.edu.cn
Last Update: 2020-05-30