Multi-label feature selection algorithm with imbalance label otherness
WANG Yibin1, 2, WU Chen1, CHENG Yusheng1, 2, and JIANG Jiansheng1, 2

1) School of Computer and Information, Anqing Normal University, Anqing 246133, Anhui Province, P.R. China; 2) The University Key Laboratory of Intelligent Perception and Computing of Anhui Province, Anqing 246133, Anhui Province, P.R. China

artificial intelligence; multi-label learning; feature selection; imbalanced data; label correlation; information entropy; label otherness

DOI: 10.3724/SP.J.1249.2020.03234

Most existing feature selection algorithms do not consider that different labels may describe a sample to different degrees. To address this, a multi-label feature selection algorithm with imbalance label otherness (MSIO) is proposed. The frequency distribution of positive and negative labels under each label is used as that label's weight in the feature selection process, and the traditional information entropy calculation is modified accordingly, yielding a more effective feature ranking. With multi-label k-nearest neighbor (ML-kNN) as the base classifier, MSIO is evaluated on 11 multi-label benchmark datasets from the Mulan database against four algorithms: multi-label dimensionality reduction via dependence maximization (MDDM), the multi-label feature selection algorithm based on pairwise multivariate mutual information (PMU), feature selection for multi-label naive Bayes classification (MLNB), and multi-label feature selection with label correlation (MUCO). The experimental results and statistical hypothesis tests indicate that MSIO is stable and achieves high classification accuracy, which supports its effectiveness and its advantage over the compared algorithms.
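The idea of weighting each label by its positive/negative frequency distribution can be illustrated with a minimal sketch. The abstract does not give the MSIO formulas, so everything below is an assumption made for illustration: the weight is taken to be the normalised binary entropy of each label's positive frequency, and a feature is scored by the weighted sum of its mutual information with every label. The function names `label_weights`, `mutual_information`, and `rank_features` are hypothetical and are not taken from the paper.

```python
import numpy as np


def label_weights(Y):
    """Weight each label by the binary entropy of its positive/negative frequencies.

    Assumption: entropy is only one plausible way to turn the frequency
    distribution into a weight; the actual MSIO weighting may differ.
    """
    p = Y.mean(axis=0)                    # fraction of positive samples per label
    p = np.clip(p, 1e-12, 1 - 1e-12)      # avoid log(0) for constant labels
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return h / h.sum()                    # normalise so the weights sum to 1


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete 1-D arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))
            if p_xy > 0:
                p_x = np.mean(x == xv)
                p_y = np.mean(y == yv)
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi


def rank_features(X, Y):
    """Rank discrete features by label-weighted mutual information (illustrative)."""
    w = label_weights(Y)
    scores = np.array([
        sum(w[j] * mutual_information(X[:, i], Y[:, j]) for j in range(Y.shape[1]))
        for i in range(X.shape[1])
    ])
    return np.argsort(-scores), scores    # best feature first


# Toy data: feature 0 copies label 0, feature 1 is constant, feature 2 is noisy.
X = np.array([[0, 0, 0], [1, 0, 1], [0, 0, 1], [1, 0, 0], [0, 0, 0], [1, 0, 1]])
Y = np.array([[0, 1], [1, 0], [0, 0], [1, 0], [0, 0], [1, 0]])
order, scores = rank_features(X, Y)
print(order, scores)
```

On this toy data the balanced label 0 receives a larger weight than the imbalanced label 1, and the constant feature scores zero because it carries no information about any label. Real-valued features would need to be discretised first, which this sketch does not do.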
