[1] Yang Junshan, Ji Zhen, Xie Weixin, et al. Model selection based on particle swarm optimization for omics data classification[J]. Journal of Shenzhen University Science and Engineering, 2016, 33(3): 264-271. [doi:10.3724/SP.J.1249.2016.03264]

Model selection based on particle swarm optimization for omics data classification

Journal of Shenzhen University Science and Engineering [ISSN: 1000-2618 / CN: 44-1401/N]

Volume:
33
Issue:
2016, No. 3
Pages:
264-271
Section:
Electronics and Information Science
Publication date:
2016-05-20

Article Info

Title:
Model selection based on particle swarm optimization for omics data classification
Article No.:
201603007
Author(s):
Yang Junshan 1, Ji Zhen 1, Xie Weixin 1, and Zhu Zexuan 2
1) College of Information Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P.R. China
2) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P.R. China
Keywords:
omics dataset; particle swarm optimization; data sampling; feature selection; classification model; model selection; data mining
CLC number:
TP 181
DOI:
10.3724/SP.J.1249.2016.03264
Document code:
A
Abstract:
A model selection algorithm based on particle swarm optimization is proposed for omics data classification. The algorithm targets the high dimensionality, small sample size, and class imbalance that are common in omics data. Each particle encodes a candidate combination of a data sampling model, a feature selection model, and a classification model, together with their hyperparameters. The swarm searches for the combination that gives the best classification performance on the omics data: particle velocities and positions are updated iteratively until a stopping criterion is met, and the best model combination and hyperparameter setting found is then output. Experimental results on eight real-world omics datasets show that the proposed algorithm avoids the subjective bias introduced by manual model selection and improves both the best classification performance and its stability.
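
The search loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate model names, the four-dimensional encoding in [0, 1], the inertia and acceleration constants, and the `toy_fitness` function (a made-up stand-in for a cross-validated classification score) are all hypothetical. The velocity and position update is the canonical PSO rule.

```python
import random

# Hypothetical candidate sets; illustrative only, not the paper's actual candidates.
SAMPLERS = ["none", "random_oversampling", "smote"]
SELECTORS = ["none", "mrmr", "fcbf"]
CLASSIFIERS = ["knn", "svm", "naive_bayes"]


def decode(position):
    """Map a position in [0, 1]^4 to (sampler, selector, classifier, hyperparameter)."""
    def pick(options, x):
        # Clamp the index so that x == 1.0 still selects the last option.
        return options[min(int(x * len(options)), len(options) - 1)]
    s, f, c, h = position
    return pick(SAMPLERS, s), pick(SELECTORS, f), pick(CLASSIFIERS, c), h


def toy_fitness(config):
    """Made-up score standing in for cross-validated classification performance."""
    sampler, selector, classifier, h = config
    score = 0.5
    score += {"none": 0.0, "random_oversampling": 0.05, "smote": 0.1}[sampler]
    score += {"none": 0.0, "mrmr": 0.1, "fcbf": 0.05}[selector]
    score += {"knn": 0.0, "svm": 0.1, "naive_bayes": 0.05}[classifier]
    score += 0.1 * (1.0 - abs(h - 0.3))  # pretend the best hyperparameter value is 0.3
    return score


def pso_model_selection(fitness, n_particles=20, n_iters=50,
                        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Canonical PSO over the encoded model combination; maximizes `fitness`."""
    rng = random.Random(seed)
    dim = 4
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = []  # best score found so far, recorded once per iteration
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep positions inside the unit interval so decoding stays valid.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(decode(pos[i]))
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return decode(gbest), gbest_fit, history


best_config, best_score, history = pso_model_selection(toy_fitness)
print(best_config, round(best_score, 3))
```

In a real setting, `toy_fitness` would be replaced by a routine that applies the decoded sampling, feature selection, and classification models to the training data and returns a cross-validated performance measure; a fixed iteration count is used here as the stopping criterion only for simplicity.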

Memo:
Received: 2016-02-26; Accepted: 2016-04-10
Foundation: National Natural Science Foundation of China (61171125, 61471246)
Corresponding author: Professor Ji Zhen. E-mail: jizhen@szu.edu.cn
Citation:Yang Junshan, Ji Zhen, Xie Weixin, et al. Model selection based on particle swarm optimization for omics data classification[J]. Journal of Shenzhen University Science and Engineering, 2016, 33(3): 264-271.(in Chinese)
About the author: Yang Junshan (born 1981), male, doctoral candidate at Shenzhen University. Research interest: signal and information processing. E-mail: junshan763@126.com
Last Update: 2016-05-08