[1]HE Wuchao, WANG Xiaolan, HE Yulin, et al. Sampling without replacement-based Parzen window ensemble method[J]. Journal of Shenzhen University Science and Engineering, 2018, 35(6): 617-621. [doi:10.3724/SP.J.1249.2018.06617]

Sampling without replacement-based Parzen window ensemble method

Journal of Shenzhen University Science and Engineering [ISSN:1000-2618/CN:44-1401/N]

Volume:
Vol. 35
Issue:
No. 6, 2018
Pages:
617-621
Section:
[Electronics and Information Science]
Publication date:
2018-11-16

Article Info

Title:
Sampling without replacement-based Parzen window ensemble method
Article ID:
201806010
Author(s):
HE Wuchao (何武超)1, WANG Xiaolan (王晓兰)1, HE Yulin (何玉林)2,3, and XIONG Ruijie (熊睿杰)2
1) Department of Information Engineering, Cangzhou Technical College, Cangzhou 061001, Hebei Province, P.R.China
2) College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, Guangdong Province, P.R.China
3) National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, Guangdong Province, P.R.China
Keywords:
probability distribution; probability density function estimation; Parzen window; kernel density estimation method; bandwidth; sampling without replacement; ensemble method; large-scale dataset
CLC number:
TP 311
DOI:
10.3724/SP.J.1249.2018.06617
Document code:
A
Abstract:
The Parzen window method is a classical probability density function (PDF) estimation method that is widely applied in machine learning and pattern recognition, but its high computational complexity and bandwidth sensitivity make it unsuitable for PDF estimation on large-scale data. To handle PDF estimation for large-scale datasets, we propose a sampling without replacement-based Parzen window ensemble (SR-PWE) method, which estimates the PDF from only part of the data and obtains satisfactory estimation performance at low computational complexity. First, a number of sub-datasets are generated from the original dataset by sampling without replacement. Second, base PDFs are estimated on these sub-datasets with the Parzen window method. Third, the base PDFs are fused to obtain the PDF of the original dataset. Experiments comparing the Parzen window method and SR-PWE on Cauchy and normal distributions demonstrate that SR-PWE is feasible and effective.
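The three steps in the abstract (sample sub-datasets without replacement, estimate a base PDF on each with the Parzen window method, fuse the base PDFs) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the kernel, the bandwidth selector, the fusion rule, or the subset sizes, so the Gaussian kernel, Silverman's rule-of-thumb bandwidth, the simple average used for fusion, and the names `parzen_window_pdf`, `silverman_bandwidth`, `sr_pwe_pdf`, `n_subsets`, and `subset_size` are all assumptions made for illustration.

```python
import numpy as np


def parzen_window_pdf(sample, x, bandwidth):
    """Gaussian-kernel Parzen window estimate of the PDF at the points x."""
    sample = np.asarray(sample, dtype=float)
    x = np.asarray(x, dtype=float)
    # Scaled differences between every query point and every sample point.
    u = (x[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def silverman_bandwidth(sample):
    """Silverman's rule of thumb (assumed here; the paper may select h differently)."""
    sample = np.asarray(sample, dtype=float)
    std = sample.std(ddof=1)
    q75, q25 = np.percentile(sample, [75, 25])
    spread = min(std, (q75 - q25) / 1.34)
    return 0.9 * spread * sample.size ** (-0.2)


def sr_pwe_pdf(data, x, n_subsets=10, subset_size=200, seed=0):
    """Average of base Parzen estimates built on subsets drawn without replacement."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    base_pdfs = []
    for _ in range(n_subsets):
        # Step 1: draw one sub-dataset without replacement from the full data.
        subset = rng.choice(data, size=subset_size, replace=False)
        # Step 2: base PDF estimate on the sub-dataset.
        h = silverman_bandwidth(subset)
        base_pdfs.append(parzen_window_pdf(subset, x, h))
    # Step 3: fuse the base PDFs (simple average, an assumed fusion rule).
    return np.mean(base_pdfs, axis=0)


# Usage on synthetic standard normal data (illustrative only).
data = np.random.default_rng(42).normal(size=100_000)
grid = np.linspace(-4.0, 4.0, 81)
estimate = sr_pwe_pdf(data, grid)
true_pdf = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
print("max absolute error:", np.max(np.abs(estimate - true_pdf)))
```

With these settings each base estimator touches only 200 of the 100,000 points, so the kernel evaluations per query point drop from the full data size to `n_subsets * subset_size`, which is the source of the reduced cost the abstract refers to.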

References:

[1]PARZEN E. On estimation of a probability density function and mode[J]. The Annals of Mathematical Statistics, 1962, 33(3): 1065-1076.
[2]SCOTT D W. Multivariate density estimation: theory, practice, and visualization[M]. Hoboken, USA: John Wiley & Sons, 2015.
[3]XIANG Zhongliang, YU Xiangru, KANG D K. Experimental analysis of naïve Bayes classifier based on an attribute weighting framework with smooth kernel density estimations[J]. Applied Intelligence, 2016, 44(3): 611-620.
[4]HE Yulin. Spectral data classification and regression based on kernel density estimation[D]. Baoding: Hebei University, 2014.(in Chinese)
[5]ANDERSON T K. Kernel density estimation and k-means clustering to profile road accident hotspots[J]. Accident Analysis & Prevention, 2009, 41(3): 359-364.
[6]ZHANG Jinghong. Kernel density estimation entropy for hybrid data and a fast greedy feature selection algorithm[D]. Hangzhou: Zhejiang University, 2017.(in Chinese)
[7]NANNI L, LUMINI A. Ensemble of Parzen window classifiers for on-line signature verification[J]. Neurocomputing, 2005, 68(5): 217-224.
[8]WAND M P, JONES M C. Kernel smoothing[M]. Boca Raton, USA: CRC Press, 1994.
[9]SILVERMAN B W. Density estimation for statistics and data analysis[M]. London: Chapman and Hall, 1986.
[10]TERRELL G R. The maximal smoothing principle in density estimation[J]. Journal of the American Statistical Association, 1990, 85(410): 470-477.
[11]ALEXANDRE L A. A solve-the-equation approach for unidimensional data kernel bandwidth selection[R/OL]. [2008-11-29][2008-01-01]. Beira Interior, Portugal: University of Beira Interior. http://www.di.ubi.pt/~lfbaa/entnetsPubs/bandwidth.pdf.
[12]RU Yang. Algorithm of kernel density estimation of kernel function[D]. Harbin: Harbin University of Science and Technology, 2016.(in Chinese)
[13]WANG Junming, RU Yang, CHEN Yu, et al. Solve-the-equation kernel density estimation method based on cosine kernel function[J]. Journal of Harbin University of Science and Technology, 2016, 21(1): 114-117.(in Chinese)
[14]HORVITZ D G, THOMPSON D J. A generalization of sampling without replacement from a finite universe[J]. Journal of the American Statistical Association, 1952, 47(260): 663-685.
[15]BREIMAN L. Bagging predictors[J]. Machine Learning, 1996, 24(2): 123-140.
[16]MARTINEZ-MUNOZ G, SUAREZ A. Out-of-bag estimation of the optimal sample size in bagging[J]. Pattern Recognition, 2010, 43(1): 143-152.
[17]LUH K, PIPPENGER N. Large-deviation bounds for sampling without replacement[J]. The American Mathematical Monthly, 2014, 121(5): 449-454.
[18]HE Yulin, LIU J N K, WANG Xizhao, et al. Optimal bandwidth selection for re-substitution entropy estimation[J]. Applied Mathematics and Computation, 2012, 219(8): 3425-3460.

Memo:
Received:2018-03-16;Accepted:2018-08-07
Foundation:National Natural Science Foundation of China (61503252); China Postdoctoral Science Foundation (2016T90799); Scientific Research Foundation of Shenzhen University for Newly-introduced Teachers (2018060); National Key R & D Program of China (2017YFC0822604-2)
Corresponding author:Assistant professor HE Yulin. E-mail: yulinhe@szu.edu.cn
Citation:HE Wuchao, WANG Xiaolan, HE Yulin, et al. Sampling without replacement-based Parzen window ensemble method[J]. Journal of Shenzhen University Science and Engineering, 2018, 35(6): 617-621.(in Chinese)
About the author:HE Wuchao (born 1980), female, lecturer at Cangzhou Technical College. Research interests: machine learning and data mining. E-mail: wuchao_he@126.com
Last Update: 2018-11-30