[1] XU Xiaomeng, TAN Zhenhua, LI Xinshu. Noise-resistant speaker recognition algorithm based on full-frequency speech features with wavelet packet[J]. Journal of Shenzhen University Science and Engineering, 2020, 37(Suppl.1): 84-91. [doi:10.3724/SP.J.1249.2020.99084]

Noise-resistant speaker recognition algorithm based on full-frequency speech features with wavelet packet

Journal of Shenzhen University Science and Engineering [ISSN:1000-2618/CN:44-1401/N]

Volume:
Vol.37
Issue:
2020 Suppl.1
Pages:
84-91
Column:
Cyberspace Security
Publication date:
2020-11-20

Article Info

Title:
Noise-resistant speaker recognition algorithm based on full-frequency speech features with wavelet packet
Article number:
202099016
Author(s):
XU Xiaomeng, TAN Zhenhua, LI Xinshu (徐晓梦, 谭振华, 李欣书)
Software College, Northeastern University, Shenyang 110819, Liaoning Province, P.R. China
Keywords:
biometric identification; speaker recognition; wavelet packet; convolutional neural network
CLC number:
TN915.08
DOI:
10.3724/SP.J.1249.2020.99084
Document code:
A
Abstract:
At present, most speaker recognition algorithms are developed in clean environments and perform poorly under noise. To improve the accuracy of speaker recognition in noisy environments, a new feature extraction method and recognition model, wavelet packet & Gammatone (WPGT), is proposed. In this model, wavelet packet decomposition separates the high-frequency and low-frequency signals, and a Gammatone filter bank simulates the human auditory system to process the non-linear signal, so that more complete speaker voice features are extracted; finally, a convolutional neural network is trained on these features to perform speaker recognition. Based on open-source speech data sets and noise-fusion data sets, the proposed method is compared with the commonly used voiceprint feature extraction methods MFCC and Gammatone. Experimental results show that, in noisy environments, WPGT has better anti-noise ability than MFCC and Gammatone: its voiceprint recognition accuracy improves on MFCC and Gammatone by 10.63% and 16.91%, respectively.
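The abstract describes a two-stage front end: a full wavelet packet tree (which, unlike plain wavelet decomposition, keeps splitting the high-frequency bands as well as the low-frequency ones) followed by Gammatone filtering to mimic cochlear frequency selectivity, with the resulting features fed to a CNN. The sketch below is one plausible reading of that pipeline in plain NumPy, not the authors' implementation: the Haar filters, the ERB-based Gammatone parameters, the centre frequencies, and the log-energy pooling are all illustrative assumptions, and the CNN classifier stage is omitted.

```python
import numpy as np

def haar_split(x):
    """One level of Haar analysis: return (approx, detail) at half the rate."""
    n = len(x) - len(x) % 2
    pairs = x[:n].reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail

def wavelet_packet(x, depth):
    """Full wavelet-packet tree: split BOTH approx and detail at every level,
    so high- and low-frequency bands get equal resolution ("full-frequency")."""
    nodes = [x]
    for _ in range(depth):
        nxt = []
        for node in nodes:
            a, d = haar_split(node)
            nxt.extend([a, d])
        nodes = nxt
    return nodes  # 2**depth sub-band signals

def gammatone_ir(fc, fs, duration=0.025, order=4):
    """Impulse response of a 4th-order gammatone filter centred at fc (Hz),
    with bandwidth set from the ERB scale (Glasberg-Moore approximation)."""
    t = np.arange(0.0, duration, 1.0 / fs)
    erb = 24.7 + 0.108 * fc
    b = 1.019 * 2 * np.pi * erb
    return t ** (order - 1) * np.exp(-b * t) * np.cos(2 * np.pi * fc * t)

def wpgt_features(signal, fs, depth=3, centre_freqs=(200, 500, 1000, 2000)):
    """Hypothetical WPGT-style features: log energy of each wavelet-packet
    sub-band after gammatone filtering."""
    feats = []
    for band in wavelet_packet(signal, depth):
        band_fs = fs / 2 ** depth          # effective rate after decimation
        for fc in centre_freqs:
            if fc >= band_fs / 2:          # skip filters above band Nyquist
                continue
            y = np.convolve(band, gammatone_ir(fc, band_fs), mode="same")
            feats.append(np.log(np.sum(y ** 2) + 1e-10))
    return np.array(feats)

fs = 16000
t = np.arange(0, 0.5, 1.0 / fs)
x = np.sin(2 * np.pi * 440 * t)            # toy stand-in for a speech frame
f = wpgt_features(x, fs)
print(f.shape)
```

In a full system, such per-frame feature vectors would be stacked over time into a 2-D map and passed to the convolutional network for speaker classification; the intuition carried over from the paper is that decomposing the high-frequency bands as thoroughly as the low ones preserves speaker cues that noise-corrupted low bands alone would lose.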

References:

[1] DAVIS S B, MERMELSTEIN P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980, 28(4): 357-366.
[2] HUANG X, ACERO A, HON H. Spoken language processing: a guide to theory, algorithm, and system development[M]. New Jersey, USA: Prentice Hall, 2001.
[3] TORFI A, DAWSON J, NASRABADI N M. Text-independent speaker verification using 3D convolutional neural networks[C]// Proceeding of ICME 2018. San Diego, USA: IEEE, 2018: 1-5.
[4] RAVANELLI M, BENGIO Y. Speaker recognition from raw waveform with SincNet[C]// Proceeding of IEEE Spoken Language Technology Workshop (SLT). Athens, Greece: IEEE, 2018: 1-8.
[5] LIN Ting, ZHANG Ye. Speaker recognition based on long-term acoustic features with analysis sparse representation[J]. IEEE Access, 2019, 7: 87439-87447.
[6] CORDEIRO H, RIBEIRO C M. Speaker characterization with MLSFs[C]// Proceeding of Speaker and Language Recognition Workshop. San Juan: IEEE, 2006: 1-4.
[7] POUR A F, ASGARI M, HASANABADI M R. Gammatonegram based speaker identification[C]// Proceeding of International Conference on Computer & Knowledge Engineering. Mashhad, Iran: IEEE, 2014: 52-55.
[8] ZHANG Yadong, SUN Fuyuan. A methodology based on wavelet packet for speaker transform recognition[C]// Proceeding of International Conference on Wavelet Analysis and Pattern Recognition. Beijing: IEEE, 2007: 767-771.
[9] KINNUNEN T, LI H. An overview of text-independent speaker recognition: from features to supervectors[J]. Speech Communication, 2010, 52(1): 12-40.
[10] BURTON D K. Text-dependent speaker verification using vector quantization source coding[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1987, 35(2): 133-143.
[11] FURUI S. Cepstral analysis technique for automatic speaker verification[J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1981, 29(2): 254-272.
[12] REYNOLDS D A, ROSE R C. Robust text-independent speaker identification using Gaussian mixture speaker models[J]. IEEE Transactions on Speech and Audio Processing,1995, 3(1): 72-83.
[13] REYNOLDS D A, QUATIERI T F, DUNN R B. Speaker verification using adapted Gaussian mixture models[J]. Digital Signal Processing, 2000, 10(1/2/3): 19-41.
[14] BENZEGHIBA M F, HERV B. User-customized password speaker verification using multiple reference and background models[J]. Speech Communication, 2004, 48(9): 1200-1213.
[15] MOHAMMADI M, MOHAMMADI H R S. Weighted I-vector based text-independent speaker verification system[C]// Proceeding of the 27th Iranian Conference on Electrical Engineering (ICEE). Yazd, Iran: IEEE, 2019: 1647-1653.
[16] VARIANI E, LEI X, MCDERMOTT E, et al. Deep neural networks for small footprint text-dependent speaker verification[C]// Proceeding of IEEE International Conference on Acoustics. Florence, Italy: IEEE, 2014: 4052-4056.
[17] LI Chao, MA Xiaokong, JIANG Bing, et al. Deep speaker: an end-to-end neural speaker embedding system[EB/OL]. arXiv, 2017[2020-09-15]. https://arxiv.org/abs/1705.02304.
[18] HEIGOLD G, MORENO I, BENGIO S, et al. End-to-end text-dependent speaker verification[C]// Proceeding of ICASSP. Shanghai, China: IEEE, 2016: 5115-5119.
[19] VRIES N J, DAVEL M H, BADENHORST J, et al. A smartphone-based ASR data collection tool for under-resourced languages[J]. Speech Communication, 2014, 56: 119-131.
[20] MOKGONYANE T B, SEFARA T J, MODIPA T I, et al. Automatic speaker recognition system based on machine learning algorithms[C]// 2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA). Bloemfontein, South Africa: IEEE, 2019: 141-146.

Similar articles/References:

[1] XUE Li-ping, YIN Jun-xun, JI Zhen. Speaker recognition based on particle swarm optimization and fuzzy clustering analysis[J]. Journal of Shenzhen University Science and Engineering, 2008, 25(2): 178.
[2] XIE Yan-lu, ZHANG Jing-song, LIU Ming-hui, et al. Robust speaker recognition based on level-building voice activity detection[J]. Journal of Shenzhen University Science and Engineering, 2012, 29(4): 328. [doi:10.3724/SP.J.1249.2012.04328]

Memo:
Received:2020-10-09
Foundation:National Key Research and Development Program of China (2019YFB1405803); CERNET Innovation Project (NGII20190609)
Corresponding author: Professor TAN Zhenhua. E-mail: tanzh@mail.neu.edu.cn
Citation:XU Xiaomeng,TAN Zhenhua,LI Xinshu. Noise-resistant speaker recognition algorithm based on full-frequency speech features with wavelet packet[J]. Journal of Shenzhen University Science and Engineering, 2020, 37(Suppl.1): 84-91.(in Chinese)
About the first author: XU Xiaomeng (1996—), master candidate at Northeastern University. Research interest: biometric authentication. E-mail: XiaomengXu_edu@163.com
Last Update: 2020-11-26