[1] XIE Yan-lu, ZHANG Jing-song, LIU Ming-hui, et al. Robust speaker recognition based on level-building voice activity detection[J]. Journal of Shenzhen University Science and Engineering, 2012, 29(4): 328-334. [doi:10.3724/SP.J.1249.2012.04328]

Robust speaker recognition based on level-building voice activity detection

Journal of Shenzhen University Science and Engineering [ISSN: 1000-2618 / CN: 44-1401/N]

Volume:
29
Issue:
2012, No.4 (283-376)
Pages:
328-334
Section:
Electronics and Information Science
Publication date:
2012-07-25

Article Information

Title:
Robust speaker recognition based on level-building voice activity detection
Article number:
20120409
Author(s):
XIE Yan-lu (解焱陆)1, ZHANG Jing-song (张劲松)1, LIU Ming-hui (刘明辉)2, and HUANG Zhong-wei (黄中伟)2
1) College of Information Science, Beijing Language and Culture University, Beijing 100083, P.R.China
2) Phonetic Laboratory, Shenzhen University, Shenzhen 518060, P.R.China
Keywords:
speech signal processing; speaker identification; distributed speech recognition; level-building voice activity detection; likelihood measurement
CLC number:
TN 912.34; TP 391.4
DOI:
10.3724/SP.J.1249.2012.04328
Document code:
A
Abstract:
To address the poor noise robustness of distributed speaker recognition, a new front-end processing method combining level-building segmentation with a two-stage Wiener filter is proposed on the basis of the ETSI (European Telecommunications Standards Institute) DSR (Distributed Speech Recognition) AFE (Advanced Front-End) standard. The speech is clustered in an unsupervised manner using a likelihood distance as the measure. To reduce the computational load, a level-building process divides the speech level by level, so that the boundaries between speech and silence are located precisely. Experiments show that, when the SNR is greater than 0 dB, applying the method to the ETSI-DSR-AFE standard yields a relative improvement of 18.9% in the recognition rate of the speaker identification system, and a relative improvement of 60.7% over the original Mel-frequency cepstral coefficient (MFCC) system.
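The abstract does not give the exact likelihood distance or the details of the level-building procedure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes one scalar feature per frame (for example, log energy), models each segment as a single Gaussian, uses a generalized likelihood-ratio distance between adjacent segments, and substitutes a simple greedy top-down splitter for the authors' level-building step. All function names and the parameters `max_levels`, `min_gain`, and `min_len` are hypothetical.

```python
import math

def _log_var(xs, floor=1e-6):
    """Log of the floored maximum-likelihood variance of a 1-D sample."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return math.log(max(v, floor))

def glr_distance(left, right):
    """Gaussian likelihood-ratio distance between two adjacent segments.

    Assumption: one Gaussian per segment; large values mean the two
    segments are better modeled separately than pooled.
    """
    both = left + right
    return 0.5 * (len(both) * _log_var(both)
                  - len(left) * _log_var(left)
                  - len(right) * _log_var(right))

def best_split(frames, start, end, min_len=3):
    """Return (distance, index) of the best cut inside frames[start:end]."""
    best = (0.0, None)
    for t in range(start + min_len, end - min_len + 1):
        d = glr_distance(frames[start:t], frames[t:end])
        if d > best[0]:
            best = (d, t)
    return best

def segment(frames, max_levels=4, min_gain=1.0, min_len=3):
    """Split level by level; each level tries one cut per current segment."""
    bounds = [0, len(frames)]
    for _ in range(max_levels):
        cuts = []
        for a, b in zip(bounds, bounds[1:]):
            gain, t = best_split(frames, a, b, min_len)
            if t is not None and gain >= min_gain:
                cuts.append(t)
        if not cuts:  # no segment is worth splitting further
            break
        bounds = sorted(bounds + cuts)
    return bounds

def speech_mask(frames, bounds):
    """Per-frame speech flags: a segment counts as speech when its mean
    exceeds the midpoint between the lowest and highest segment means.
    With a single segment nothing exceeds the midpoint, so all flags are False.
    """
    spans = list(zip(bounds, bounds[1:]))
    means = [sum(frames[a:b]) / (b - a) for a, b in spans]
    threshold = (min(means) + max(means)) / 2.0
    mask = []
    for (a, b), m in zip(spans, means):
        mask.extend([m > threshold] * (b - a))
    return mask

# Toy input: 6 low-energy frames, 6 high-energy frames, 6 low-energy frames.
frames = [0.1, 0.2, 0.1, 0.15, 0.1, 0.2,
          5.0, 5.2, 4.9, 5.1, 5.3, 5.0,
          0.1, 0.2, 0.15, 0.1, 0.2, 0.1]
bounds = segment(frames)           # -> [0, 6, 12, 18]
mask = speech_mask(frames, bounds)
```

On this toy input the two cuts fall at the energy transitions, and later levels stop because no remaining split reaches `min_gain`. A real front end would operate on multi-dimensional features and noisy data, where the thresholds would need tuning.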

Memo:
Received: 2011-12-12; Revised: 2012-05-08; Accepted: 2012-06-04
Foundation: National Natural Science Foundation of China (61005020); Fundamental Research Funds for the Central Universities (10JBT01)
Corresponding author: XIE Yan-lu (born 1980), male, Han ethnicity, from Yangzhou, Jiangsu Province; associate professor at Beijing Language and Culture University, Ph.D. E-mail: xieyanlu@blcu.edu.cn
Citation: XIE Yan-lu, ZHANG Jing-song, LIU Ming-hui, et al. Robust speaker recognition based on level-building voice activity detection[J]. Journal of Shenzhen University Science and Engineering, 2012, 29(4): 328-334. (in Chinese)
Last Update: 2012-07-29