Waveform-domain Speech Enhancement Using Spectrogram Encoding for Robust Speech Recognition

Overview

Waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance on many datasets, whereas spectrogram-based SE tends to show more robust and stable enhancement behavior. In this work, we propose a waveform-spectrogram hybrid system (WaveSpecEnc) to improve the robustness of waveform-domain SE. In each encoder layer, WaveSpecEnc refines the corresponding temporal feature maps using a spectrogram encoding. Incorporating this spectral information can yield robust performance in terms of perceptual (human listening) quality. However, WaveSpecEnc brings only a minor improvement in automatic speech recognition (ASR). We therefore extend WaveSpecEnc for robust ASR by making further use of the spectrogram encoding information (WaveSpecEnc+). WaveSpecEnc+ supplies the encoding information to both the SE front-end and the ASR back-end.
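
As a rough illustration of what "refining temporal feature maps by spectrogram encoding" could look like in a single encoder layer, the NumPy sketch below computes a waveform-domain feature map and a magnitude spectrogram on the same frame grid, then uses a projection of the spectrogram to gate the temporal features. The sigmoid gating, the random weights, and all function names and sizes (`temporal_features`, `refine_with_spectrogram`, `win`, `hop`, `channels`) are assumptions made for this example; the summary above does not specify how WaveSpecEnc fuses the two representations.

```python
import numpy as np

def frame_signal(wave, win, hop):
    """Slice a 1-D signal into overlapping frames of shape (n_frames, win)."""
    n_frames = 1 + (len(wave) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return wave[idx]

def magnitude_spectrogram(wave, win, hop):
    """Hann-windowed magnitude STFT, shape (n_frames, win // 2 + 1)."""
    frames = frame_signal(wave, win, hop) * np.hanning(win)
    return np.abs(np.fft.rfft(frames, axis=1))

def temporal_features(wave, kernels, hop):
    """Strided 1-D convolution followed by ReLU (one waveform encoder layer)."""
    frames = frame_signal(wave, kernels.shape[1], hop)
    return np.maximum(frames @ kernels.T, 0.0)  # (n_frames, channels)

def refine_with_spectrogram(feat, spec, proj):
    """Assumed fusion: a sigmoid gate derived from the spectrogram encoding."""
    gate = 1.0 / (1.0 + np.exp(-(spec @ proj)))  # (n_frames, channels)
    return feat * gate

# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
win, hop, channels = 64, 32, 8
wave = rng.standard_normal(1024)
kernels = rng.standard_normal((channels, win)) * 0.1
proj = rng.standard_normal((win // 2 + 1, channels)) * 0.1

feat = temporal_features(wave, kernels, hop)
spec = magnitude_spectrogram(wave, win, hop)
refined = refine_with_spectrogram(feat, spec, proj)
print(feat.shape, spec.shape, refined.shape)
```

Using the same window length and hop for the convolution and the STFT keeps the two representations aligned frame by frame, so the gate can be applied element-wise without resampling. In a trained system the kernels and the projection would be learned, and the fusion could take a different form entirely.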

Industrial Applications and Application Fields

Speech and audio processing

Researchers

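Name | Course | Laboratory | Position/Year
Hao SHI | Intelligence Science and Technology Course | Speech and Audio Processing Lab. | 3rd-year doctoral student
Tatsuya Kawahara | Intelligence Science and Technology Course | Speech and Audio Processing Lab. | Professor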