DATA SELECTION BASED ON COMBINATION OF ACOUSTIC AND PHONETIC SIMILARITIES FOR LOW-RESOURCE SPEECH RECOGNITION

Abstract

Automatic Speech Recognition (ASR) has significantly improved due to pre-trained models, which are first trained on large-scale datasets and then fine-tuned for specific target languages. However, their performance tends to degrade in low-resource languages due to limited training data. In this study, we explore using non-target language data to enhance target low-resource language ASR and propose an effective combination of Spoken Language Identification (SLI) models to measure the similarity of speech utterances to the target low-resource language. Specifically, we combine SLI models based on acoustic and phonetic similarities. Experiments on Indic and some European languages from the Common Voice dataset show that phonetic similarity based on International Phonetic Alphabet (IPA) tokens achieves performance comparable to the conventional method using acoustic similarity in SLI and ASR. Moreover, combining acoustic and phonetic similarities further reduces the character error rate.
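
As a rough illustration of the selection idea, the sketch below ranks non-target utterances by a combined similarity score and keeps the top-ranked ones. The abstract does not state how the two similarities are combined, so the min-max normalization, the weighted sum, the `weight` and `top_k` parameters, and all scores and utterance IDs here are assumptions made for illustration only, not the authors' method.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two similarity types are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: no utterance is preferred over another.
        return {utt: 0.0 for utt in scores}
    return {utt: (s - lo) / (hi - lo) for utt, s in scores.items()}


def select_utterances(
    acoustic: Dict[str, float],
    phonetic: Dict[str, float],
    weight: float = 0.5,
    top_k: int = 2,
) -> List[str]:
    """Rank utterances by a weighted sum of the two similarities (assumed rule)."""
    a = min_max_normalize(acoustic)
    p = min_max_normalize(phonetic)
    combined = {utt: weight * a[utt] + (1.0 - weight) * p[utt] for utt in a}
    # Highest combined similarity to the target language first.
    return sorted(combined, key=combined.get, reverse=True)[:top_k]


# Hypothetical per-utterance scores, e.g. SLI posteriors for the target language.
acoustic_scores = {"utt1": 0.9, "utt2": 0.2, "utt3": 0.6, "utt4": 0.1}
phonetic_scores = {"utt1": 0.3, "utt2": 0.8, "utt3": 0.7, "utt4": 0.1}

selected = select_utterances(acoustic_scores, phonetic_scores, weight=0.5, top_k=2)
print(selected)
```

In this toy example, an utterance that scores moderately well on both similarities can outrank one that scores highly on only one of them, which is the intuition behind combining the two signals.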

Researchers

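Name	Course	Laboratory	Position/Year
Jianan Chen	Intelligence Science and Technology Course	Speech Media Laboratory	3rd-year doctoral student
Chenhui Chu	Intelligence Science and Technology Course	Language Media Laboratory	Associate Professor
Sheng Li	Other department/university	Institute of Science Tokyo	Assistant Professor
Kawahara Tatsuya	Intelligence Science and Technology Course	Speech Media Laboratory	Professor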
