Bilingual Dictionary Creation for Low-Resource Languages

Abstract

Over 40 percent of the world's approximately 7,000 languages are at risk of disappearing. To prevent more languages from becoming endangered or extinct, enriching language resources is crucial. Machine-readable bilingual dictionaries are very useful in information retrieval and NLP research but are scarce for low-resource languages. The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction difficult for these languages. To overcome this limitation, we propose a constraint-based bilingual dictionary induction method for closely related languages, which combines two input bilingual dictionaries, A-B and B-C, via a pivot language B to induce a new dictionary A-C. The key feature of this approach is that it extracts correct translation pairs by identifying cognates based on semantic constraints. Furthermore, to generate bilingual dictionaries among closely related languages comprehensively, humans must be involved in the process, both to create several input dictionaries and to evaluate the induced translation pairs. Plan optimization to minimize this human cost is therefore essential. We formalize plan optimization for bilingual dictionary creation as a Markov Decision Process (MDP) in order to account for the uncertain performance of the constraint-based induction. The MDP-based approach outperformed heuristic planning in total cost on all datasets examined.
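To make the pivot idea concrete, the following is a minimal Python sketch of naive pivot-based induction. It is not the constraint-based method described above: it simply scores each candidate A-C pair by the Jaccard overlap of the two words' pivot translations, which is a simplified stand-in chosen for illustration. The function names, the `min_score` threshold, and the toy word identifiers (`a1`, `b1`, `c1`, ...) are all hypothetical.

```python
from collections import defaultdict


def invert(dictionary):
    """Turn a word -> {translations} map into a translation -> {words} map."""
    inverted = defaultdict(set)
    for word, translations in dictionary.items():
        for translation in translations:
            inverted[translation].add(word)
    return dict(inverted)


def induce_pivot_dictionary(a_to_b, c_to_b, min_score=0.5):
    """Induce candidate A-C pairs from A-B and C-B dictionaries via pivot B.

    Assumption: each pair is scored by the Jaccard overlap of the pivot
    words shared by the A word and the C word. This is an illustrative
    heuristic, not the semantic-constraint formulation of the abstract.
    """
    b_to_c = invert(c_to_b)
    induced = {}
    for a_word, a_pivots in a_to_b.items():
        # Collect every C word reachable from this A word through a pivot.
        candidates = set()
        for pivot in a_pivots:
            candidates |= b_to_c.get(pivot, set())
        for c_word in candidates:
            c_pivots = c_to_b[c_word]
            score = len(a_pivots & c_pivots) / len(a_pivots | c_pivots)
            if score >= min_score:
                induced[(a_word, c_word)] = score
    return induced


# Toy input dictionaries with placeholder word identifiers.
a_to_b = {"a1": {"b1", "b2"}, "a2": {"b2", "b3"}}
c_to_b = {"c1": {"b1", "b2"}, "c2": {"b3"}}

induced = induce_pivot_dictionary(a_to_b, c_to_b)
for pair, score in sorted(induced.items()):
    print(pair, round(score, 2))
```

In this toy data, `a2` and `c1` share only one of three pivot words, so the pair falls below the threshold and is dropped. Pairs that survive such a filter would still need human evaluation, which is the cost the plan optimization aims to minimize.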

Industrial Applications and Fields of Application

The bilingual dictionary creation framework can be used to semi-automatically create multiple bilingual dictionaries for closely related low-resource languages in Indonesia, Japan, or any other country.

Researchers

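Name                 Department                        Laboratory            Position/Year
NASUTION ARBI HAZA   Department of Social Informatics  Ishida and Matsubara  Third-year doctoral student
MURAKAMI YOHEI       Unit of Design                    Center for the Promo  Associate Professor
ISHIDA TORU          Department of Social Informatics  Ishida and Matsubara  Professor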
