Second International Workshop on Symbolic-Neural Learning (SNL-2018)

July 5-6, 2018
Nagoya Congress Center (Nagoya, Japan)

Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Model

Takashi Wada (Nara Institute of Science and Technology)

Abstract:

We propose an unsupervised method for obtaining cross-lingual embeddings without any parallel data or pre-trained word embeddings. The proposed model, which we call a "multilingual neural language model", takes sentences in multiple languages as input. It contains bidirectional LSTMs that act as forward and backward language models, and these networks are shared among all the languages. The remaining parameters, i.e., the word embeddings and the linear transformation between hidden states and outputs, are specific to each language. The shared LSTMs capture the sentence structure common to all languages; accordingly, the word embeddings of each language are mapped into a common latent space, making it possible to measure the similarity of words across multiple languages. We evaluate the quality of the cross-lingual word embeddings on a word alignment task. The experimental results show that our model successfully obtains cross-lingual embeddings under low-resource scenarios in which an existing unsupervised method performs poorly. Furthermore, our model can map the word embeddings of four languages into a common space, yielding what we call quadrilingual word embeddings in an unsupervised way.
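To make the parameter-sharing scheme concrete, the following is a minimal PyTorch sketch of the architecture as the abstract describes it: the forward and backward LSTMs are shared across languages, while each language keeps its own embedding table and output projection. All names here (MultilingualLM, vocab_sizes, emb_dim, hidden_dim) are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class MultilingualLM(nn.Module):
    """Sketch: shared forward/backward LSTM language models with
    language-specific embeddings and output projections."""

    def __init__(self, vocab_sizes, emb_dim=300, hidden_dim=300):
        super().__init__()
        # Language-specific parameters: one embedding table and one
        # linear output layer per language (vocab_sizes: lang -> |V|).
        self.embeddings = nn.ModuleDict({
            lang: nn.Embedding(v, emb_dim) for lang, v in vocab_sizes.items()
        })
        self.out_proj = nn.ModuleDict({
            lang: nn.Linear(hidden_dim, v) for lang, v in vocab_sizes.items()
        })
        # Shared parameters: the forward and backward LSTMs are used
        # for every language, encouraging a common latent space.
        self.fwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.bwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, tokens, lang):
        # tokens: (batch, seq_len) word ids in language `lang`.
        emb = self.embeddings[lang](tokens)
        # Forward LM reads left-to-right; backward LM reads the
        # time-reversed sequence.
        h_fwd, _ = self.fwd_lstm(emb)
        h_bwd, _ = self.bwd_lstm(emb.flip(1))
        # Project the shared hidden states back to this language's
        # vocabulary for next/previous-word prediction.
        return self.out_proj[lang](h_fwd), self.out_proj[lang](h_bwd)

# Usage sketch: two monolingual batches trained through the same LSTMs.
model = MultilingualLM({"en": 10000, "fr": 12000})
en_batch = torch.randint(0, 10000, (8, 20))
fwd_logits, bwd_logits = model(en_batch, "en")

After training, the per-language embedding tables live in one space shaped by the shared LSTMs, so cross-lingual word similarity can be measured directly (e.g., by cosine distance between embedding rows).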