A neural approach for inducing multilingual resources and natural language processing tools for low-resource languages

O. ZENNAKI; N. SEMMAR; L. BESACIER

doi:10.1017/S1351324918000293

Published online by Cambridge University Press: 06 August 2018

O. ZENNAKI: CEA, LIST, Vision and Content Engineering Laboratory, Gif-sur-Yvette, France; Laboratory of Informatics of Grenoble, Univ. Grenoble-Alpes, Grenoble, France (e-mail: othman.zennaki@cea.fr)
N. SEMMAR: CEA, LIST, Vision and Content Engineering Laboratory, Gif-sur-Yvette, France (e-mail: nasredine.semmar@cea.fr)
L. BESACIER: Laboratory of Informatics of Grenoble, Univ. Grenoble-Alpes, Grenoble, France (e-mail: laurent.besacier@imag.fr)

Abstract

This work addresses the rapid development of linguistic annotation tools for low-resource languages, that is, languages with no labeled training data. We experiment with several cross-lingual annotation projection methods based on recurrent neural network (RNN) models. The distinctive feature of our approach is that its multilingual word representation requires only a parallel corpus between the source and target languages. More precisely, the approach has three characteristics: (a) it does not use word alignment information; (b) it assumes no knowledge about the target languages, the one requirement being that the source and target languages are not too syntactically divergent, which makes it applicable to a wide range of low-resource languages; (c) it provides truly multilingual taggers (one tagger for N languages). We investigate both unidirectional and bidirectional RNN models and propose a method for including external information (for instance, low-level information from part-of-speech tags) in the RNN to train higher-level taggers (for instance, Super Sense taggers). We demonstrate the validity and genericity of our model using parallel corpora obtained by manual or automatic translation. Our experiments induce cross-lingual part-of-speech and Super Sense taggers. We also apply the approach in a weakly supervised context, where it shows excellent potential for very low-resource settings (fewer than 1k training utterances).
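To make the idea of a word representation that needs only a sentence-aligned parallel corpus more concrete, here is a minimal sketch in Python. It assumes one simple scheme: every word, in either language, is mapped to a binary vector with one slot per sentence pair, set to 1 when the word occurs in that pair. The toy corpus, the function names `cooccurrence_vectors` and `cosine`, and the scheme itself are illustrative assumptions, not a description of the authors' implementation.

```python
from math import sqrt

# Toy English-French parallel corpus (hypothetical data, for illustration only).
PARALLEL = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("a cat eats", "un chat mange"),
    ("a dog eats", "un chien mange"),
]


def cooccurrence_vectors(pairs):
    """Map each (language, word) to a binary vector with one slot per sentence pair.

    Slot i is 1 if the word occurs in the i-th sentence pair. No word
    alignment is used: only the sentence-level pairing matters.
    """
    vectors = {}
    for i, (src, tgt) in enumerate(pairs):
        for lang, sentence in (("en", src), ("fr", tgt)):
            for word in sentence.split():
                vec = vectors.setdefault((lang, word), [0] * len(pairs))
                vec[i] = 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(PARALLEL)

# Words that are translations of each other occur in the same sentence pairs,
# so their vectors coincide; unrelated words share no sentence pair here.
sim_cat_chat = cosine(vectors[("en", "cat")], vectors[("fr", "chat")])
sim_cat_chien = cosine(vectors[("en", "cat")], vectors[("fr", "chien")])
```

Because source and target words live in the same vector space under this scheme, a tagger trained on source-language vectors could in principle be applied to target-language vectors directly, which is the intuition behind a single tagger for several languages. On a real corpus the vectors would be far sparser and noisier than in this toy example.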

Type: Article
Information: Natural Language Engineering, Volume 25, Issue 1, January 2019, pp. 43–67

DOI: https://doi.org/10.1017/S1351324918000293
Copyright: Copyright © Cambridge University Press 2018
