From Machine Translated NLI Corpus to Universal Sentence Representations in Czech

Martin Víta

DOI: http://dx.doi.org/10.15439/2020F212

Citation: Position Papers of the 2020 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 22, pages 3–8 (2020)

Abstract. Natural language inference (NLI) is a sentence-pair classification task with respect to the entailment relation. As already shown, certain deep learning architectures for the NLI task, InferSent in particular, may be exploited to obtain (supervised) universal sentence embeddings. Although the InferSent approach to sentence embeddings has recently been outperformed on various tasks by transformer-based architectures (such as BERT and its derivatives), it remains a useful tool in many NLP areas and serves as a strong baseline. One of the greatest advantages of this approach is its relative simplicity. Moreover, in contrast to other approaches, InferSent models can be trained on a standard GPU within hours. Unfortunately, the majority of research on sentence embeddings is done in and for English, whereas other languages are largely neglected. To fill this gap, we propose a methodology for obtaining universal sentence embeddings in another language, based on training InferSent-based sentence encoders on a machine-translated NLI corpus, and present a transfer learning use case on semantic textual similarity in Czech.
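
As a rough illustration of the kind of pipeline the abstract refers to, the sketch below shows how an InferSent-style model, as described in the original InferSent work by Conneau et al., turns per-token encoder states into a fixed-size sentence vector by max-pooling, combines a premise vector u and a hypothesis vector v into the features fed to the NLI classifier, and how the same sentence vectors could be compared by cosine similarity in a semantic textual similarity setting. The numeric token states and all function names are hypothetical placeholders; the encoder configuration actually used in this paper is not stated in the abstract, and a real system would obtain the states from a trained recurrent encoder.

```python
import math

def max_pool(token_vectors):
    """Element-wise maximum over a list of equal-length token vectors."""
    return [max(dim) for dim in zip(*token_vectors)]

def pair_features(u, v):
    """Concatenate u, v, |u - v| and the element-wise product u * v."""
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return u + v + diff + prod

def cosine(u, v):
    """Cosine similarity between two sentence vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy stand-ins for encoder hidden states (assumption: 2 tokens, 3 dimensions).
premise_states = [[0.1, 0.9, -0.3], [0.4, 0.2, 0.5]]
hypothesis_states = [[0.3, 0.8, 0.0], [0.2, 0.1, 0.6]]

u = max_pool(premise_states)        # sentence vector for the premise
v = max_pool(hypothesis_states)     # sentence vector for the hypothesis
features = pair_features(u, v)      # input to the NLI classifier (4 * dim values)
similarity = cosine(u, v)           # one possible similarity signal for STS
```

Once the encoder has been trained on the NLI data, only the sentence vectors are reused for downstream tasks; the pair features are needed solely during NLI training.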
