Aproximación al análisis de la calidad de la traducción automática desde la perspectiva de la evaluación basada en el usuario : el caso de DeepL, Google Translate y ChatGPT en la combinación checo-español

Variant title:
  • An approach to user-centered translation quality assessment of machine translation output : the case of DeepL, Google Translate, and ChatGPT in Czech-to-Spanish translation outputs
Source document: Études romanes de Brno. 2024, vol. 45, iss. 4, pp. 65-86
Extent: 65-86
ISSN: 2336-4416 (online)
Type: Article
Language: Spanish
Rights access: open access

Abstract
The widespread use of free neural machine translation (NMT) applications calls for greater effort from the research community to evaluate their quality. This article reviews the state of the art and reports the results of a pilot analysis that measures how satisfied potential users are with these translations along three variables: fluency, grammatical correctness, and usability. In the experiment, twenty native Spanish-speaking annotators used a Likert rating scale to evaluate translations of three Czech texts of different types (one technical, one marketing, and one literary), produced by professional human translators and by DeepL, Google Translate, and ChatGPT. The results show that, although the human translations are rated highest, users report a high degree of satisfaction with the translations generated by the NMT systems designed specifically for translation (DeepL and Google Translate), particularly in terms of fluency and usability.
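The evaluation design described in the abstract (annotators scoring each translation on a Likert scale for several criteria) can be summarized per system and criterion. The following is a minimal sketch, not the authors' analysis: the 1 to 5 scale, the record layout, the `summarize` helper, and the sample scores are all assumptions made for illustration and are not data from the study.

```python
from collections import defaultdict
from statistics import mean, median

# Assumed scale bounds; the abstract does not state how many points the scale had.
LIKERT_MIN, LIKERT_MAX = 1, 5


def summarize(ratings):
    """Group Likert scores by (system, criterion) and report n, mean, and median."""
    grouped = defaultdict(list)
    for r in ratings:
        score = r["score"]
        if not LIKERT_MIN <= score <= LIKERT_MAX:
            raise ValueError(f"score {score} is outside the assumed Likert range")
        grouped[(r["system"], r["criterion"])].append(score)
    return {
        key: {"n": len(scores), "mean": mean(scores), "median": median(scores)}
        for key, scores in grouped.items()
    }


# Hypothetical placeholder ratings, NOT results from the article.
example_ratings = [
    {"annotator": 1, "text": "technical", "system": "DeepL", "criterion": "fluency", "score": 4},
    {"annotator": 2, "text": "technical", "system": "DeepL", "criterion": "fluency", "score": 5},
    {"annotator": 1, "text": "technical", "system": "ChatGPT", "criterion": "fluency", "score": 3},
]

summary = summarize(example_ratings)
for (system, criterion), stats in sorted(summary.items()):
    print(system, criterion, stats)
```

Because Likert responses are ordinal, the median is reported next to the mean; which statistic the article itself uses is not stated in the abstract.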
Note
Funding for this research was awarded to Palacký University Olomouc by the Ministry of Education, Youth and Sports of the Czech Republic (IGA_FF_2023_032).