Improved Unsupervised Statistical Machine Translation via Unsupervised Word Sense Disambiguation for a Low-Resource and Indic Languages

Shefali Saxena, Uttkarsh Chaurasia, Nitin Bansal & Philemon Daniel

ABSTRACT

Besides word order, word choice is a key stumbling block for machine translation (MT) in morphologically rich languages because of homonymy and polysemy. In addition, untranslated or improperly translated words are a severe issue for Statistical Machine Translation (SMT) models. The limited quantity of parallel training corpora has constrained SMT systems. Still, recent research has successfully trained unsupervised SMT (USMT) systems using monolingual data alone. However, the translation quality of the MT output still needs improvement because of unaligned words and words assigned the wrong sense. This work addresses the problem by incorporating unsupervised Word Sense Disambiguation (WSD) into the decoding phase of USMT. It presents SMT systems for five En→Indic translation tasks, evaluated on the WMT test dataset with the BLEU and METEOR metrics. Experiments on the En→Hi, En→Kn, En→Ta, En→Te, and En→Be tasks showed improvements over the baseline model of 2.3, 2.68, 0.78, 2.32, and 1.79 BLEU points, respectively, and of 1.07, 1.34, 0.72, 0.693, and 1.191 METEOR points, respectively.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the author(s).

Additional information

Notes on contributors

Shefali Saxena

Shefali Saxena is pursuing her PhD in natural language processing, focusing on statistical machine translation for low-resource languages, at the National Institute of Technology, Hamirpur (Himachal Pradesh). She received her Master of Technology in communication systems from the ECE Department of the National Institute of Technology, Uttarakhand, in 2019 and her BTech (Hons) in ECE from Rajasthan Technical University, Jaipur, Rajasthan, in 2015. Her research interests include computer vision and image processing, signal processing, natural language processing, and deep learning architectures.

Uttkarsh Chaurasia

Uttkarsh Chaurasia is a final-year student at the National Institute of Technology Hamirpur, where he is pursuing a dual degree in computer science and engineering. His research interests concentrate on deep learning and machine learning.

Nitin Bansal

Nitin Bansal is a third-year student at the National Institute of Technology Hamirpur, where he is pursuing a dual degree in electronics and communication engineering. He enjoys working in the fields of machine learning and data analysis, with a keen interest in natural language processing.

Philemon Daniel

Philemon Daniel received his PhD in electronics and communication engineering (NIT Hamirpur), MTech in VLSI Design (VIT Vellore), and BE in E&C Engineering (Bharathidasan University). He has over 13 years of teaching experience at NIT Hamirpur. He previously worked as a design engineer at Sasken Communication Technologies Limited, Bangalore. His research interests include VLSI testing, embedded systems, image and speech processing, natural language processing, and deep learning architectures. He gives regular talks on deep learning architectures, ARM processors and their applications, and similar areas. He was awarded the ARM Accredited Microcontroller Engineer (AAME) certification in 2015 and received the NVIDIA GPU Grant in 2018.
