ABSTRACT
Besides word order, word choice is a key stumbling block for machine translation (MT) in morphologically rich languages because of homonymy and polysemy. In addition, untranslated or improperly translated words are a severe issue for Statistical Machine Translation (SMT) models. The limited quantity of parallel training corpora has constrained SMT systems; still, current research lines have successfully trained unsupervised SMT (USMT) systems using monolingual data alone. However, the translation quality of the MT output still needs to be improved because of unaligned and incorrectly sense-disambiguated words. This work addresses the problem by incorporating unsupervised Word Sense Disambiguation (WSD) into the decoding phase of USMT. It provides a compendium of SMT systems for five English→Indic translation tasks, evaluated on the WMT test dataset with the BLEU and METEOR metrics. The experiments on the En→Hi, En→Kn, En→Ta, En→Te, and En→Be tasks show improvements over the baseline model of 2.3, 2.68, 0.78, 2.32, and 1.79 BLEU points, respectively, and 1.07, 1.34, 0.72, 0.693, and 1.191 METEOR points, respectively.
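As a rough illustration of how a WSD signal might influence word choice during decoding, the sketch below combines a translation probability with a sense probability in a log-linear score. It is not the authors' implementation: the function name `rescore_with_wsd`, the `wsd_weight` parameter, the romanized Hindi words, and all probabilities are hypothetical and chosen only for illustration.

```python
import math


def rescore_with_wsd(candidates, sense_probs, wsd_weight=1.0):
    """Return the target word with the highest combined log-score.

    candidates:  {target_word: translation-model probability (> 0)}
    sense_probs: {target_word: probability assigned by a WSD component
                  for the current source context}
    wsd_weight:  weight of the WSD feature (0 disables it)
    """
    floor = 1e-9  # avoids log(0) for words the WSD component never proposed

    def score(word):
        tm = math.log(candidates[word])
        wsd = math.log(sense_probs.get(word, floor))
        return tm + wsd_weight * wsd

    return max(candidates, key=score)


# Toy example (assumed values): English "bank" in a river context.
candidates = {"baink": 0.6, "kinara": 0.4}   # financial bank vs. riverbank
sense_probs = {"baink": 0.1, "kinara": 0.9}  # hypothetical WSD output
best = rescore_with_wsd(candidates, sense_probs)
```

With these toy numbers the translation model alone would prefer the financial sense, while the added WSD feature shifts the choice to the riverbank sense; setting `wsd_weight` to 0 recovers the baseline choice.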
DISCLOSURE STATEMENT
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Shefali Saxena
Shefali Saxena is pursuing her PhD in natural language processing, focusing on statistical machine translation for low-resource languages, at the National Institute of Technology, Hamirpur (Himachal Pradesh). She completed her Master of Technology in communication systems in the ECE Department of the National Institute of Technology, Uttarakhand, in 2019 and her BTech (Hons) in ECE at Rajasthan Technical University, Jaipur, Rajasthan, in 2015. Her research interests include computer vision and image processing, signal processing, natural language processing, and deep learning architectures.
Uttkarsh Chaurasia
Uttkarsh Chaurasia is a final-year student at the National Institute of Technology Hamirpur, where he is pursuing a dual degree in computer science and engineering. His research interests concentrate on deep learning and machine learning. Email: [email protected]
Nitin Bansal
Nitin Bansal is a third-year student at the National Institute of Technology Hamirpur, where he is pursuing a dual degree in electronics and communication engineering. He enjoys working in the fields of machine learning and data analysis, with a keen interest in natural language processing. Email: [email protected]
Philemon Daniel
Philemon Daniel received his PhD in electronics and communication engineering (NIT Hamirpur), MTech in VLSI Design (VIT Vellore), and BE in E&C Engineering (Bharathidasan University). He has over 13 years of teaching experience at NIT Hamirpur. He worked as a design engineer at Sasken Communication Technologies Limited, Bangalore. His research interests include VLSI testing, embedded systems, image and speech processing, natural language processing, and deep learning architectures. He gives regular talks on deep learning architectures, ARM processors and their applications, and similar areas. He became an ARM-Accredited Microcontroller Engineer (AAME) in 2015 and received the NVIDIA GPU Grant in 2018. Email: [email protected]