
Improved Unsupervised Statistical Machine Translation via Unsupervised Word Sense Disambiguation for a Low-Resource and Indic Languages


REFERENCES

  • S. Edunov, M. Ott, M. Auli, and D. Grangier, “Understanding back-translation at scale,” arXiv preprint arXiv:1808.09381, 2018.
  • N. K. Jadoon, W. Anwar, U. I. Bajwa, and F. Ahmad, “Statistical machine translation of Indian languages: A survey,” Neural Comput. Appl., Vol. 31, no. 7, pp. 2455–2467, 2019. DOI: 10.1007/s00521-017-3206-2.
  • P. Koehn, Statistical Machine Translation. New York: Cambridge University Press, 2010.
  • P. Koehn and R. Knowles, “Six challenges for neural machine translation,” arXiv preprint arXiv:1706.03872, 2017.
  • M. Artetxe, G. Labaka, and E. Agirre, “Unsupervised statistical machine translation,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, 2018. arXiv preprint arXiv:1809.01272.
  • M. Artetxe, G. Labaka, and E. Agirre, “An effective approach to unsupervised machine translation,” arXiv preprint arXiv:1902.01313, 2019.
  • G. Lample, M. Ott, A. Conneau, L. Denoyer, and M. A. Ranzato, “Phrase-based & neural unsupervised machine translation,” arXiv preprint arXiv:1804.07755, 2018.
  • A. Tezcan, V. Hoste, and L. Macken, “Estimating word-level quality of statistical machine translation output using monolingual information alone,” Nat. Lang. Eng., Vol. 26, no. 1, pp. 73–94, 2020.
  • A. Kumar and V. Goyal, “Hindi to Punjabi machine translation system based on statistical approach,” J. Stat. Manage. Syst., Vol. 21, no. 4, pp. 547–552, 2018. DOI: 10.1080/09720510.2018.1466963.
  • R. K. Chakrawarti and P. Bansal, “Approaches for improving Hindi to English machine translation system,” Indian J. Sci. Technol., Vol. 10, no. 16, pp. 1–8, 2017. DOI: 10.17485/ijst/2017/v10i16/111895.
  • N. K. Jadoon, W. Anwar, and N. Durrani, “Machine translation approaches and survey for Indian languages,” arXiv preprint arXiv:1701.04290, 2017.
  • S. Mall and U. C. Jaiswal, “Survey: Machine translation for Indian language,” Int. J. Appl. Eng. Res., Vol. 13, no. 1, pp. 202–209, 2018.
  • R. Ananthakrishnan, P. Bhattacharyya, M. Sasikumar, and R. M. Shah, “Some issues in automatic evaluation of English-Hindi MT: More blues for BLEU,” ICON, 2007.
  • R. N. Patel, P. B. Pimpale, and M. Sasikumar, “Machine translation in Indian languages: Challenges and resolution,” J. Intell. Syst., Vol. 28, no. 3, pp. 437–445, 2019.
  • R. Navigli, “Word sense disambiguation: A survey,” ACM Comput. Surv. (CSUR), Vol. 41, no. 2, pp. 1–69, 2009. DOI: 10.1145/1459352.1459355.
  • M. Bevilacqua, T. Pasini, A. Raganato, and R. Navigli, “Recent trends in word sense disambiguation: A survey,” in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), 2021, pp. 4330–4338.
  • M. Carpuat and D. Wu, “Evaluating the word sense disambiguation performance of statistical machine translation,” in Companion Volume to the Proceedings of Conference Including Posters/Demos and Tutorial Abstracts, 2005.
  • M. Carpuat and D. Wu, “Improving statistical machine translation using word sense disambiguation,” in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, June 2007, pp. 61–72.
  • M. Carpuat, Word Sense Disambiguation for Statistical Machine Translation. Hong Kong University of Science and Technology (Hong Kong), 2008.
  • K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, “Bleu: A method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311–318.
  • S. Banerjee and A. Lavie, “METEOR: An automatic metric for MT evaluation with improved correlation with human judgments,” in Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, June 2005, pp. 65–72.
  • Q. Dou and K. Knight, “Large scale decipherment for out-of-domain machine translation,” in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju Island, Korea, 12–14 July 2012, pp. 266–275.
  • Q. Dou and K. Knight, “Dependency-based decipherment for resource-limited machine translation,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18–21 October 2013, 1668–1676.
  • Q. Dou, A. Vaswani, K. Knight, and C. Dyer, “Unifying Bayesian inference and vector space models for improved decipherment,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, July 26–31, 2015, pp. 836–845.
  • S. Ravi and K. Knight, “Deciphering foreign language,” in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, June 19–24, 2011, pp. 12–21.
  • V. Siivola, T. Hirsimaki, and S. Virpioja, “On growing and pruning Kneser–Ney smoothed N-gram models,” IEEE Trans. Audio Speech Lang. Process., Vol. 15, no. 5, pp. 1617–1624, 2007.
  • N. Ueffing, G. Haffari, and A. Sarkar, “Transductive learning for statistical machine translation,” in Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, June 2007, pp. 25–32.
  • P. Koehn and J. Schroeder, “Experiments in domain adaptation for statistical machine translation,” in Proceedings of the Second Workshop on Statistical Machine Translation, Prague, June 2007, pp. 224–227.
  • Y. Lü, J. Huang, and Q. Liu, “Improving statistical machine translation performance by training data selection and optimization,” in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, June 2007, pp. 343–350.
  • S. Matsoukas, A. V. Rosti, and B. Zhang, “Discriminative corpus weight estimation for machine translation,” in Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–7 August 2009, pp. 708–717.
  • G. Foster, C. Goutte, and R. Kuhn, “Discriminative instance weighting for domain adaptation in statistical machine translation,” in Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, MIT, Massachusetts, USA, 9–11 October 2010, pp. 451–459.
  • A. Vaswani, Y. Zhao, V. Fossum, and D. Chiang, “Decoding with large-scale neural language models improves translation,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 18–21 October 2013, pp. 1387–1392.
  • H. Sujaini, “Improving the role of language model in statistical machine translation (Indonesian-Javanese),” Int. J. Electr. Comput. Eng., Vol. 10, no. 2, pp. 2102–2109, 2020.
  • C. Baziotis, B. Haddow, and A. Birch, “Language model prior for low-resource neural machine translation,” arXiv preprint arXiv:2004.14928, 2020.
  • K. Rottmann and S. Vogel, “Word reordering in statistical machine translation with a POS-based distortion model,” in Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers. Skövde, Sweden, 2007, pp. 171–180.
  • E. Agirre and P. Edmonds, Word Sense Disambiguation: Algorithms and Applications, Vol. 33. Springer Science & Business Media, 2007.
  • Y. K. Lee and H. T. Ng, “An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, July 2002, pp. 41–48.
  • H. Y. Huang, Z. Yang, and P. Jian, “Unsupervised word sense disambiguation using neighborhood knowledge,” in Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), Singapore, 2011, pp. 333–342.
  • T. Martín Wanton and R. Berlanga Llavori, “A clustering-based approach for unsupervised word sense disambiguation,” Proces. del Leng. Natural, Vol. 49, pp. 49–56, 2012.
  • R. Giyanani, “A survey on word sense disambiguation,” IOSR J. Comput. Eng., Vol. 14, pp. 30–33, 2013. DOI: 10.9790/0661-1463033.
  • N. Rahman and B. Borah, “An unsupervised method for word sense disambiguation,” J. King Saud Univ.-Comput. Inf.Sci., 2021. DOI: 10.1016/j.jksuci.2021.07.022.
  • A. R. Pal and D. Saha, “Word sense disambiguation in Bengali language using unsupervised methodology with modifications,” Sādhanā, Vol. 44, no. 7, pp. 1–13, 2019.
  • S. Chauhan, P. Daniel, S. Saxena, and A. Sharma, “Fully unsupervised machine translation using context-aware word translation and denoising autoencoder,” Appl. Artif. Intell., Vol. 36, no. 1, 2022. DOI: 10.1080/08839514.2022.2031817.
  • M. Artetxe, G. Labaka, and E. Agirre, “A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings,” arXiv preprint arXiv:1805.06297, 2018.
  • S. Saxena, S. Chauhan, and P. Daniel, “Analysis of unsupervised statistical machine translation using cross-lingual word embedding for English–Hindi,” in International Conference on Computational Techniques and Applications, Singapore: Springer, 2022, pp. 61–68.
  • F. J. Och, C. Tillmann, and H. Ney, “Improved alignment models for statistical machine translation,” in 1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, USA, 1999.
  • M. Pelevina, N. Arefyev, C. Biemann, and A. Panchenko, “Making sense of word embeddings,” arXiv preprint arXiv:1708.03390, 2017.
  • F. J. Och, “Minimum error rate training in statistical machine translation,” in Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, Jul. 2003, pp. 160–167.
  • R. Sennrich, B. Haddow, and A. Birch, “Improving neural machine translation models with monolingual data,” arXiv preprint arXiv:1511.06709, 2016.
  • D. Kakwani, A. Kunchukuttan, S. Golla, N. C. Gokul, A. Bhattacharyya, M. M. Khapra, and P. Kumar, “IndicNLPSuite: Monolingual corpora, evaluation benchmarks and pre-trained multilingual language models for Indian languages,” in Findings of the Association for Computational Linguistics: EMNLP 2020, November 16–20, 2020, pp. 4948–4961.
  • S. Chauhan, S. Saxena, and P. Daniel, “Monolingual and parallel corpora for Kangri low resource language,” arXiv preprint arXiv:2103.11596, 2021.
  • S. Chauhan, P. Daniel, and S. Saxena, “Analysis of neural machine translation KANGRI language by unsupervised and semi-supervised methods,” IETE J. Res., 2020. DOI: 10.1080/03772063.2021.2016506.
  • M. Post, C. Callison-Burch, and M. Osborne, “Constructing parallel corpora for six Indian languages via crowdsourcing,” in Proceedings of the 7th Workshop on Statistical Machine Translation, Montreal, Canada, June 7–8, 2012. pp. 401–409.
  • P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, and E. Herbst, “Moses: Open source toolkit for statistical machine translation,” in Proceedings of the ACL 2007 Demo and Poster Sessions, Prague, June 2007, pp. 177–180.
  • K. Heafield, “KenLM: Faster and smaller language model queries,” in Proceedings of the 6th Workshop on Statistical Machine Translation, Edinburgh, Scotland, UK, July 30–31, 2011, pp. 187–197.
  • C. Tillmann, A Unigram Orientation Model for Statistical Machine Translation. IBM Thomas J. Watson Research Center, Yorktown Heights, NY, 2004.
  • S. Richards, UMM CSci Senior Seminar Conference, Morris, MN, Nov. 2017. Available: https://umm-csci.github.io/senior-seminar/seminars/fall2017/richards.pdf.
