Research Article

Evaluation of clustering techniques on Urdu News head-lines: a case of short length text

Pages 489-510 | Received 15 Apr 2021, Accepted 18 Jun 2022, Published online: 24 Jun 2022

