Research Article

What’s in a name? The effect of named entities on topic modelling interpretability


References

  • Bischof, J., & Airoldi, E. M. (2012). Summarizing topical content with word frequency and exclusivity. Proceedings of the 29th International Conference on Machine Learning (ICML-12), 201–208. Edinburgh, Scotland.
  • Boukes, M., & Vliegenthart, R. (2020). A general pattern in the construction of economic newsworthiness? Analyzing news factors in popular, quality, regional, and financial newspapers. Journalism, 21(2), 279–300. https://doi.org/10.1177/1464884917725989
  • Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., & Riddell, A. (2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1). https://doi.org/10.18637/jss.v076.i01
  • Chang, J., & Blei, D. (2009). Relational topic models for document networks. In D. van Dyk & M. Welling (Eds.), Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (pp. 81–88). Florida, USA.
  • Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. Advances in Neural Information Processing Systems, 288–296. Vancouver, B.C., Canada.
  • Denny, M., & Spirling, A. (2017, September 27). Text preprocessing for unsupervised learning: Why it matters, when it misleads, and what to do about it. https://doi.org/10.2139/ssrn.2849145
  • Doogan, C., & Buntine, W. (2021). Topic model or topic twaddle? Re-evaluating semantic interpretability measures. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 3824–3848.
  • Eberl, J.-M., Boomgaarden, H. G., & Wagner, M. (2017). One bias fits all? Three types of media bias and their effects on party preferences. Communication Research, 44(8), 1125–1148. https://doi.org/10.1177/0093650215614364
  • Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472. https://doi.org/10.1214/ss/1177011136
  • Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press.
  • Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267–297. https://doi.org/10.1093/pan/mps028
  • Grusky, M., Naaman, M., & Artzi, Y. (2018). Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies. arXiv preprint arXiv:1804.11283.
  • Honnibal, M., & Johnson, M. (2015). An improved non-monotonic transition system for dependency parsing. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1373–1378. Lisbon, Portugal.
  • Hoyle, A., Goel, P., Peskov, D., Hian-Cheong, A., Boyd-Graber, J., & Resnik, P. (2021). Is automated topic model evaluation broken?: The incoherence of coherence. arXiv preprint arXiv:2107.02173.
  • Hudson, R. (1994). About 37% of word-tokens are nouns. Language, 70(2), 331–339. https://doi.org/10.2307/415831
  • Hu, L., Li, J., Li, Z., Shao, C., & Li, Z. (2013). Incorporating entities in news topic modeling. In G. Zhou, J. Li, D. Zhao, & Y. Feng (Eds.), Natural language processing and Chinese computing (pp. 139–150). Springer. https://doi.org/10.1007/978-3-642-41644-6_14
  • Jacobi, C., Kleinen-von Königslöw, K., & Ruigrok, N. (2016). Political news in online and print newspapers. Digital Journalism, 4(6), 723–742. https://doi.org/10.1080/21670811.2015.1087810
  • Krasnashchok, K., & Jouili, S. (2018). Improving topic quality by promoting named entities in topic modeling. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 247–253. https://doi.org/10.18653/v1/P18-2040
  • Kripke, S. A. (1972). Naming and necessity. In D. Davidson & G. Harman (Eds.), Semantics of natural language (pp. 253–355). Springer.
  • Kuhr, F., Lichtenberger, M., Braun, T., & Möller, R. (2021). Enhancing relational topic models with named entity induced links. 2021 IEEE 15th International Conference on Semantic Computing (ICSC), 314–317. https://doi.org/10.1109/ICSC50631.2021.00059
  • Kumar, D., & Singh, S. R. (2019). Prioritized named entity driven LDA for document clustering. In B. Deka, P. Maji, S. Mitra, D. K. Bhattacharyya, P. K. Bora, & S. K. Pal (Eds.), Pattern recognition and machine intelligence (pp. 294–301). Springer International Publishing. https://doi.org/10.1007/978-3-030-34872-4_33
  • Lau, J. H., Newman, D., & Baldwin, T. (2014). Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 530–539. Gothenburg, Sweden.
  • Liang, J., & Liu, H. (2013). Noun distribution in natural languages. Poznań Studies in Contemporary Linguistics, 49(4), 509–529. https://doi.org/10.1515/psicl-2013-0019
  • Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What is your estimand? defining the target quantity connects statistical evidence to theory. American Sociological Review, 86(3), 532–565. https://doi.org/10.1177/00031224211004187
  • Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., Pfetsch, B., Heyer, G., Reber, U., Häussler, T., Schmid-Petri, H., & Adam, S. (2018). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. Communication Methods and Measures, 12(2–3), 93–118. https://doi.org/10.1080/19312458.2018.1430754
  • Marrero, M., Urbano, J., Sanchez-Cuadrado, S., Morato, J., & Gomez-Berbis, J. M. (2013). Named entity recognition: Fallacies, challenges and opportunities. Computer Standards and Interfaces, 35(5), 482–489. https://doi.org/10.1016/j.csi.2012.09.004
  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 3111–3119. Lake Tahoe, Nevada, USA.
  • Mimno, D., & Lee, M. (2014). Low-dimensional embeddings for interpretable anchor-based topic inference. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1319–1328. Doha, Qatar.
  • Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 262–272. Edinburgh, Scotland, UK.
  • Nadeau, D., & Sekine, S. (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1), 3–26. https://doi.org/10.1075/li.30.1.03nad
  • Nouvel, D., Ehrmann, M., & Rosset, S. (2016). Named entities for computational linguistics. John Wiley & Sons, Inc. https://doi.org/10.1002/9781119268567
  • Řehůřek, R., & Sojka, P. (2011). Gensim: Statistical semantics in Python. Retrieved from https://radimrehurek.com/gensim
  • Rijcken, E., Kaymak, U., Scheepers, F., Mosteiro, P., Zervanou, K., & Spruit, M. (2022). Topic modeling for interpretable text classification from EHRs. Frontiers in Big Data, 5. https://doi.org/10.3389/fdata.2022.846930
  • Roberts, M. E., Stewart, B. M., & Tingley, D. (2019). stm: An R package for structural topic models. Journal of Statistical Software, 91(1), 1–40. https://doi.org/10.18637/jss.v091.i02
  • Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4), 1064–1082. https://doi.org/10.1111/ajps.12103
  • Sievert, C., & Shirley, K. (2014). LDAvis: A method for visualizing and interpreting topics. Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, 63–70. Baltimore, Maryland, USA.
  • Taddy, M. (2012). On estimation and selection for topic models. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics. La Palma, Canary Islands.
  • Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-normalization, folding, and localization: An improved R̂ for assessing convergence of MCMC (with discussion). Bayesian Analysis, 16(2), 667–718. https://doi.org/10.1214/20-BA1221
  • Ying, L., Montgomery, J. M., & Stewart, B. M. (2021). Topics, concepts, and measurement: A crowdsourced procedure for validating topics as measures. Political Analysis, 30(4), 570–589. https://doi.org/10.1017/pan.2021.33