102
Views
0
CrossRef citations to date
0
Altmetric
Research Articles

Improving the Walktrap Algorithm Using K-Means Clustering

, &
Pages 266-288 | Published online: 15 Feb 2024
 

Abstract

The walktrap algorithm is one of the most popular community-detection methods in psychological research. Several simulation studies have shown that it is often effective at determining the correct number of communities and assigning items to their proper community. Nevertheless, it is important to recognize that the walktrap algorithm relies on hierarchical clustering because it was originally developed for networks much larger than those encountered in psychological research. In this paper, we present and demonstrate a computational alternative to the hierarchical algorithm that is conceptually easier to understand. More importantly, we show that better solutions to the sum-of-squares optimization problem that is heuristically tackled by hierarchical clustering in the walktrap algorithm can often be obtained using exact or approximate methods for K-means clustering. Three simulation studies and analyses of empirical networks were completed to assess the impact of better sum-of-squares solutions.

Data availability

The MATLAB files for implementing both the Ward’s and K-means versions of the walktrap algorithm and reproducing the result of Simulations 1, 2, and 3 are available from: https://figshare.com/articles/software/Walktrap_Using_Kmeans_Clustering/21183454

Rieck et al. (Citation2021a) describe the functional connectivity matrices analyzed herein and provide a direct URL to obtain them: https://osf.io/m5crs/

The authors would like to thank the Associate Editor, Sacha Epskamp, and two anonymous reviewers for their comments on prior versions of this manuscript. The ideas and opinions expressed herein are those of the authors alone, and endorsement by the authors’ institutions or the National Institute On Alcohol Abuse And Alcoholism of the National Institutes of Health is not intended and should not be inferred.

Article information

Conflict of interest disclosures: Each author signed a form for disclosure of potential conflicts of interest. No authors reported any financial or other conflicts of interest in relation to the work described.

Ethical principles: The authors affirm having followed professional ethical guidelines in preparing this work. These guidelines include obtaining informed consent from human participants, maintaining ethical treatment and respect for the rights of human or animal participants, and ensuring the privacy of participants and their data, such as ensuring that individual participants cannot be identified in reported results or from publicly available original or archival data.

Funding: This work (ALW) was supported by Grant K99AA028306 (Principal Investigator: Watts) from the National Institute On Alcohol Abuse And Alcoholism of the National Institutes of Health.

Role of the funders/sponsors: None of the funders or sponsors of this research had any role in the design and conduct of the study; collection, management, analysis, and interpretation of data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.

Acknowledgments: The authors would like to thank the Associate Editor, Sacha Epskamp, and two anonymous reviewers for their comments on prior versions of this manuscript. The ideas and opinions expressed herein are those of the authors alone, and endorsement by the authors’ institutions or the National Institute On Alcohol Abuse And Alcoholism of the National Institutes of Health is not intended and should not be inferred.

Notes

1 In fact, much of the presentation in the Pons and Latapy (Citation2006) article emphasizes efficient implementation of the method for very large networks.

2 There were 49 instances where the two methods yielded the same ARI.

3 There were 18 instances where the two methods yielded the same ARI.

4 Although factors and communities are not the same thing, in the context of this simulation study, the number of communities is equal to the number of factors in the experimental design.

5 If Γ is not positive definite, then Λ is discarded and redrawn.

6 In the computation of E and Q in Equations 10-11, it is assumed that we are working with A and dii values prior to the addition of loops. In other words, unlike the walktrap algorithm that requires addition of loops, they are not required and probably not appropriate to consider for modularity.

7 We use the absolute values of the elements of the correlation matrices for the walktrap algorithm.

Log in via your institution

Log in to Taylor & Francis Online

PDF download + Online access

  • 48 hours access to article PDF & online version
  • Article PDF can be downloaded
  • Article PDF can be printed
USD 53.00 Add to cart

Issue Purchase

  • 30 days online access to complete issue
  • Article PDFs can be downloaded
  • Article PDFs can be printed
USD 352.00 Add to cart

* Local tax will be added as applicable

Related Research

People also read lists articles that other readers of this article have read.

Recommended articles lists articles that we recommend and is powered by our AI driven recommendation engine.

Cited by lists all citing articles based on Crossref citations.
Articles with the Crossref icon will open in a new tab.