Abstract
High-dimensional data pose unique challenges for data processing in an era of ever-increasing data availability. Graph theory can provide a structure for high-dimensional data. We introduce two key properties desirable for graphs in testing homogeneity. Roughly speaking, these properties may be described as: unboundedness of edge counts under the same distribution and boundedness of edge counts under different distributions. It turns out that the minimum spanning tree violates these properties but the shortest Hamiltonian path possesses them. Based on the shortest Hamiltonian path, we propose two combinations of edge counts in multiple samples to test for homogeneity. We give the permutation null distributions of the proposed statistics when the sample sizes go to infinity. The power is analyzed by assuming that both the sample sizes and the dimensionality tend to infinity. Simulations show that our new tests behave very well overall in comparison with various competitors. Real data analyses of tumors and images further demonstrate the value of our proposed tests. Software implementing the tests is available in the R package GRelevance. Supplemental materials for this article are available online.
Supplementary Materials
Supplementary materials include: Appendix A for the proof of Theorem 1, Appendix B for the proof of Theorem 2, Appendix C for the verification of A4, Appendix D for the verification of A5, Appendix E for Lemma 1 and its proof, Appendix F for the proof of Theorem 3, and a zip file containing R code for the simulations and data analysis.
Acknowledgments
We thank Dr. Won Beom Jung for sharing fMRI data. We also thank Dr. Nancy Reid, Dr. Augustine Wong, Octavia Wong, and Jessica Collins for their helpful comments on the first draft. We also thank the two referees for suggesting notable improvements in the presentation of the article.
Disclosure statement
The author reports there are no competing interests to declare.