Abstract
Algorithmic feature learners provide high-dimensional vector representations for non-matrix structured data, like image or text collections. Low-dimensional projections derived from these representations, called embeddings, are often used to explore variation in these data. However, it is not clear how to assess the uncertainty of these embeddings. We adapt methods developed for bootstrapping principal components analysis to the setting where features are algorithmically derived from non-matrix data. We empirically compare the derived confidence areas in simulations, varying factors that influence feature learning and the bootstrap, such as feature learning algorithm complexity and bootstrap sample size. We illustrate the proposed approaches on a spatial proteomics dataset, where we observe that embedding precision is not uniform across all tissue types. Code, data, and pretrained models are available as an R compendium in the supplementary materials. Supplementary files for this article are available online.
Supplementary Materials
Appendix: A PDF with additional supporting materials. It includes sections detailing the spatial point process simulation setup and explaining how to access the data and reproduce all analyses. It also provides supplementary figures referred to within the main manuscript. (supplement.pdf, PDF document)
Acknowledgments
The author thanks Susan Holmes, Karl Rohe, three reviewers, the associate editor, and the editor for feedback that improved the manuscript. Research was performed with the assistance of the UW-Madison Center For High Throughput Computing (CHTC).