Sparse Learning

Scalable Model-Free Feature Screening via Sliced-Wasserstein Dependency

Pages 1501-1511 | Received 20 Oct 2022, Accepted 16 Jan 2023, Published online: 12 Apr 2023

References

  • Arjovsky, M., Chintala, S., and Bottou, L. (2017), “Wasserstein Generative Adversarial Networks,” in International Conference on Machine Learning, pp. 214–223.
  • Bachman, P., Hjelm, R. D., and Buchwalter, W. (2019), “Learning Representations by Maximizing Mutual Information Across Views,” in Advances in Neural Information Processing Systems (Vol. 32).
  • Belghazi, M. I., Baratin, A., Rajeshwar, S., Ozair, S., Bengio, Y., Courville, A., and Hjelm, D. (2018), “Mutual Information Neural Estimation,” in International Conference on Machine Learning, pp. 531–540. PMLR.
  • Bonneel, N., Rabin, J., Peyré, G., and Pfister, H. (2015), “Sliced and Radon Wasserstein Barycenters of Measures,” Journal of Mathematical Imaging and Vision, 51, 22–45. DOI: 10.1007/s10851-014-0506-3.
  • Carriere, M., Cuturi, M., and Oudot, S. (2017), “Sliced Wasserstein Kernel for Persistence Diagrams,” in International Conference on Machine Learning, pp. 664–673. PMLR.
  • Chen, L., and Huang, J. Z. (2012), “Sparse Reduced-Rank Regression for Simultaneous Dimension Reduction and Variable Selection,” Journal of the American Statistical Association, 107, 1533–1545. DOI: 10.1080/01621459.2012.734178.
  • Chizat, L., Roussillon, P., Léger, F., Vialard, F.-X., and Peyré, G. (2020), “Faster Wasserstein Distance Estimation with the Sinkhorn Divergence,” in Advances in Neural Information Processing Systems (Vol. 33), pp. 2257–2269.
  • Chun, H., and Keleş, S. (2010), “Sparse Partial Least Squares Regression for Simultaneous Dimension Reduction and Variable Selection,” Journal of the Royal Statistical Society, Series B, 72, 3–25. DOI: 10.1111/j.1467-9868.2009.00723.x.
  • Cui, H., Li, R., and Zhong, W. (2015), “Model-Free Feature Screening for Ultrahigh Dimensional Discriminant Analysis,” Journal of the American Statistical Association, 110, 630–641. DOI: 10.1080/01621459.2014.920256.
  • Dai, C., Lin, B., Xing, X., and Liu, J. S. (2022), “False Discovery Rate Control via Data Splitting,” Journal of the American Statistical Association, 1–38 (just-accepted), DOI: 10.1080/01621459.2022.2060113.
  • Deshpande, I., Hu, Y.-T., Sun, R., Pyrros, A., Siddiqui, N., Koyejo, S., Zhao, Z., Forsyth, D., and Schwing, A. G. (2019), “Max-Sliced Wasserstein Distance and its Use for GANs,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10648–10656.
  • Fan, J., Feng, Y., and Song, R. (2011), “Nonparametric Independence Screening in Sparse Ultra-High-Dimensional Additive Models,” Journal of the American Statistical Association, 106, 544–557. DOI: 10.1198/jasa.2011.tm09779.
  • Fan, J., and Lv, J. (2008), “Sure Independence Screening for Ultrahigh Dimensional Feature Space,” Journal of the Royal Statistical Society, Series B, 70, 849–911. DOI: 10.1111/j.1467-9868.2008.00674.x.
  • Fan, J., and Ren, Y. (2006), “Statistical Analysis of DNA Microarray Data in Cancer Research,” Clinical Cancer Research, 12, 4469–4473. DOI: 10.1158/1078-0432.CCR-06-1033.
  • Fan, J., Samworth, R., and Wu, Y. (2009), “Ultrahigh Dimensional Feature Selection: Beyond the Linear Model,” Journal of Machine Learning Research, 10, 2013–2038.
  • Fan, J., and Song, R. (2010), “Sure Independence Screening in Generalized Linear Models with NP-Dimensionality,” The Annals of Statistics, 38, 3567–3604. DOI: 10.1214/10-AOS798.
  • Genevay, A., Chizat, L., Bach, F., Cuturi, M., and Peyré, G. (2019), “Sample Complexity of Sinkhorn Divergences,” in The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1574–1583. PMLR.
  • Hall, P., and Miller, H. (2009), “Using Generalized Correlation to Effect Variable Selection in Very High Dimensional Problems,” Journal of Computational and Graphical Statistics, 18, 533–550. DOI: 10.1198/jcgs.2009.08041.
  • He, X., Wang, L., and Hong, H. G. (2013), “Quantile-Adaptive Model-Free Variable Screening for High-Dimensional Heterogeneous Data,” The Annals of Statistics, 41, 342–369. DOI: 10.1214/13-AOS1087.
  • Huo, X., and Székely, G. J. (2016), “Fast Computing for Distance Covariance,” Technometrics, 58, 435–447. DOI: 10.1080/00401706.2015.1054435.
  • Kantorovich, L. (1942), “On Translation of Mass” (in Russian), Doklady Academy of Sciences of the USSR, 37, 199–201.
  • Kolouri, S., Nadjahi, K., Simsekli, U., Badeau, R., and Rohde, G. (2019), “Generalized Sliced Wasserstein Distances,” in Advances in Neural Information Processing Systems (Vol. 32), pp. 261–272.
  • Kolouri, S., Pope, P. E., Martin, C. E., and Rohde, G. K. (2018), “Sliced Wasserstein Auto-Encoders,” in International Conference on Learning Representations.
  • Kolouri, S., Zou, Y., and Rohde, G. K. (2016), “Sliced Wasserstein Kernels for Probability Distributions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5258–5267.
  • Kong, Y., Li, D., Fan, Y., and Lv, J. (2017), “Interaction Pursuit in High-Dimensional Multi-Response Regression via Distance Correlation,” The Annals of Statistics, 45, 897–922. DOI: 10.1214/16-AOS1474.
  • Kullback, S. (1997), Information Theory and Statistics, Chelmsford, MA: Courier Corporation.
  • Levina, E., and Bickel, P. (2001), “The Earth Mover’s Distance is the Mallows Distance: Some Insights from Statistics,” in Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001 (Vol. 2), pp. 251–256. IEEE.
  • Li, G., Peng, H., Zhang, J., and Zhu, L. (2012), “Robust Rank Correlation based Screening,” The Annals of Statistics, 40, 1846–1877. DOI: 10.1214/12-AOS1024.
  • Li, R., Chang, C., Justesen, J. M., Tanigawa, Y., Qian, J., Hastie, T., Rivas, M. A., and Tibshirani, R. (2022), “Fast Lasso Method for Large-Scale and Ultrahigh-Dimensional Cox Model with Applications to UK Biobank,” Biostatistics, 23, 522–540. DOI: 10.1093/biostatistics/kxaa038.
  • Li, R., Zhong, W., and Zhu, L. (2012), “Feature Screening via Distance Correlation Learning,” Journal of the American Statistical Association, 107, 1129–1139. DOI: 10.1080/01621459.2012.695654.
  • Li, T., Meng, C., Yu, J., and Xu, H. (2022), “Hilbert Curve Projection Distance for Distribution Comparison,” arXiv preprint arXiv:2205.15059.
  • Liu, J., Li, R., and Wu, R. (2014), “Feature Selection for Varying Coefficient Models with Ultrahigh-Dimensional Covariates,” Journal of the American Statistical Association, 109, 266–274. DOI: 10.1080/01621459.2013.850086.
  • Liu, J., Zhong, W., and Li, R. (2015), “A Selective Overview of Feature Screening for Ultrahigh-Dimensional Data,” Science China Mathematics, 58, 1–22. DOI: 10.1007/s11425-015-5062-9.
  • Liu, W., Ke, Y., Liu, J., and Li, R. (2020), “Model-Free Feature Screening and FDR Control with Knockoff Features,” Journal of the American Statistical Association, 117, 428–443. DOI: 10.1080/01621459.2020.1783274.
  • Liu, W., and Li, R. (2020), “Variable Selection and Feature Screening,” in Macroeconomic Forecasting in the Era of Big Data (Vol. 52), ed. P. Fuleky, pp. 293–326, Cham: Springer.
  • Lv, J., and Liu, J. S. (2014), “Model Selection Principles in Misspecified Models,” Journal of the Royal Statistical Society, Series B, 76, 141–167. DOI: 10.1111/rssb.12023.
  • Mai, Q., and Zou, H. (2015), “The Fused Kolmogorov Filter: A Nonparametric Model-Free Screening Method,” The Annals of Statistics, 43, 1471–1497. DOI: 10.1214/14-AOS1303.
  • Meng, C., Ke, Y., Zhang, J., Zhang, M., Zhong, W., and Ma, P. (2019), “Large-Scale Optimal Transport Map Estimation Using Projection Pursuit,” in Advances in Neural Information Processing Systems (Vol. 32).
  • Mordant, G., and Segers, J. (2022), “Measuring Dependence between Random Vectors via Optimal Transport,” Journal of Multivariate Analysis, 189, 104912. DOI: 10.1016/j.jmva.2021.104912.
  • Nadjahi, K. (2021), “Sliced-Wasserstein Distance for Large-Scale Machine Learning: Theory, Methodology and Extensions,” Ph.D. thesis, Institut polytechnique de Paris.
  • Nadjahi, K., De Bortoli, V., Durmus, A., Badeau, R., and Şimşekli, U. (2020), “Approximate Bayesian Computation with the Sliced-Wasserstein Distance,” in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5470–5474. IEEE.
  • Nguyen, K., Ho, N., Pham, T., and Bui, H. (2020), “Distributional Sliced-Wasserstein and Applications to Generative Modeling,” arXiv preprint arXiv:2002.07367.
  • Nies, T. G., Staudt, T., and Munk, A. (2021), “Transport Dependency: Optimal Transport based Dependency Measures,” arXiv preprint arXiv:2105.02073.
  • Ozair, S., Lynch, C., Bengio, Y., van den Oord, A., Levine, S., and Sermanet, P. (2019), “Wasserstein Dependency Measure for Representation Learning,” in Advances in Neural Information Processing Systems (Vol. 32), pp. 15604–15614.
  • Pan, W., Wang, X., Xiao, W., and Zhu, H. (2018), “A Generic Sure Independence Screening Procedure,” Journal of the American Statistical Association, 928–937. DOI: 10.1080/01621459.2018.1462709.
  • Panaretos, V. M., and Zemel, Y. (2019), “Statistical Aspects of Wasserstein Distances,” Annual Review of Statistics and its Application, 6, 405–431. DOI: 10.1146/annurev-statistics-030718-104938.
  • Pang, G., Cao, L., Chen, L., and Liu, H. (2018), “Learning Representations of Ultrahigh-Dimensional Data for Random Distance-based Outlier Detection,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2041–2050. DOI: 10.1145/3219819.3220042.
  • Peng, H., Long, F., and Ding, C. (2005), “Feature Selection based on Mutual Information Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1226–1238. DOI: 10.1109/TPAMI.2005.159.
  • Peyré, G., and Cuturi, M. (2019), “Computational Optimal Transport,” Foundations and Trends® in Machine Learning, 11, 355–607. DOI: 10.1561/2200000073.
  • Rabin, J., Peyré, G., Delon, J., and Bernot, M. (2011), “Wasserstein Barycenter and its Application to Texture Mixing,” in International Conference on Scale Space and Variational Methods in Computer Vision, pp. 435–446, Springer.
  • Rowland, M., Hron, J., Tang, Y., Choromanski, K., Sarlós, T., and Weller, A. (2019), “Orthogonal Estimation of Wasserstein Distances,” in The 22nd International Conference on Artificial Intelligence and Statistics, pp. 186–195. PMLR.
  • Segal, M. R., Dahlquist, K. D., and Conklin, B. R. (2003), “Regression Approaches for Microarray Data Analysis,” Journal of Computational Biology, 10, 961–980. DOI: 10.1089/106652703322756177.
  • Shao, X., and Zhang, J. (2014), “Martingale Difference Correlation and its Use in High-Dimensional Variable Screening,” Journal of the American Statistical Association, 109, 1302–1318. DOI: 10.1080/01621459.2014.887012.
  • Spyromitros-Xioufis, E., Tsoumakas, G., Groves, W., and Vlahavas, I. (2016), “Multi-Target Regression via Input Space Expansion: Treating Targets as Inputs,” Machine Learning, 104, 55–98. DOI: 10.1007/s10994-016-5546-z.
  • Stone, J. V. (2004), Independent Component Analysis: A Tutorial Introduction, Cambridge, MA: MIT Press.
  • Székely, G. J., and Rizzo, M. L. (2014), “Partial Distance Correlation with Methods for Dissimilarities,” The Annals of Statistics, 42, 2382–2412. DOI: 10.1214/14-AOS1255.
  • Székely, G. J., Rizzo, M. L., and Bakirov, N. K. (2007), “Measuring and Testing Dependence by Correlation of Distances,” The Annals of Statistics, 35, 2769–2794. DOI: 10.1214/009053607000000505.
  • Tibshirani, R., Hastie, T., Narasimhan, B., and Chu, G. (2003), “Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays,” Statistical Science, 104–117. DOI: 10.1214/ss/1056397488.
  • Weinberger, K., Dasgupta, A., Langford, J., Smola, A., and Attenberg, J. (2009), “Feature Hashing for Large Scale Multitask Learning,” in Proceedings of the 26th Annual International Conference on Machine Learning, pp. 1113–1120. DOI: 10.1145/1553374.1553516.
  • Wiesel, J. C. (2022), “Measuring Association with Wasserstein Distances,” Bernoulli, 28, 2816–2832. DOI: 10.3150/21-BEJ1438.
  • Wu, J., Huang, Z., Acharya, D., Li, W., Thoma, J., Paudel, D. P., and Gool, L. V. (2019), “Sliced Wasserstein Generative Models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3713–3722.
  • Wu, Y., and Yin, G. (2015), “Conditional Quantile Screening in Ultrahigh-Dimensional Heterogeneous Data,” Biometrika, 102, 65–76. DOI: 10.1093/biomet/asu068.
  • Xu, H., Luo, D., Henao, R., Shah, S., and Carin, L. (2020), “Learning Autoencoders with Relational Regularization,” in International Conference on Machine Learning, pp. 10576–10586. PMLR.
  • Xu, K., Shen, Z., Huang, X., and Cheng, Q. (2020), “Projection Correlation between Scalar and Vector Variables and its Use in Feature Screening with Multi-Response Data,” Journal of Statistical Computation and Simulation, 90, 1923–1942. DOI: 10.1080/00949655.2020.1753057.
  • Xue, L., and Zou, H. (2011), “Sure Independence Screening and Compressed Random Sensing,” Biometrika, 98, 371–380. DOI: 10.1093/biomet/asr010.
  • Yan, X., Tang, N., and Zhao, X. (2017), “The Spearman Rank Correlation Screening for Ultrahigh Dimensional Censored Data,” arXiv preprint arXiv:1702.02708.
  • Zhu, L.-P., Li, L., Li, R., and Zhu, L.-X. (2011), “Model-Free Feature Screening for Ultrahigh-Dimensional Data,” Journal of the American Statistical Association, 106, 1464–1475. DOI: 10.1198/jasa.2011.tm10563.
