Survival Analysis

Approximating Partial Likelihood Estimators via Optimal Subsampling

Pages 276-288 | Received 06 Oct 2022, Accepted 14 May 2023, Published online: 30 Jun 2023

References

  • Ai, M., Yu, J., Zhang, H., and Wang, H. (2021), “Optimal Subsampling Algorithms for Big Data Regressions,” Statistica Sinica, 31, 749–772. DOI: 10.5705/ss.202018.0439.
  • Andersen, P. K., and Gill, R. D. (1982), “Cox’s Regression Model for Counting Processes: A Large Sample Study,” The Annals of Statistics, 10, 1100–1120. DOI: 10.1214/aos/1176345976.
  • Atkinson, A., Donev, A., and Tobias, R. (2007), Optimum Experimental Designs, with SAS, Oxford: Oxford University Press.
  • Bai, Y., Li, C., Lin, Z., Wu, Y., Miao, Y., Liu, Y., and Xu, Y. (2021), “Efficient Data Loader for Fast Sampling-based GNN Training on Large Graphs,” IEEE Transactions on Parallel and Distributed Systems, 32, 2541–2556. DOI: 10.1109/TPDS.2021.3065737.
  • Battey, H., Fan, J., Liu, H., Lu, J., and Zhu, Z. (2018), “Distributed Testing and Estimation Under Sparse High Dimensional Models,” The Annals of Statistics, 46, 1352–1382. DOI: 10.1214/17-AOS1587.
  • Chen, X., Cheng, J. Q., and Xie, M. (2021), “Divide-and-Conquer Methods for Big Data Analysis,” arXiv:2102.10771v1.
  • Chen, X., Liu, W., and Zhang, Y. (2022), “First-Order Newton-Type Estimator for Distributed Estimation and Inference,” Journal of the American Statistical Association, 117, 1858–1874. DOI: 10.1080/01621459.2021.1891925.
  • Cox, D. R. (1972), “Regression Models and Life-Tables,” (with Discussions), Journal of the Royal Statistical Society, Series B, 34, 187–220. DOI: 10.1111/j.2517-6161.1972.tb00899.x.
  • Cox, D. R. (1975), “Partial Likelihood,” Biometrika, 62, 269–276.
  • DVN. (2008), “Data Expo 2009: Airline on Time Data.”
  • Fan, J., Guo, Y., and Wang, K. (2023), “Communication-Efficient Accurate Statistical Estimation,” Journal of the American Statistical Association, 118, 1000–1010. DOI: 10.1080/01621459.2021.1969238.
  • Fang, E. X., Ning, Y., and Liu, H. (2017), “Testing and Confidence Intervals for High Dimensional Proportional Hazards Models,” Journal of the Royal Statistical Society, Series B, 79, 1415–1437. DOI: 10.1111/rssb.12224.
  • Fleming, T., and Harrington, D. (1991), Counting Processes and Survival Analysis, New York: Wiley.
  • Han, L., Tan, K. M., Yang, T., and Zhang, T. (2020), “Local Uncertainty Sampling for Large-Scale Multiclass Logistic Regression,” The Annals of Statistics, 48, 1770–1788. DOI: 10.1214/19-AOS1867.
  • Hesterberg, T. (1995), “Weighted Average Importance Sampling and Defensive Mixture Distributions,” Technometrics, 37, 185–194. DOI: 10.1080/00401706.1995.10484303.
  • Huang, J., Sun, T., Ying, Z., Yu, Y., and Zhang, C.-H. (2013), “Oracle Inequalities for the Lasso in the Cox Model,” The Annals of Statistics, 41, 1142–1165. DOI: 10.1214/13-AOS1098.
  • Jordan, M. I., Lee, J. D., and Yang, Y. (2019), “Communication-Efficient Distributed Statistical Inference,” Journal of the American Statistical Association, 114, 668–681. DOI: 10.1080/01621459.2018.1429274.
  • Kalbfleisch, J. D., and Prentice, R. L. (2002), The Statistical Analysis of Failure Time Data, Hoboken, NJ: Wiley-Interscience.
  • Keret, N., and Gorfine, M. (2020), “Optimal Cox Regression Subsampling Procedure with Rare Events,” arXiv:2012.02122v1. DOI: 10.1080/01621459.2023.2209349.
  • Kiefer, J. (1959), “Optimum Experimental Designs,” Journal of the Royal Statistical Society, Series B, 21, 272–319. DOI: 10.1111/j.2517-6161.1959.tb00338.x.
  • Kleinbaum, D. G., and Klein, M. (2005), Survival Analysis: A Self-Learning Text, New York: Springer.
  • Lee, S., and Ng, S. (2020), “An Econometric Perspective on Algorithmic Subsampling,” Annual Review of Economics, 12, 45–80. DOI: 10.1146/annurev-economics-022720-114138.
  • Li, R., Chang, C., Justesen, J. M., Tanigawa, Y., Qiang, J., Hastie, T., Rivas, M. A., and Tibshirani, R. (2022), “Fast Lasso Method for Large-Scale and Ultrahigh-Dimensional Cox Model with Applications to UK Biobank,” Biostatistics, 23, 522–540. DOI: 10.1093/biostatistics/kxaa038.
  • Li, T., and Meng, C. (2021), “Modern Subsampling Methods for Large-Scale Least Squares Regression,” International Journal of Cyber-Physical Systems, 2, 1–28. DOI: 10.4018/IJCPS.2020070101.
  • Lin, L., Li, W., and Lu, J. (2020), “Unified Rules of Renewable Weighted Sums for Various Online Updating Estimations,” arXiv:2008.08824v1.
  • Liu, H., You, J., and Cao, J. (2021), “Functional L-optimality Subsampling for Massive Data,” arXiv:2104.03446v1.
  • Luo, L., and Song, P. X. (2020), “Renewable Estimation and Incremental Inference in Generalized Linear Models with Streaming Data Sets,” Journal of the Royal Statistical Society, Series B, 82, 69–97. DOI: 10.1111/rssb.12352.
  • Luo, L., Zhou, L., and Song, P. X.-K. (2022), “Real-Time Regression Analysis of Streaming Clustered Data with Possible Abnormal Data Batches,” Journal of the American Statistical Association. DOI: 10.1080/01621459.2022.2026778.
  • Ma, P., Mahoney, M. W., and Yu, B. (2015), “A Statistical Perspective on Algorithmic Leveraging,” Journal of Machine Learning Research, 16, 861–911.
  • Meng, C., Xie, R., Mandal, A., Zhang, X., Zhong, W., and Ma, P. (2021), “Lowcon: A Design Based Subsampling Approach in a Misspecified Linear Model,” Journal of Computational and Graphical Statistics, 30, 694–708. DOI: 10.1080/10618600.2020.1844215.
  • Owen, A., and Zhou, Y. (2000), “Safe and Effective Importance Sampling,” Journal of the American Statistical Association, 95, 135–143. DOI: 10.1080/01621459.2000.10473909.
  • R Core Team. (2021), R: A Language and Environment for Statistical Computing, Vienna, Austria: R Foundation for Statistical Computing.
  • Schifano, E. D., Wu, J., Wang, C., Yan, J., and Chen, M.-H. (2016), “Online Updating of Statistical Inference in the Big Data Setting,” Technometrics, 58, 393–403. DOI: 10.1080/00401706.2016.1142900.
  • Shi, C., Lu, W., and Song, R. (2018), “A Massive Data Framework for M-estimators with Cubic-Rate,” Journal of the American Statistical Association, 113, 1698–1709. DOI: 10.1080/01621459.2017.1360779.
  • Simon, N., Friedman, J., Hastie, T., and Tibshirani, R. (2011), “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent,” Journal of Statistical Software, 39, 1–13. DOI: 10.18637/jss.v039.i05.
  • Tarkhan, A., and Simon, N. (2020), “Bigsurvsgd: Big Survival Data Analysis via Stochastic Gradient Descent,” arXiv:2003.00116v1.
  • Therneau, T. M. (2021), A Package for Survival Analysis in R. R package version 3.2-13.
  • Therneau, T. M., Grambsch, P. M., and Fleming, T. R. (1990), “Martingale-based Residuals for Survival Models,” Biometrika, 77, 147–160. DOI: 10.1093/biomet/77.1.147.
  • Volgushev, S., Chao, S.-K., and Cheng, G. (2019), “Distributed Inference for Quantile Regression Processes,” The Annals of Statistics, 47, 1634–1662. DOI: 10.1214/18-AOS1730.
  • Wang, C., Chen, M.-H., Schifano, E., Wu, J., and Yan, J. (2016), “Statistical Methods and Computing for Big Data,” Statistics and Its Interface, 9, 399–414. DOI: 10.4310/SII.2016.v9.n4.a1.
  • Wang, H. (2019), “More Efficient Estimation for Logistic Regression with Optimal Subsamples,” Journal of Machine Learning Research, 20, 1–59.
  • Wang, H., and Ma, Y. (2021), “Optimal Subsampling for Quantile Regression in Big Data,” Biometrika, 108, 99–112. DOI: 10.1093/biomet/asaa043.
  • Wang, H., Yang, M., and Stufken, J. (2019), “Information-based Optimal Subdata Selection for Big Data Linear Regression,” Journal of the American Statistical Association, 114, 393–405. DOI: 10.1080/01621459.2017.1408468.
  • Wang, H., Zhu, R., and Ma, P. (2018), “Optimal Subsampling for Large Sample Logistic Regression,” Journal of the American Statistical Association, 113, 829–844. DOI: 10.1080/01621459.2017.1292914.
  • Wang, J., Zou, J., and Wang, H. (2022), “Sampling with Replacement vs Poisson Sampling: A Comparative Study in Optimal Subsampling,” IEEE Transactions on Information Theory, 68, 6605–6630. DOI: 10.1109/TIT.2022.3176955.
  • Wang, K., Wang, H., and Li, S. (2022), “Renewable Quantile Regression for Streaming Datasets,” Knowledge-Based Systems, 235, 107675. DOI: 10.1016/j.knosys.2021.107675.
  • Wang, T., and Zhang, H. (2022), “Optimal Subsampling for Multiplicative Regression with Massive Data,” Statistica Neerlandica, 76, 418–449. DOI: 10.1111/stan.12266.
  • Wang, Y., Hong, C., Palmer, N., Di, Q., Schwartz, J., Kohane, I., and Cai, T. (2021), “A Fast Divide-and-Conquer Sparse Cox Regression,” Biostatistics, 22, 381–401. DOI: 10.1093/biostatistics/kxz036.
  • Wu, J., Chen, M. H., Schifano, E. D., and Yan, J. (2021), “Online Updating of Survival Analysis,” Journal of Computational and Graphical Statistics, 30, 1209–1223. DOI: 10.1080/10618600.2020.1870481.
  • Xiong, S., and Li, G. (2008), “Some Results on the Convergence of Conditional Distributions,” Statistics and Probability Letters, 78, 3249–3253. DOI: 10.1016/j.spl.2008.06.026.
  • Xu, J., Ying, Z., and Zhao, N. (2020), “Scalable Estimation and Inference with Large-Scale or Online Survival Data,” arXiv preprint arXiv:2001.01434.
  • Xue, Y., Wang, H., Yan, J., and Schifano, E. D. (2019), “An Online Updating Approach for Testing the Proportional Hazards Assumption with Streams of Survival Data,” Biometrics, 76, 171–182. DOI: 10.1111/biom.13137.
  • Yang, Z., Wang, H., and Yan, J. (2022), “Optimal Subsampling for Parametric Accelerated Failure Time Models with Massive Survival Data,” Statistics in Medicine, 41, 5421–5431. DOI: 10.1002/sim.9576.
  • Yao, Y., and Wang, H. (2019), “Optimal Subsampling for Softmax Regression,” Statistical Papers, 60, 585–599. DOI: 10.1007/s00362-018-01068-6.
  • Yao, Y., and Wang, H. (2021), “A Review on Optimal Subsampling Methods for Massive Datasets,” Journal of Data Science, 19, 151–172.
  • Yao, Y., Zou, J., and Wang, H. (2021), “Optimal Poisson Subsampling for Softmax Regression,” Journal of Systems Science and Complexity, accepted.
  • Yu, J., Ai, M., and Ye, Z. (2023), “A Review on Design Inspired Subsampling for Big Data,” Statistical Papers. DOI: 10.1007/s00362-022-01386-w.
  • Yu, J., Wang, H., Ai, M., and Zhang, H. (2022), “Optimal Distributed Subsampling for Maximum Quasi-Likelihood Estimators with Massive Data,” Journal of the American Statistical Association, 117, 265–276. DOI: 10.1080/01621459.2020.1773832.
  • Zhang, A., Zhang, H., and Yin, G. (2020), “Adaptive Iterative Hessian Sketch via A-optimal Subsampling,” Statistics and Computing, 30, 1075–1090. DOI: 10.1007/s11222-020-09936-8.
  • Zhang, H., and Wang, H. (2021), “Distributed Subdata Selection for Big Data via Sampling-based Approach,” Computational Statistics & Data Analysis, 153, 107072. DOI: 10.1016/j.csda.2020.107072.
  • Zhang, T., Ning, Y., and Ruppert, D. (2021), “Optimal Sampling for Generalized Linear Models Under Measurement Constraints,” Journal of Computational and Graphical Statistics, 30, 106–114. DOI: 10.1080/10618600.2020.1778483.
  • Zhao, T., Cheng, G., and Liu, H. (2016), “A Partially Linear Framework for Massive Heterogeneous Data,” The Annals of Statistics, 44, 1400–1437. DOI: 10.1214/15-AOS1410.
  • Zuo, L., Zhang, H., Wang, H., and Liu, L. (2021a), “Sampling-based Estimation for Massive Survival Data with Additive Hazards Model,” Statistics in Medicine, 40, 441–450. DOI: 10.1002/sim.8783.
  • Zuo, L., Zhang, H., Wang, H., and Sun, L. (2021b), “Optimal Subsample Selection for Massive Logistic Regression with Distributed Data,” Computational Statistics, 36, 2535–2562. DOI: 10.1007/s00180-021-01089-0.
