
Group Selection and Shrinkage: Structured Sparsity for Semiparametric Additive Models

Received 15 Aug 2022, Accepted 05 Mar 2024, Published online: 22 Apr 2024

References

  • Beck, A., and Eldar, Y. C. (2013), “Sparsity Constrained Nonlinear Optimization: Optimality Conditions and Algorithms,” SIAM Journal on Optimization, 23, 1480–1509. DOI: 10.1137/120869778.
  • Beck, A., and Tetruashvili, L. (2013), “On the Convergence of Block Coordinate Descent Type Methods,” SIAM Journal on Optimization, 23, 2037–2060. DOI: 10.1137/120887679.
  • Bertsimas, D., and King, A. (2016), “OR Forum—An Algorithmic Approach to Linear Regression,” Operations Research, 64, 2–16. DOI: 10.1287/opre.2015.1436.
  • Breheny, P., and Huang, J. (2015), “Group Descent Algorithms for Nonconvex Penalized Linear and Logistic Regression Models with Grouped Predictors,” Statistics and Computing, 25, 173–187. DOI: 10.1007/s11222-013-9424-2.
  • Breiman, L. (1996), “Heuristics of Instability and Stabilization in Model Selection,” Annals of Statistics, 24, 2350–2383.
  • Chouldechova, A., and Hastie, T. (2015), “Generalized Additive Model Selection,” arXiv:1506.03850, https://arxiv.org/abs/1506.03850.
  • De Mol, C., Giannone, D., and Reichlin, L. (2008), “Forecasting Using a Large Number of Predictors: Is Bayesian Shrinkage a Valid Alternative to Principal Components?” Journal of Econometrics, 146, 318–328. DOI: 10.1016/j.jeconom.2008.08.011.
  • Dedieu, A., Hazimeh, H., and Mazumder, R. (2021), “Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives,” Journal of Machine Learning Research, 22, 1–47.
  • Falk, M. (1999), “A Simple Approach to the Generation of Uniformly Distributed Random Variables with Prescribed Correlations,” Communications in Statistics - Simulation and Computation, 28, 785–791. DOI: 10.1080/03610919908813578.
  • Fan, J., and Li, R. (2001), “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties,” Journal of the American Statistical Association, 96, 1348–1360. DOI: 10.1198/016214501753382273.
  • Guo, Y., Berman, M., and Gao, J. (2014), “Group Subset Selection for Linear Regression,” Computational Statistics and Data Analysis, 75, 39–52. DOI: 10.1016/j.csda.2014.02.005.
  • Guo, Y., Zhu, Z., and Fan, J. (2021), “Best Subset Selection Is Robust Against Design Dependence,” arXiv:2007.01478, https://arxiv.org/abs/2007.01478.
  • Hastie, T., Tibshirani, R., and Tibshirani, R. (2020), “Best Subset, Forward Stepwise or Lasso? Analysis and Recommendations Based on Extensive Comparisons,” Statistical Science, 35, 579–592. DOI: 10.1214/19-STS733.
  • Hastie, T., Tibshirani, R., and Wainwright, M. (2015), Statistical Learning with Sparsity: The Lasso and Generalizations, Chapman & Hall/CRC Monographs on Statistics and Applied Probability, Boca Raton, FL: CRC Press.
  • Hazimeh, H., and Mazumder, R. (2020), “Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms,” Operations Research, 68, 1517–1537. DOI: 10.1287/opre.2019.1919.
  • Hazimeh, H., Mazumder, R., and Radchenko, P. (2023), “Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives,” Annals of Statistics, 51, 1–32.
  • Jacob, L., Obozinski, G., and Vert, J.-P. (2009), “Group Lasso with Overlap and Graph Lasso,” in Proceedings of the 26th International Conference on Machine Learning, pp. 433–440. DOI: 10.1145/1553374.1553431.
  • James, G., Witten, D., Hastie, T., and Tibshirani, R. (2021), An Introduction to Statistical Learning (2nd ed.), Springer Texts in Statistics, New York: Springer.
  • Ko, S., Li, G. X., Choi, H., and Won, J.-H. (2021), “Computationally Scalable Regression Modeling for Ultrahigh-Dimensional Omics Data with ParProx,” Briefings in Bioinformatics, 22, 1–17. DOI: 10.1093/bib/bbab256.
  • Li, J., and Chen, W. (2014), “Forecasting Macroeconomic Time Series: Lasso-Based Approaches and Their Forecast Combinations with Dynamic Factor Models,” International Journal of Forecasting, 30, 996–1015. DOI: 10.1016/j.ijforecast.2014.03.016.
  • Lim, M., and Hastie, T. (2015), “Learning Interactions via Hierarchical Group-Lasso Regularization,” Journal of Computational and Graphical Statistics, 24, 627–654. DOI: 10.1080/10618600.2014.938812.
  • Lou, Y., Bien, J., Caruana, R., and Gehrke, J. (2016), “Sparse Partially Linear Additive Models,” Journal of Computational and Graphical Statistics, 25, 1026–1040. DOI: 10.1080/10618600.2015.1089775.
  • Lounici, K., Pontil, M., van de Geer, S., and Tsybakov, A. B. (2011), “Oracle Inequalities and Optimal Inference Under Group Sparsity,” Annals of Statistics, 39, 2164–2204.
  • Mazumder, R., Radchenko, P., and Dedieu, A. (2023), “Subset Selection with Shrinkage: Sparse Linear Modeling When the SNR Is Low,” Operations Research, 71, 129–147. DOI: 10.1287/opre.2022.2276.
  • McCracken, M. W., and Ng, S. (2016), “FRED-MD: A Monthly Database for Macroeconomic Research,” Journal of Business and Economic Statistics, 34, 574–589. DOI: 10.1080/07350015.2015.1086655.
  • Meier, L., van de Geer, S., and Bühlmann, P. (2008), “The Group Lasso for Logistic Regression,” Journal of the Royal Statistical Society, Series B, 70, 53–71. DOI: 10.1111/j.1467-9868.2007.00627.x.
  • Obozinski, G., Jacob, L., and Vert, J.-P. (2011), “Group Lasso with Overlaps: The Latent Group Lasso Approach,” arXiv:1110.0413, https://arxiv.org/abs/1110.0413.
  • Obozinski, G., Taskar, B., and Jordan, M. (2006), “Multi-Task Feature Selection,” Technical Report. Available at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.951&rep=rep1&type=pdf.
  • Percival, D. (2012), “Theoretical Properties of the Overlapping Groups Lasso,” Electronic Journal of Statistics, 6, 269–288. DOI: 10.1214/12-EJS672.
  • Raskutti, G., Wainwright, M. J., and Yu, B. (2011), “Minimax Rates of Estimation for High-Dimensional Linear Regression Over ℓq-Balls,” IEEE Transactions on Information Theory, 57, 6976–6994.
  • Ravikumar, P., Lafferty, J., Liu, H., and Wasserman, L. (2009), “Sparse Additive Models,” Journal of the Royal Statistical Society, Series B, 71, 1009–1030. DOI: 10.1111/j.1467-9868.2009.00718.x.
  • Rigollet, P. (2015), “18.S997: High-Dimensional Statistics,” lecture notes, Massachusetts Institute of Technology.
  • Simon, N., and Tibshirani, R. (2012), “Standardization and the Group Lasso Penalty,” Statistica Sinica, 22, 983–1001. DOI: 10.5705/ss.2011.075.
  • Tibshirani, R. (1996), “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society, Series B, 58, 267–288. DOI: 10.1111/j.2517-6161.1996.tb02080.x.
  • Tseng, P. (2001), “Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization,” Journal of Optimization Theory and Applications, 109, 475–494. DOI: 10.1023/A:1017501703105.
  • Wang, H. (2009), “Forward Regression for Ultra-High Dimensional Variable Screening,” Journal of the American Statistical Association, 104, 1512–1524. DOI: 10.1198/jasa.2008.tm08516.
  • Yuan, M., and Lin, Y. (2006), “Model Selection and Estimation in Regression with Grouped Variables,” Journal of the Royal Statistical Society, Series B, 68, 49–67. DOI: 10.1111/j.1467-9868.2005.00532.x.
  • Zeng, Y., and Breheny, P. (2016), “Overlapping Group Logistic Regression with Applications to Genetic Pathway Selection,” Cancer Informatics, 15, 179–187. DOI: 10.4137/CIN.S40043.
  • Zhang, C.-H. (2010), “Nearly Unbiased Variable Selection Under Minimax Concave Penalty,” Annals of Statistics, 38, 894–942.
  • Zhang, Y., Zhu, J., Zhu, J., and Wang, X. (2023), “A Splicing Approach to Best Subset of Groups Selection,” INFORMS Journal on Computing, 35, 104–119. DOI: 10.1287/ijoc.2022.1241.