
Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator

Pages 1348-1360 | Received 04 Jan 2022, Accepted 22 Mar 2023, Published online: 06 Jun 2023
