Research Article

The Hessian by blocks for neural network by backward propagation

Article: 2327102 | Received 18 May 2023, Accepted 01 Mar 2024, Published online: 23 Apr 2024
