Machine Learning

Revisiting Convolutional Neural Networks from the Viewpoint of Kernel-Based Methods

Pages 1237-1247 | Received 26 Mar 2021, Accepted 14 Dec 2022, Published online: 16 Feb 2023
 

Abstract

Convolutional neural networks, like most artificial neural networks, are frequently viewed as methods essentially different from kernel-based methods. In this work we translate several classical convolutional neural networks into kernel-based counterparts. Each kernel-based counterpart is a statistical model called a convolutional kernel network, with parameters that can be learned from data. We provide an alternating minimization algorithm with mini-batch sampling and implicit partial differentiation to learn the parameters of each convolutional kernel network from data. We also show how to obtain inexact derivatives with respect to the parameters using an algorithm based on two intertwined Newton iterations. The models and algorithms are illustrated on benchmark datasets in image classification. We find that the convolutional neural networks and their kernel counterparts often perform similarly. Supplemental appendices and code for the article are available online.
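To make the abstract's optimization strategy concrete, the sketch below shows generic alternating minimization with mini-batch sampling on a toy two-block least-squares objective. The objective, dimensions, and step size are illustrative assumptions for exposition only; they are not the CKN objective or the training algorithm used in the article (which also involves implicit partial differentiation and Newton iterations).

```python
# A minimal, generic sketch of alternating minimization with mini-batch
# sampling on a toy two-block objective 0.5 * ||X W beta - y||^2.
# All quantities here are hypothetical and chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 256, 20, 5                     # samples, input dim, hidden dim (toy sizes)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

W = 0.1 * rng.standard_normal((d, p))    # "inner" parameters (e.g., filters)
beta = np.zeros(p)                       # "outer" parameters (e.g., classifier)

step, batch_size = 1e-2, 32
for it in range(200):
    # Block 1: exact minimization over beta with W held fixed (least squares).
    H = X @ W
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)

    # Block 2: stochastic gradient step on W with beta held fixed,
    # using a randomly sampled mini-batch.
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    resid = Xb @ W @ beta - yb
    grad_W = Xb.T @ np.outer(resid, beta) / batch_size
    W -= step * grad_W

print("final full-data loss:", 0.5 * np.mean((X @ W @ beta - y) ** 2))
```

The two-block structure (a closed-form update for one block, a stochastic gradient step for the other) is the general pattern being alluded to; the article's actual updates for the convolutional kernel network parameters are given in the technical appendices.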

Supplementary Materials

The supplementary materials are contained within a single zip archive, supplement.zip. This zip archive contains:

Technical appendices: Appendices containing mathematical descriptions of the ConvNets and CKNs used in the article, the derivation of the gradient of a CKN with respect to the weight matrices, additional details related to the training methods, and additional results. (appendix.pdf, PDF file)

Python code: Python code that can be used to reproduce the results in the article. (code.zip, zip archive)

Acknowledgments

The authors would like to thank the referees and the associate editor for their valuable comments.

Disclosure Statement

The authors report there are no competing interests to declare.

Notes

1 A dot product kernel is a kernel of the form $k(x, y) = f(\langle x, y \rangle)$ for a function $f : \mathbb{R} \to \mathbb{R}$. For notational convenience, for a dot product kernel $k$ we will write $k(t)$ rather than $k(x, y)$, where $t = \langle x, y \rangle$. For a matrix $A \in \mathbb{R}^{m \times n}$, the element-wise application of $k$ to $A$ results in $k(A) := [k(A_{i,j})]_{i,j=1}^{m,n}$.
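The short sketch below illustrates the conventions in Note 1: writing a dot product kernel as a function of $t = \langle x, y \rangle$ and applying it element-wise to a matrix. The choice $f(t) = \exp(t - 1)$ is only one example of a dot product kernel, not necessarily the kernel used in the article.

```python
# Illustration of Note 1: a dot product kernel written as a function of
# t = <x, y>, and its element-wise application to a matrix.
# The kernel f(t) = exp(t - 1) is a hypothetical example.
import numpy as np

def k(t):
    """Example dot product kernel, expressed as a function of t = <x, y>."""
    return np.exp(t - 1.0)

x = np.array([0.6, 0.8])
y = np.array([1.0, 0.0])
print(k(x @ y))           # k(x, y) = f(<x, y>) evaluated at a single pair

A = np.array([[1.0, 0.2],
              [0.5, 0.9]])
print(k(A))               # element-wise: k(A) = [k(A_ij)] for A in R^{m x n}
```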

Additional information

Funding

This work was mainly performed while C. Jones was at the University of Washington. The authors gratefully acknowledge funding from NSF CCF-1740551, NSF CCF-2019844, NSF DMS-2134012, CIFAR-LMB, and faculty research awards.
