Abstract
Convolutional neural networks, like most artificial neural networks, are frequently viewed as methods different in essence from kernel-based methods. In this work we translate several classical convolutional neural networks into kernel-based counterparts. Each kernel-based counterpart is a statistical model called a convolutional kernel network, with parameters that can be learned from data. We provide an alternating minimization algorithm with mini-batch sampling and implicit partial differentiation to learn the parameters of each convolutional kernel network from data. We also show how to obtain inexact derivatives with respect to the parameters using an algorithm based on two intertwined Newton iterations. The models and algorithms are illustrated on benchmark datasets in image classification. We find that the convolutional neural networks and their kernel counterparts often perform similarly. Supplemental appendices and code for the article are available online.
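To make the alternating scheme concrete, the following is a minimal sketch of alternating minimization with mini-batch sampling, assuming a toy differentiable feature map in place of a CKN layer; the function names, the least-squares classifier update, and all hyperparameters are illustrative assumptions, not the article's implementation.

```python
import numpy as np

def features(X, W):
    """Toy differentiable feature map standing in for a CKN layer."""
    return np.tanh(X @ W)

def alternating_minimization(X, y, W, n_epochs=10, batch_size=32, lr=1e-2, seed=0):
    """Alternate between a classifier update and mini-batch updates of W."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(n_epochs):
        # Step 1: with W fixed, fit a linear classifier c by least squares.
        c, *_ = np.linalg.lstsq(features(X, W), y, rcond=None)
        # Step 2: with c fixed, take mini-batch gradient steps on W for the
        # squared-error loss 0.5 * mean((features(X, W) @ c - y) ** 2).
        for idx in np.array_split(rng.permutation(n), max(n // batch_size, 1)):
            Phi = features(X[idx], W)
            r = Phi @ c - y[idx]                           # residuals
            dPhi = r[:, None] * c[None, :] / len(idx)      # d(loss)/d(Phi)
            grad_W = X[idx].T @ (dPhi * (1.0 - Phi ** 2))  # chain rule: tanh'
            W = W - lr * grad_W
    return W, c
```

In the article the updates of the weight matrices instead rely on inexact derivatives obtained by implicit partial differentiation; the explicit chain rule above applies only to the toy feature map used in this sketch.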
Supplementary Materials
The supplementary materials are provided in a single zip archive, supplement.zip, which contains:
Technical appendices: Mathematical descriptions of the ConvNets and CKNs used in the article, the derivation of the gradient of a CKN with respect to the weight matrices, additional details on the training methods, and additional results. (appendix.pdf, PDF file)
Python code: Python code that can be used to reproduce the results in the article. (code.zip, zip archive)
Acknowledgments
The authors would like to thank the referees and the associate editor for their valuable comments.
Disclosure Statement
The authors report there are no competing interests to declare.
Notes
1. A dot product kernel is a kernel of the form $k(x, y) = \kappa(\langle x, y \rangle)$ for a function $\kappa\colon \mathbb{R} \to \mathbb{R}$. For notational convenience, for a dot product kernel $k$ we will write $k(\langle x, y \rangle)$ rather than $\kappa(\langle x, y \rangle)$, where $\kappa$ is the scalar function defining $k$. For a matrix $A$, the element-wise application of $k$ to $A$ results in the matrix with entries $[k(A)]_{ij} = k(A_{ij})$.
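As a small illustration of the element-wise convention above, the snippet below applies a dot product kernel to a Gram matrix; the particular choice $\kappa(u) = \exp(\alpha(u - 1))$ and the variable names are assumptions for the example, not prescribed by the article.

```python
import numpy as np

# Element-wise application of a dot product kernel k to a matrix A,
# i.e., [k(A)]_ij = k(A_ij). The kernel kappa(u) = exp(alpha * (u - 1))
# is an assumed example of a dot product kernel on unit-norm inputs.
alpha = 1.0
k = lambda u: np.exp(alpha * (u - 1.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
A = X @ X.T                                    # A_ij = <x_i, x_j>
K = k(A)                                       # element-wise k(A)
```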