Abstract
We determine the accuracy with which machine learning and deep learning techniques can classify selected World War II-era ciphers when only ciphertext is available. The specific ciphers considered are Enigma, M-209, Sigaba, Purple, and Typex. We experiment with three classic machine learning models, namely, Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Random Forest (RF). We also experiment with four deep learning models: Multi-Layer Perceptrons (MLP), Long Short-Term Memory (LSTM), Extreme Learning Machines (ELM), and Convolutional Neural Networks (CNN). Each model is trained on features consisting of letter histograms, digrams, and raw ciphertext letter sequences. Furthermore, we consider the classification problem under four distinct scenarios: fixed plaintext with fixed keys, random plaintext with fixed keys, fixed plaintext with random keys, and random plaintext with random keys. Under the most realistic scenario, given 1,000 characters per ciphertext, we are able to distinguish the ciphers with more than 97% accuracy. In addition, we consider the accuracy of a subset of the learning techniques as a function of ciphertext length. We find that the classic machine learning models outperform the deep learning models that we tested, and that ciphers that are more similar in design are somewhat more challenging to distinguish.
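The three feature types named above (histograms, digrams, and raw letter sequences) can be illustrated with a minimal sketch. It assumes ciphertext over the 26-letter alphabet A-Z, discards any other characters, and normalizes counts to relative frequencies; the function names and the normalization choice are hypothetical illustrations, not the authors' actual feature-extraction pipeline.

```python
import string

# Assumption: ciphertext uses only the letters A-Z; other characters are dropped.
ALPHABET = string.ascii_uppercase
INDEX = {c: i for i, c in enumerate(ALPHABET)}


def _letters(ciphertext: str) -> list[str]:
    """Uppercase the text and keep only characters in the assumed alphabet."""
    return [c for c in ciphertext.upper() if c in INDEX]


def histogram_features(ciphertext: str) -> list[float]:
    """Relative frequency of each letter A..Z (26 values; sum to 1 if non-empty)."""
    letters = _letters(ciphertext)
    counts = [0] * len(ALPHABET)
    for c in letters:
        counts[INDEX[c]] += 1
    total = len(letters) or 1  # avoid division by zero on empty input
    return [n / total for n in counts]


def digram_features(ciphertext: str) -> list[float]:
    """Relative frequency of each adjacent letter pair AA..ZZ (676 values)."""
    letters = _letters(ciphertext)
    n = len(ALPHABET)
    counts = [0] * (n * n)
    # Each overlapping pair (a, b) maps to row a, column b of a flattened 26x26 table.
    for a, b in zip(letters, letters[1:]):
        counts[INDEX[a] * n + INDEX[b]] += 1
    total = max(len(letters) - 1, 1)  # number of overlapping pairs
    return [x / total for x in counts]


def sequence_features(ciphertext: str) -> list[int]:
    """Raw letter sequence encoded as integers 0..25, e.g., for an LSTM or CNN."""
    return [INDEX[c] for c in _letters(ciphertext)]


if __name__ == "__main__":
    ct = "ABBA"
    print(histogram_features(ct)[:2])  # relative frequencies of A and B
    print(sequence_features(ct))       # integer-encoded letters
```

Under these assumptions, the histogram and digram vectors have fixed lengths (26 and 676) regardless of ciphertext length, so they could be concatenated and passed to a classifier such as scikit-learn's `SVC` or `RandomForestClassifier`, while the integer sequence suits sequence models that consume the ciphertext directly.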
Acknowledgment
The authors sincerely thank Nils Kopal for his help in generating the data that was essential for the success of this project.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Additional information
Notes on contributors
Brooke Dalton
Brooke Dalton received her Master's degree in Computer Science from San Jose State University in 2022. Her research interests include machine learning and deep learning. Brooke is currently employed in the tech sector.
Mark Stamp
Mark Stamp can neither confirm nor deny that in the previous century, he spent more than seven years working as a cryptologic mathematician at the National Security Agency. However, he can confirm that in this century, he worked at a small Silicon Valley startup, where he helped to develop a security-related product, and that for the past two decades, he has been employed as a Professor of Computer Science at San Jose State University (SJSU). At SJSU, Mark has developed and teaches a popular course on information security and, more recently, he developed a course in machine learning, which he also teaches regularly. When not teaching, supervising student research projects, or writing textbooks, Mark can usually be found fishing or sailing his kayak on Monterey Bay. Mark's two most recent textbooks are Information Security: Principles and Practice, 3rd edition (Wiley, 2021) and Introduction to Machine Learning with Applications in Information Security, 2nd edition (Chapman and Hall/CRC, 2022).