Abstract
Each generation of CPU provides more resources and new features. These increase the ability to perform algorithms faster and with a higher degree of parallelism. The article discusses methods used to optimise CRC generation algorithms for long data blocks with consideration of the capabilities of contemporary systems. We analysed known software CRC algorithms and combined all known principles into a solution scalable in multiple CPU cores on single and multi-socket systems. Various algorithms were evaluated on contemporary multicore systems with 1 × 4, 1 × 64, 2 × 12, and 4 × 26 cores. The results show how the performance is affected by the architecture of the memory subsystem. Compared to the original sequential Sarwate algorithm, our algorithms are 48.0, 51.1, 38.0, and 28.8 times faster.
Acknowledgements
The authors thank the local subsidiary of Hewlett Packard Enterprise for allowing us to run our tests on their demo system HP ProLiant DL560 G10.
Disclosure statement
No potential conflict of interest was reported by the author(s).