Special Issue: Novel Approaches for Distributed Intelligent Systems

Block size, parallelism and predictive performance: finding the sweet spot in distributed learning

Pages 379-398 | Received 10 Feb 2023, Accepted 12 Jun 2023, Published online: 27 Jun 2023

Abstract

As distributed, multi-organization Machine Learning emerges, new challenges arise, such as handling diverse, low-quality data and meeting real-time delivery requirements. In this paper, we use a distributed learning environment to analyze the relationship between block size, parallelism, and predictor quality. Specifically, the goal is to find the optimal block size and the best heuristic for building distributed ensembles. We evaluated three heuristics and five block sizes on four publicly available datasets. Results show that using fewer but better base models matches or outperforms a standard Random Forest, and that 32 MB is the best block size.
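The paper's concrete heuristics and framework are not reproduced here, so the following is only a minimal sketch of the block-based ensemble idea the abstract describes. It assumes a block size measured in rows (the paper measures blocks in MB), a top-k validation-accuracy selection rule as a stand-in for the evaluated heuristics, and majority voting to combine models; all function names and parameters below are hypothetical.

```python
# Sketch only: block partitioning, the selection heuristic, and all names are
# assumptions for illustration, not the paper's actual method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for one of the paper's datasets.
X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

def train_on_blocks(X, y, block_rows):
    """Train one base model per data block (one model per block/worker)."""
    models = []
    for start in range(0, len(X), block_rows):
        Xb, yb = X[start:start + block_rows], y[start:start + block_rows]
        models.append(DecisionTreeClassifier(random_state=0).fit(Xb, yb))
    return models

def select_top_k(models, X_val, y_val, k):
    """Assumed selection heuristic: keep only the k most accurate base models."""
    ranked = sorted(models, key=lambda m: accuracy_score(y_val, m.predict(X_val)),
                    reverse=True)
    return ranked[:k]

def majority_vote(models, X):
    """Combine the selected base models by per-sample majority vote."""
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

models = train_on_blocks(X_train, y_train, block_rows=2_000)  # block size in rows here
best = select_top_k(models, X_val, y_val, k=5)                # "fewer but better" models
y_pred = majority_vote(best, X_test)
print(f"ensemble of {len(best)}/{len(models)} models, "
      f"test accuracy: {accuracy_score(y_test, y_pred):.3f}")
```

Keeping only the strongest subset of per-block models, rather than retaining every model as a standard Random Forest would, illustrates the "fewer but better base models" result the abstract reports.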

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This work was supported by FCT – Fundação para a Ciência e Tecnologia within projects [grant number UIDB/04728/2020], [grant number EXPL/CCI-COM/0706/2021], and [grant number CPCA-IAC/AV/475278/2022].
