Data Visualization

Penguins Go Parallel: A Grammar of Graphics Framework for Generalized Parallel Coordinate Plots

Pages 1572-1587 | Received 22 Nov 2022, Accepted 19 Mar 2023, Published online: 21 Apr 2023
 

Abstract

Parallel Coordinate Plots (PCPs) are a valuable tool for exploratory data analysis of high-dimensional numerical data. Their use is limited, however, when working with categorical variables or a mix of categorical and continuous variables. In this article, we propose Generalized Parallel Coordinate Plots (GPCPs), which extend PCPs from purely numeric variables to a seamless mix of categorical and numeric variables in a single plot. In the process, we find that existing solutions for purely categorical data, such as hammock plots or parsets, become edge cases in the new framework. By focusing on individual observations rather than marginal frequencies, we gain additional flexibility. The resulting approach is implemented in the R package ggpcp. Supplementary materials for this article are available online.

Supplementary Materials

Code and data to produce this paper are available at https://github.com/srvanderplas/ggpcp-paper. The ggpcp package code is available on CRAN and the development version can be found at https://github.com/heike/ggpcp.

Acknowledgments

We would like to thank all the contributors to open software, in particular, the authors behind the 'tidyverse' packages (Wickham et al. 2019). We acknowledge that a lot of this work is only possible with the help of unpaid volunteers and developers.

Data Availability Statement

This manuscript has been created in the reproducible Rweave format (Xie 2014, 2015) in R (R Core Team 2022) using the RStudio IDE (version Elsbeth Geranium).