
TSCAPE: time series clustering with curve analysis and projection on an Euclidean space

Article: 2312119 | Received 14 Aug 2023, Accepted 25 Jan 2024, Published online: 14 Feb 2024

Abstract

The ever-growing use of digital systems has led to the accumulation of vast datasets, particularly time series, depicting the temporal evolution of variables and systems. Analysing these time series presents a tremendous challenge due to their inherent complexity and heterogeneity. Addressing an industrial need in the pharmaceutical wholesale sector, this paper introduces a new clustering method for time series: TSCAPE. The TSCAPE method uses a distance matrix calculated using dynamic time warping, followed by multidimensional scaling to project the time series into a 2D Euclidean space, thus improving the final clustering stage performed by K-Means. Unlike conventional techniques, this approach, based on clustering a representation of distances in a Euclidean plane rather than the curve shapes themselves, directly enhances the efficiency of the clustering process. The methodology exhibits significant potential for diverse applications, accommodating varied data types and irregular time series shapes. The research compares multiple variants and proposes metrics to assess their effectiveness on two open-access datasets. The results demonstrate the method's superiority over clustering techniques based solely on distance comparison, such as dynamic time warping combined with K-Means, with future prospects aimed at predictive applications and at refining the clustering process by exploring alternative, more powerful clustering algorithms.

1. Introduction

Nowadays, the use of digital systems has become ubiquitous, resulting in the generation of vast datasets that can be stored and analysed to extract pertinent information. Within this pool of data lie time series, delineating the evolution of a system or variable over time, including trajectories, sales curves or temperature fluctuations. The exploration of such time series offers substantial insights beneficial for predictive analytics, behavioural studies and anomaly detection within the systems producing and using these data.

However, time series data, given its complex and heterogeneous nature, is intricate to analyse. For instance, companies may aim to identify anomalies in store performance, categorise products based on their sales behaviour, or detect failures in a production line. As a result, researchers are in constant pursuit of methodologies to cluster these time series, thereby simplifying their analysis. Similar endeavours have been initiated, for instance, to address the industrial requirements of pharmaceutical wholesale distributors. Specifically, in the pharmaceutical domain, predictions concerning coherent product groupings, that is, clusters of products displaying similar sales behaviours, lack adequate classifications beyond the pre-established Anatomical Therapeutic Chemical (ATC) classification of medications by the World Health Organization. This gap gave rise to the notion of clustering products with similar sales trends as a potential solution, aiming to generalise predictions across entire product groups by ensuring homogeneous sales behaviours across all members of the same group.

This paper introduces a new clustering method designed to enhance predictions without requiring comprehensive product usage context. The approach involves computing a distance matrix among the products to be grouped, using a dynamic time-warping algorithm to analyse the time series. Subsequently, the representation of each time series is projected onto a 2D Euclidean space using a multidimensional scaling algorithm. The proximity of points on the plane indicates similar time series, thereby facilitating the clustering process. To sum up the approach, we propose (1) to compute a time series distance matrix by using dynamic time warping, then (2) to transform the time series data into a set of points by applying multidimensional scaling on these pairwise distances and finally (3) to apply a K-Means clustering algorithm on this cloud of points.

Our primary goal is to cluster time series sharing the most comparable patterns, relying on an examination of curve-shape similarities within the time series. Directly applying a clustering method to time series is intricate, considering the need to cluster data in dimensions equivalent to the size of the studied time series. Thus, our approach entails clustering based on a representation of the similarities between these time series, modelled by distances between points projected onto a Euclidean plane. With this approach, each entity within a cluster should exhibit the closest proximity to other members within the same cluster while maintaining a substantial distance from individuals in other clusters. This clustering strategy necessitates creating clusters centred around their centroids. The method's advantages lie in conducting clustering in a reduced-dimensional space, in grouping time series displaying low similarity distances, and in its intrinsic explainability.

One of the method's merits is its divergence from direct clustering based on curve shape; instead, clustering through a representation of distances on a Euclidean plane is performed. By clustering points in close proximity, assurance is gained regarding the similarity of the represented curves, thereby streamlining the clustering process. This innovative methodology enables one to efficiently manage a large volume of time series without a predetermined classification, holding significant potential for industrial and data science applications. Additionally, the adaptability of our method spans various data types, encompassing sensor data, financial data and meteorological data. It accommodates time series exhibiting diverse shapes, whether regular or irregular. Moreover, our approach provides a clear graphical depiction of time series groups, making it easier to understand results.

Following the review of the existing body of knowledge, the subsequent sections of the article will detail the developed clustering method, outlining each step and comparing multiple potential variations. The article will introduce the datasets used, describe the data processing methods and present the metrics needed to compare the different methods. Additionally, comparative elements will be suggested and the article will conclude by addressing future works and potential areas for enhancement.

2. Related works

2.1. Comparison techniques for time series

The paper addresses the analysis of time series curves' shapes to group them based on similarities. The literature presents various techniques to compare time series, exemplified in the work of Liao (Citation2005). The authors categorise three main clustering types for time series: “raw-data-based” in Golay et al. (Citation1998), using fuzzy logic clustering on functional magnetic resonance imaging data; “feature-based” in Shaw and King (Citation1992), employing cluster analysis to identify local oscillators; and “model-based” in Kalpakis et al. (Citation2001), comparing Linear Predictive Coding cepstrum of ARIMA time series. Techniques leveraging Euclidean distances (e.g. Elmore & Richman, Citation2001) involve creating a similarity matrix through Euclidean distance for Principal Component Analysis (PCA). However, the investigation into more efficient techniques, like the so-called Dynamic Time Warping (DTW) proposed by Sakoe and Chiba (Citation1978), remains less explored.

2.2. Distance comparison techniques in time series analysis

Dynamic Time Warping (DTW) is a well-known method for calculating the smallest distance between two time-series, allowing flexibility in aligning points to account for variations in speed and time. DTW has found major applications in time series clustering tasks and beyond.

DTW's efficiency in time series clustering is relevant in various studies, showcasing its adaptability and usefulness in pattern recognition, as illustrated in Petitjean et al. (Citation2011). DTW's application extended to avian studies where it identified patterns in bird songs, offering insights into bird communication and behaviour in the work of Jancovic et al. (Citation2013).

Additionally, the Longest Common SubSequence (LCSS), as used in Vlachos et al. (Citation2002), operates by identifying the longest subsequence common to two time series, providing a robust measure of similarity. Edit Distance with Real Penalty (ERP), see Chen and Ng (Citation2004), offers a technique for computing the distance between time series while accounting for their alignment and amplitude differences. Complexity-Invariant Distance (CID), proposed by Batista et al. (Citation2014), focuses on capturing the complexity pattern of time series, rendering it robust against length and scaling variations.

Furthermore, shape-based measures such as Slope Distance (cf. Kamalzadeh et al., Citation2020) and Procrustes Shape Analysis, as in the work of Andreella et al. (Citation2023), have been used, the latter to align matrices. These techniques provide different methodologies to compare time series and are used in various fields, contributing significantly to pattern recognition in An et al. (Citation2023), signal processing in Liu et al. (Citation2023), data analysis in Ejegwa and Agbetayo (Citation2023) and identification in Hittmeir et al. (Citation2022).

2.3. Embedding and dimensionality reduction techniques

Embedding and dimensionality reduction techniques play a pivotal role in projecting high-dimensional data into lower dimensions for visualisation while preserving crucial information.

MultiDimensional Scaling (MDS) is a fundamental technique used to transform distance matrices into point clouds on predefined Euclidean planes, as proposed in Cox and Cox (Citation2008). Its applications span various fields, showcasing its versatility and efficiency. For instance, MDS was applied in profiling analysis, unveiling distinct patterns in individuals' profiles, such as in Ding (Citation2000). Additionally, MDS was used for predicting bankruptcy, providing a succinct visualisation of companies' financial statuses, like in the work of Neophytou and Molinero (Citation2004). MDS was also used to analyse the financial health of banks, offering insights into the banking sector's dynamics, see, e.g. Mar-Molinero and Serrano-Cinca (Citation2001). Furthermore, MDS found application in virus behaviour analysis providing a structured understanding of virus behaviour patterns, cf. Lopes et al. (Citation2016). The technique often collaborates with clustering algorithms, as evident in the study analysing killer whale songs in Brown et al., Citation2006, employing MDS as part of an unsupervised grouping methodology leveraging a sequence of algorithms.

In addition to MDS, several other dimensionality reduction algorithms exist with distinct features and applications. t-SNE (t-Distributed Stochastic Neighbour Embedding) specialises in visualising high-dimensional data by minimising the divergence between data points in different dimensions, see Van der Maaten and Hinton (Citation2008). Isomap (Isometric Mapping,  Tenenbaum et al. (Citation2000)) maintains geometric relationships in the high-dimensional space, preserving local properties in the low-dimensional representation. Locally Linear Embedding (LLE), as in Roweis and Saul (Citation2000), focuses on local linearity, projecting data points in a way that maintains the relationships between neighbouring points. UMAP (Uniform Manifold Approximation and Projection) like in the work of Becht et al. (Citation2019) is adept at preserving both local and global structures in the data space. PCA (Principal Component Analysis, used for instance in Jolliffe and Cadima (Citation2016)) and KernelPCA are widely used linear techniques that project data onto a new subspace to maximise variance, offering a simplified yet informative representation, and can be used before a clustering algorithm as in Mohammadi et al. (Citation2022).

These techniques provide diverse strategies for dimensionality reduction and are applicable in a myriad of domains, contributing significantly to data visualisation and analysis.

2.4. Clustering algorithms

Clustering encompasses various algorithms, each with distinct methodologies and applications in data analysis. K-means, as in Agarwal and Mustafa (Citation2004), is a pervasive technique relying on centroids, involving iterative assignment and replacement of cluster centroids. In contrast, HDBSCAN (Malzer & Baum, Citation2020) focuses on the similarity between cluster members, prioritising neighbourhood association over cluster centre proximity. This leads to the creation of more precise yet unequal-sized clusters, a feature particularly advantageous for non-uniformly distributed data. While HDBSCAN excels in handling clusters based on similarity, it may encounter challenges with non-compact point clouds and data points that do not easily adhere to traditional cluster formations.

Several other notable clustering algorithms deserve attention. DBSCAN (Density-Based Spatial Clustering of Applications with Noise, Ester et al., Citation1996) identifies clusters based on dense regions separated by areas of lower density. OPTICS (Ordering Points To Identify the Clustering Structure), proposed by Ankerst et al. (Citation1999), is an extension of DBSCAN, providing a hierarchical clustering representation. Mean Shift (Derpanis, Citation2005) uses a non-parametric technique to identify clusters by locating density maxima in the data space. This algorithm has recently been improved with the use of GPUs like in You et al. (Citation2022). Agglomerative Hierarchical Clustering (Day & Edelsbrunner, Citation1984) operates by successively merging individual data points or clusters based on their similarity, creating a dendrogram to illustrate the merging process.

Finally, BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies,  Zhang et al., Citation1996) uses a hierarchical clustering approach to handle large datasets, producing a compact summary for subsequent analysis. Spectral Clustering, like in  Lee and Park (Citation2023), Ng et al. (Citation2001) and Abbasimehr and Bahrini (Citation2022), operates on similarity graphs and spectral theory to partition data into clusters. Lastly, Affinity Propagation (Frey & Dueck, Citation2007) uses message passing to find the most appropriate examples of data points for clustering, balancing the quality of the example and the number of clusters.

These algorithms demonstrate diverse approaches and have applications in various domains, catering to different types of datasets and structures.

3. TSCAPE description

Our solution is decomposed into three steps, see below (Figure 1). In the first step, distances are computed by pairwise comparison of the time series curves to create a similarity matrix. Subsequently, this distance matrix is input to multidimensional scaling, which projects a representation of the time series onto a 2D space while preserving the distances described by the matrix. It is on this 2D representation that a clustering algorithm is applied to achieve the grouping of similar time series.

Figure 1. The different steps of our solution. The pairwise comparison of time series is performed to derive their similarity distances, subsequently transforming the resultant similarity matrix into a point projection representing the time series. This representation is then used in the formation of clusters comprising the closest points through the application of a clustering algorithm.


In the following subsections, we will detail each step of the methodology and the algorithms used in these steps.

3.1. Distance comparison with dynamic time warping (DTW)

Initiating the comparison process involves a pairwise assessment of all the time series representing sales curves. Certain sales curves may exhibit similar behaviour but be slightly time-shifted. To address this issue, employing the dynamic time-warping algorithm enables the comparison of curves without being impeded by minor variations in their shapes. Dynamic Time Warping (DTW) is relevant in this context as it accommodates temporal shifts between sequences. It aligns the sequences by warping them in the time dimension, allowing for a flexible matching process, ideal for comparing time series with inherent temporal distortions. Dynamic time warping calculates the smallest distance between two time series. Unlike a simple Euclidean distance calculation, it does not compare the two time series point to point and offers a degree of flexibility in aligning the points. This algorithm, DTW(x, y), is applied to the data P, compared pairwise, to create a matrix M of size N × N, where N is the number of time series to be clustered. M is defined as M(i, j) = DTW(i, j), with i, j belonging to P. The DTW(X, Y) algorithm is defined as follows. For two time series X and Y of respective sizes N_1 and N_2, the first step is to create a distance matrix D of size N_1 × N_2 with D(k, p) = |X_k − Y_p| + min{D(k−1, p), D(k, p−1), D(k−1, p−1)}. The next step is to reconstruct the path between D(N_1−1, N_2−1) and D(0, 0) with the smallest distance value. For this purpose, the values of the crossed cells are added, using as the displacement rule: for the current cell D(k, p), the next cell is min{D(k−1, p), D(k, p−1), D(k−1, p−1)}.

Once the matrix M is created and filled, it is passed as input to the multidimensional scaling algorithm.

Tests have also been performed by replacing the classical dynamic time warping algorithm with Soft-DTW, as in Cuturi and Blondel (Citation2017), a variant of the DTW algorithm with an additional "gamma" parameter (Table 1).
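As an illustration, a minimal sketch of this distance-matrix step, assuming the tslearn library and a placeholder array of normalised series (the gamma value is only an example), could be:

```python
import numpy as np
from tslearn.metrics import cdist_dtw, cdist_soft_dtw

# `series`: N normalised time series (placeholder of 50 series of length 36).
series = np.random.default_rng(0).random((50, 36))

# Pairwise DTW distances give the N x N matrix M used as input to MDS.
M = cdist_dtw(series)

# Soft-DTW variant with its additional "gamma" parameter (cf. Table 1).
M_soft = cdist_soft_dtw(series, gamma=0.1)
```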

Table 1. Variation of the metrics according to the SoftDTW gamma parameters.

3.2. Embedding: multidimensional scaling (MDS)

MDS is an algorithm used in data science to visualise dissimilarities between objects in a dataset by mapping them to a lower-dimensional space while preserving as much as possible their original pairwise distances. The primary goal of MDS is to represent high-dimensional data in a reduced number of dimensions, typically two or three, to facilitate visualisation and interpretation.

3.2.1. Mathematical formulation

Consider a dataset of n objects, where the dissimilarities between each pair of objects are represented by a symmetric n×n dissimilarity matrix D. MDS aims to embed these objects into a lower-dimensional space, typically p dimensions.

The objective of MDS is to find a configuration of points X = {x_1, x_2, …, x_n} in a p-dimensional space such that the Euclidean distances between these points best approximate the dissimilarities in the original matrix D.

The stress function in classical MDS, also known as Kruskal's stress, is designed to minimise the difference between the original dissimilarities D_ij and the dissimilarities reconstructed from the embedded points x_i and x_j in the reduced space.

The stress function is given by: E(X) = Σ_{i=1}^{n} Σ_{j=i+1}^{n} w_ij (d_ij − ||x_i − x_j||)^2, where:

  • E(X) is the stress function to be minimised,

  • w_ij is a weight associated with the dissimilarity between objects i and j,

  • d_ij is the original dissimilarity between objects i and j,

  • ||x_i − x_j|| represents the Euclidean distance between points x_i and x_j in the lower-dimensional space.

The minimisation problem is solved using optimisation techniques such as gradient descent, eigendecomposition or nonlinear optimisation methods.

3.2.2. Operational details

  1. Input: The input to MDS is a dissimilarity matrix representing the pairwise dissimilarities between objects.

  2. Computing Distances: Calculate the pairwise Euclidean distances from the dissimilarity matrix.

  3. Optimisation: Minimise the stress function by finding the optimal configuration of points in the lower-dimensional space.

  4. Visualisation: Visualise the embedded points in 2D or 3D space to explore the relationships between objects.

Multidimensional Scaling serves several essential purposes in the preprocessing of time series data for clustering analysis. Primarily, MDS endeavours to maintain the original relationships between time series by conserving pairwise distances or dissimilarities as closely as possible during dimensionality reduction. This preservation of proximity ensures the retention of critical information crucial for subsequent clustering. Additionally, MDS helps to reduce noise by filtering out extraneous details in the time series data, highlighting the more substantial patterns and structures, thereby enhancing the efficiency of subsequent clustering algorithms. Moreover, by representing time series data in a reduced-dimensional space, MDS allows for a more effective application of clustering algorithms, generating more insightful and accurate clusters as it reflects the original relationships between the data points. Finally, MDS contributes to computational efficiency by reducing the complexity of subsequent clustering algorithms through dimensionality reduction, making the clustering process more computationally manageable and facilitating the application of algorithms struggling with higher dimensions.

3.2.3. Use in TSCAPE

The second step of TSCAPE is to apply a multidimensional scaling algorithm to the distance matrix generated by the DTW. Given a distance matrix, this algorithm creates points in a Euclidean space of dimension d, each point representing one of the time series contained in the matrix M. If d ≤ 3, the positions of the points can be visualised graphically, but this is not a limiting bound. The last step is to perform a clustering on these points.
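A minimal sketch of this embedding step, assuming scikit-learn and a placeholder distance matrix standing in for the DTW matrix M, could be:

```python
import numpy as np
from sklearn.manifold import MDS

# Placeholder for the N x N DTW distance matrix (symmetric, zero diagonal).
rng = np.random.default_rng(0)
A = rng.random((50, 50))
M = (A + A.T) / 2
np.fill_diagonal(M, 0.0)

# "precomputed" tells MDS to embed the given dissimilarities directly.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
points = mds.fit_transform(M)   # one 2D point per time series
```

Increasing n_components gives the embedding more freedom (cf. Table 2), at the cost of losing the direct graphical view.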

3.3. Clustering with K-means

For the clustering step, a few algorithms have been applied on the points representing the time series.

The K-Means algorithm is a widely used clustering method in data analysis to partition a dataset into K distinct clusters. It operates by minimising the sum of squared distances between data points and their assigned cluster centres, thus forming K clusters around centres called centroids.

3.3.1. Mathematical formulas

Let X = {x_1, x_2, …, x_n} be a set of n data points and K the number of clusters to be formed.

  1. Initialisation of Centroids: Randomly select K points as initial centroids C = {c_1, c_2, …, c_K}.

  2. Assignment of Points to Clusters: For each point x_i, calculate the distance to each centroid c_j and assign x_i to the cluster with the nearest centroid. This can be formulated as: Cluster(x_i) = argmin_j ||x_i − c_j||^2.

  3. Updating the Centroids: Recalculate the centroid of each cluster as the mean of all points assigned to that cluster: c_j = (1/|S_j|) Σ_{x_i ∈ S_j} x_i, where S_j is the set of points assigned to cluster j.

  4. Repeat steps 2 and 3 until convergence, i.e. until there is no change in the allocation of points to clusters or until a predefined tolerance is reached.

3.3.2. Use in TSCAPE

The clustering is therefore not applied on the curve shapes but on a representation on a Euclidean plane of their dissimilarities. Applying K-means on the MDS-transformed data helps to form distinct clusters by leveraging the reduced, structured representations provided by MDS. The combination allows K-means to operate on the reduced dimensions while aiming to form clusters based on the derived spatial relationships among the time series data points. This integration enables K-means to capture meaningful patterns and similarities in the reduced space created by MDS, potentially leading to more insightful and accurate clustering results for time series data.

The primary advantage of the K-means algorithm compared to other clustering methods, particularly when aiming for highly concentrated clusters, lies in its ability to minimise intra-cluster variance. K-means seeks to minimise the sum of squared distances of points to their respective centroids, efficiently grouping points around centres (centroids) to minimise dispersion within each cluster. In comparison to other clustering algorithms, K-means is effective in forming compact clusters as it explicitly targets the reduction of intra-cluster variance using an iterative approach of reassigning points to clusters based on their proximity to centroids, thus promoting the concentration of data within each cluster. This is particularly relevant for our work as we aim to create clusters with the smallest intra-cluster distances, which would correspond to groups of products with the most similar curves.
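A minimal sketch of this clustering step, assuming scikit-learn, placeholder MDS coordinates and the parameter values reported in Section 4.5 (with 8 clusters, as for the House dataset), could be:

```python
import numpy as np
from sklearn.cluster import KMeans

# `points`: the 2D MDS coordinates from the previous step (placeholder here).
points = np.random.default_rng(0).random((50, 2))

# K-means on the reduced representation; parameters follow Section 4.5.
kmeans = KMeans(n_clusters=8, init="k-means++", n_init=100,
                max_iter=400, random_state=0)
labels = kmeans.fit_predict(points)
```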

4. Experiments

4.1. Data used

The data used for the initial project being proprietary, for this paper we used open-source datasets available on Kaggle:

  • The first one is an open dataset (Szrlee, Citation2018) corresponding to stock exchange data for companies between 2006 and 2018. For this paper, the column "Volume" (corresponding to the number of shares traded) was used.

  • The second dataset corresponds to data of house sales prices according to their district and their number of rooms: Holdings (Citation2019). The data has been grouped by district number, property type and number of bedrooms: for example, “2603_5_house” will contain all the 5-bedroom houses in district 2603.

Each dataset is transformed to obtain three columns: date, product name and quantitative values simulating the sales quantity. The data are aggregated by month and by product. This gives, for each product, a time series of monthly sales. Missing values over the period are replaced by zero. Before being input to the Dynamic Time Warping algorithm, the data are also normalised between 0 and 1, so that the sales volume does not influence the distance scale of the dissimilarity matrix.
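A minimal preprocessing sketch with pandas, assuming a hypothetical input file and column names (date, product, quantity), could be:

```python
import pandas as pd

# Hypothetical input: one row per sale with columns date, product, quantity.
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Aggregate by month and product; missing months are filled with zero.
monthly = (df.assign(month=df["date"].dt.to_period("M"))
             .groupby(["product", "month"])["quantity"].sum()
             .unstack(fill_value=0))

# Normalise each series between 0 and 1 so sales volume does not drive DTW distances.
values = monthly.to_numpy(dtype=float)
lo = values.min(axis=1, keepdims=True)
hi = values.max(axis=1, keepdims=True)
span = hi - lo
span[span == 0] = 1.0          # guard against flat series
series = (values - lo) / span
```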

4.2. Metrics

To compare the solutions with one another, three metrics were used.

  • The silhouette score (SilScore in this paper) measures the density of clusters and their separation. The silhouette score ranges from −1 (worst solution) to 1 (best one).

  • The Calinski-Harabasz score (CHScore in this paper) is somewhat similar: it is the ratio between the sum of inter-cluster variances and the sum of intra-cluster variances. It ranges from 0 (worst) to +infinity (best), and its magnitude depends on the dataset.

  • The Davies-Bouldin index (DbScore in this paper) measures the average similarity of each cluster to its closest cluster. It is the average of the maximum ratio between the distance of a point to the centre of its cluster and the distance between two cluster centres. It ranges from 0 (best) to +infinity (worst).
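All three scores are available in scikit-learn; a minimal sketch on placeholder data, where points would be the MDS coordinates and labels the cluster assignments, could be:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

# Placeholder clustering; in TSCAPE, `points` are the MDS coordinates.
points = np.random.default_rng(0).random((50, 2))
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(points)

sil = silhouette_score(points, labels)        # in [-1, 1], higher is better
ch = calinski_harabasz_score(points, labels)  # >= 0, higher is better
db = davies_bouldin_score(points, labels)     # >= 0, lower is better
```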

4.3. Clustering by K-means with DTW metric (tslearn)

In the first step, the data were clustered with the TimeSeriesKMeans function of the TSlearn Python library (Tavenard et al., Citation2020). This library is specialised in the processing of time series. The TimeSeriesKMeans function has an option in its metric parameter that directly uses dynamic time warping. The number of clusters was chosen by the Elbow method (Figures 2 and 3). The elbow method is a technique used in clustering algorithms, such as K-means, to determine the optimal number of clusters in a dataset. It involves plotting the variance or distortion (inertia) against the number of clusters. As the number of clusters increases, the variance typically decreases. The "elbow" point on the graph signifies the number of clusters where the rate of decline changes abruptly, indicating the point of diminishing returns and helping to select an appropriate number of clusters for the data. The number of clusters was set to 8 for the House dataset and to 10 for the control Market dataset. The parameters of the K-means are the same as those described in the "K-means configuration" part of Section 4.5 of this paper, except for the parameter specific to this solution: metric, which was initialised to dtw.
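A minimal sketch of this baseline and of the elbow curve, assuming tslearn and matplotlib, placeholder series, and a reduced n_init purely to keep the example fast, could be:

```python
import numpy as np
import matplotlib.pyplot as plt
from tslearn.clustering import TimeSeriesKMeans

# `series`: normalised monthly time series (placeholder here).
series = np.random.default_rng(0).random((50, 36))

inertias = []
for k in range(2, 15):
    model = TimeSeriesKMeans(n_clusters=k, metric="dtw",
                             max_iter=400, n_init=2, random_state=0)
    model.fit(series)
    inertias.append(model.inertia_)

# The "elbow" of this curve suggests the number of clusters to retain.
plt.plot(range(2, 15), inertias, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("Inertia")
plt.show()
```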

Figure 2. Elbow method for Market dataset.


Figure 3. Elbow method for House dataset.


The results show that this first solution is not very efficient for our data. The silhouette score is low, which indicates poor cluster coherence (poor cluster separation, or too large distances between individuals within a cluster). Indeed, the sales curves are quite different from each other and therefore the pattern detection does not seem to work very well (Tables 4 and 5).

4.4. Principal component analysis (PCA) for time series clustering

Principal Component Analysis (PCA) is a crucial dimensionality reduction technique widely used in time series clustering. In this context, PCA enhances the efficiency of subsequent clustering algorithms, such as K-means. The process involves mean-centering the time series data and extracting principal components that capture the maximum variance.

Following mean-centering, the covariance matrix is computed, and PCA is applied to obtain principal components ordered by variance. The decision on the number of components to retain is crucial, often based on a predefined variance threshold. Subsequently, time series are reconstructed using the selected principal components, resulting in a reduced-dimensional representation.

The reduced representation serves as input for K-means clustering, where time series are grouped into clusters based on their similarity in the reduced feature space. PCA offers advantages such as noise reduction, computational efficiency and enhanced clustering performance.
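A minimal sketch of this PCA baseline, assuming scikit-learn and a 95% variance threshold (the threshold and the placeholder data are assumptions, not values from the experiments), could be:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# `series`: the normalised time series matrix (placeholder here).
series = np.random.default_rng(0).random((50, 36))

# PCA mean-centres internally; keep components explaining 95% of the variance.
pca = PCA(n_components=0.95)
reduced = pca.fit_transform(series)

# Cluster the reduced representation (8 clusters, as for the House dataset).
labels_pca = KMeans(n_clusters=8, n_init=100, random_state=0).fit_predict(reduced)
```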

Using PCA in conjunction with a clustering algorithm such as K-means appears to be one of the most effective methods for time series clustering, as shown in the works of Li (Citation2019), Yang and Shahabi (Citation2004) and Singhal and Seborg (Citation2005). These approaches differ from our work as they do not rely on the comparison of distances between time series. Although they achieve superior results (Tables 4 and 5), this comes at the cost of the interpretability of the dimensionality reduction: in our method, two points are necessarily close because they have a short distance calculated by the DTW, i.e. their curves are similar, as opposed to the PCA method, which reduces dimensions by creating its own axes, making projection analysis more complex.

We also conducted tests by incorporating PCA into the embedding stage of our methodology, to assess whether it enhances the outcomes (Tables 4 and 5).

4.5. TSCAPE experimentation

To compare both solutions, the same parameters are preserved, as well as the number of clusters and the K-means settings (excluding "metric = dtw").Footnote1

  1. Dynamic Time Warping configuration

    The "gamma" parameter of the Soft-DTW was tested with different values. The scores obtained with the different values were compared to each other, and also with the results obtained with classic dynamic time warping (Table 1).

  2. Multidimensional scaling configuration

    For the multidimensional scaling step, the variation of the dimension parameter (n_components) was tested. An example for both datasets, on a space of dimension d = 2, is provided in Figures 4 and 5.

    Increasing the number of dimensions gives more degrees of freedom to the Multidimensional Scaling algorithm. However, looking at Table 2, one can see that increasing the dimension (d ≥ 4) does not improve the different scores. The MDS algorithm has a random_state parameter that allows us to initialise it with different seed values. The variation of this parameter does not seem to greatly change the result (Table 3). The observation of these results shows that changing the initialisation seed of the MDS only slightly affects the clustering result for datasets with many samples. For datasets with few samples, this parameter can be useful to increase the silhouette score.

    Tests have been performed with other dimensionality reduction algorithms. The t-SNE algorithm is one of the best methods in this category. It was used just after the MDS to see whether it was able to place the points corresponding to the products more efficiently than the MDS alone. The results (Tables 4 and 5) show a slight improvement in silhouette score for datasets with many samples. It seems that when the number of samples and the size of the matrix M are larger, the MDS has fewer degrees of freedom to place the points, and therefore offers more convergent results whatever the seed. It can then be assisted by the t-SNE to improve these results (Figures 6 and 7); a minimal sketch of this MDS/t-SNE chaining is given after this list.

    Tests were also done by replacing the MDS algorithm with a Spectral Embedding, as in Belkin and Niyogi (Citation2003). Although the results were better in some configurations, this did not allow the best solution to be surpassed (Tables 4 and 5).

  3. K-means configuration

    The parameters of the K-means are the same as for the tslearn part: max_iter was set to 400, n_init to 100, init to k-means++, and finally random_state was initialised and varied from 0 to 20. The number of clusters is also the same as defined in the tslearn part of this paper (8 for the House dataset and 10 for the control Market dataset). All the points are indeed in a cluster, but sometimes points close to each other end up in two different clusters. This is due to the K-means clustering method; it is the only problematic consequence of this choice. This problem can be corrected by increasing the number of clusters enough to better distribute the points in each group, at the expense of the number of individuals per cluster and of the ease of analysis (Figures 8 and 9).

    Experiments were conducted using the K-Medoids and Spectral Clustering algorithms. Replacing the K-means algorithm in our solution with K-Medoids led to some improvements in certain metrics for specific datasets. However, when K-Medoids was used solely after DTW, our solution remained the most effective. On the other hand, the Spectral Clustering algorithm did not improve our solution (Tables 4 and 5).
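As mentioned in the multidimensional scaling configuration above, a minimal sketch of the MDS/t-SNE chaining, assuming scikit-learn and placeholder MDS coordinates (the perplexity value is only an example and must stay below the number of samples), could be:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

# `points`: the MDS coordinates (placeholder here).
points = np.random.default_rng(0).random((50, 2))

# Re-embed the MDS coordinates with t-SNE before clustering (cf. Figures 6 and 7).
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
points_tsne = tsne.fit_transform(points)

labels_tsne = KMeans(n_clusters=8, n_init=100,
                     random_state=0).fit_predict(points_tsne)
```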

4.6. Results

In our methodology, the first two steps involving dynamic time warping and multidimensional scaling represent the similarity between two time-series by establishing a distance between two points. Our goal is to group time series that have the most similar curves, necessitating the grouping of the closest points. Each individual within a cluster should be as close as possible to the other members of the same cluster and ideally far from the members of other clusters. This implies the creation of clusters around their centroids, a task efficiently performed by the K-means algorithm. Metrics such as the silhouette score, Calinski-Harabasz score and Davies-Bouldin index are relevant to evaluate data clustering when seeking to group the closest points. The silhouette score measures the similarity of points within clusters and their dissimilarity from other clusters. The Calinski-Harabasz score evaluates cluster compactness and separation, while the Davies-Bouldin index quantifies both intra-cluster dispersion and inter-cluster separation. These metrics offer a comprehensive evaluation of clusters in terms of intra-cluster cohesion, inter-cluster separation and differentiation between clusters.

Figure 4. 2D projection with MDS for the Market dataset. Each point corresponds to the representation of a time series. The distance between two points represents the similarity score. Two points close to each other means that both time series have a similar curve.


Figure 5. 2D projection with MDS for the House dataset. Each point corresponds to the representation of a time series. The distance between two points represents the similarity score. Two points close to each other means that both time series have a similar curve.


Figure 6. 2D projection with MDS followed by point replacement using the t-SNE algorithm for the Market dataset. The distance between two points represents the similarity score. Each point corresponds to the representation of a time series. Two points close to each other mean that both time series have a similar curve.


Figure 7. 2D projection with MDS followed by point replacement using the t-SNE algorithm for the House dataset. The distance between two points represents the similarity score. Each point corresponds to the representation of a time series. Two points close to each other mean that both time series have a similar curve.


Figure 8. Clustering for the Market dataset. Clusters are represented by different colours: two points of the same colour are in the same cluster.


Figure 9. Clustering for the House dataset. Different colours represent clusters: two points of the same colour are in the same cluster.


Table 2. Variation of the metrics according to the number of dimensions of the MDS.

Table 3. Variation of the metrics according to the seed number of the MDS.

Table 4. Benchmark (means of 10 runs) of all solutions and options of this paper for House dataset (the best solution is in bold).

Table 5. Benchmark (means of 10 runs) of all solutions and options of this paper for Market dataset (the best solution is in bold).

The silhouette scores of the clusters demonstrate the improvement brought by our solution TSCAPE (Tables 4 and 5). The addition of multidimensional scaling improves the silhouette score compared to a solution composed only of dynamic time warping and K-means (with equal parameter settings). Adding a t-SNE algorithm between the MDS and the clustering improves the silhouette score in some cases. Replacing the DTW algorithm with Soft-DTW improves the results for certain values of the "gamma" parameter.

5. Conclusion

Through this paper, we have presented a technique to cluster time series. This clustering is based on the analysis and comparison of curve shapes. The different techniques were compared using the silhouette score, the Calinski-Harabasz score and the Davies-Bouldin index of all the clusters. The results showed that the solution proposed in this paper is more efficient in creating clusters than the existing methods based on dynamic time warping and K-Means. Although our method is less effective than the PCA-based methods, it provides better interpretability of the dimension reduction. The next step of this work is to use these clusters to make predictions by grouping products in the context of data from a wholesaler-retailer for French pharmacies. The clustering can also be improved, for example by testing clustering algorithms more powerful than K-means.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 Source code available at: https://github.com/JRenaud41/TSCAPE

References

  • Abbasimehr, H., & Bahrini, A. (2022). An analytical framework based on the recency, frequency, and monetary model and time series clustering techniques for dynamic segmentation. Expert Systems with Applications, 192, 116373. https://doi.org/10.1016/j.eswa.2021.116373
  • Agarwal, P. K., & Mustafa, N. H. (2004). K-means projective clustering. In Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '04 (pp. 155–165). Association for Computing Machinery.
  • An, S., Bhat, G., Gumussoy, S., & Ogras, U. (2023). Transfer learning for human activity recognition using representational analysis of neural networks. ACM Transactions on Computing for Healthcare, 4(1), 1–21. https://doi.org/10.1145/3563948
  • Andreella, A., De Santis, R., Vesely, A., & Finos, L.. (2023). Procrustes-based distances for exploring between-matrices similarity. Statistical Methods & Applications, 32, 867–882. https://doi.org/10.1007/s10260-023-00689-y
  • Ankerst, M., Breunig, M. M., Kriegel, H.-P., & Sander, J. (1999). Optics: Ordering points to identify the clustering structure. ACM Sigmod Record, 28(2), 49–60. https://doi.org/10.1145/304181.304187
  • Batista, G. E., Keogh, E. J., Tataw, O. M., & De Souza, V. M. (2014). CID: An efficient complexity-invariant distance for time series. Data Mining and Knowledge Discovery, 28(3), 634–669. https://doi.org/10.1007/s10618-013-0312-3
  • Becht, E., McInnes, L., Healy, J., Dutertre, C.-A., Kwok, I. W., Ng, L. G., Ginhoux, F., & Newell, E. W. (2019). Dimensionality reduction for visualizing single-cell data using UMAP. Nature Biotechnology, 37(1), 38–44. https://doi.org/10.1038/nbt.4314
  • Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 1373–1396. https://doi.org/10.1162/089976603321780317
  • Brown, J. C., Hodgins-Davis, A., & Miller, P. J. O. (2006). Classification of vocalizations of killer whales using dynamic time warping. The Journal of the Acoustical Society of America, 119(3), EL34–EL40. https://doi.org/10.1121/1.2166949
  • Chen, L., & Ng, R. (2004). On the marriage of Lp-norms and edit distance. In Proceedings of the Thirtieth international conference on Very large data bases (Vol. 30, pp. 792–803).
  • Cox, M. A. A., & Cox, T. F. (2008). Multidimensional scaling (pp. 315–347). Springer.
  • Cuturi, M., & Blondel, M. (2017). Soft-DTW: A differentiable loss function for time-series. In Precup, D. and Teh, Y. W., (Eds), Proceedings of the 34th international conference on machine learning, volume 70 of proceedings of machine learning research (pp. 894–903). PMLR.
  • Day, W. H., & Edelsbrunner, H. (1984). Efficient algorithms for agglomerative hierarchical clustering methods. Journal of Classification, 1(1), 7–24. https://doi.org/10.1007/BF01890115
  • Derpanis, K. G. (2005). Mean shift clustering. Lecture Notes, 32, 1–4. http://www.cse.yorku.ca/~kosta/CompVis_Notes/mean_shift.pdf
  • Ding, C. S. (2000). Profile analysis: Multidimensional scaling approach. Practical Assessment, Research, and Evaluation, 7(1), 16.
  • Ejegwa, P. A., & Agbetayo, J. M. (2023). Similarity-distance decision-making technique and its applications via intuitionistic fuzzy pairs. Journal of Computational and Cognitive Engineering, 2(1), 68–74.
  • Elmore, K. L., & Richman, M. B. (2001). Euclidean distance as a similarity metric for principal component analysis. Monthly Weather Review, 129(3), 540–549. https://doi.org/10.1175/1520-0493(2001)129<0540:EDAASM>2.0.CO;2
  • Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996, August). A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD (Vol. 96, pp. 226–231).
  • Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315(5814), 972–976. https://doi.org/10.1126/science.1136800
  • Golay, X., Kollias, S., Stoll, G., Meier, D., Valavanis, A., & Boesiger, P. (1998). A new correlation-based fuzzy logic clustering algorithm for fMRI. Magnetic Resonance in Medicine, 40(2), 249–260. https://doi.org/10.1002/mrm.v40:2
  • Hittmeir, M., Mayer, R., & Ekelhart, A. (2022). Distance-based techniques for personal microbiome identification. In Proceedings of the 17th international conference on availability, reliability and security (pp. 1–13).
  • Holdings, H. (2019). House property sales time series. Data retrieved from Kaggle. https://www.kaggle.com/datasets/htagholdings/property-sales?resource=download.
  • Jancovic, P., Köküer, M., Zakeri, M., & Russell, M. (2013). Unsupervised discovery of acoustic patterns in bird vocalisations employing dtw and clustering. In 21st European signal processing conference (EUSIPCO 2013) (pp. 1–5).
  • Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065), 20150202. https://doi.org/10.1098/rsta.2015.0202
  • Kalpakis, K., Gada, D., & Puttagunta, V. (2001). Distance measures for effective clustering of ARIMA time-series. In Proceedings 2001 IEEE international conference on data mining (pp. 273–280). IEEE.
  • Kamalzadeh, H., Ahmadi, A., & Mansour, S. (2020). Clustering time-series by a novel slope-based similarity measure considering particle swarm optimization. Applied Soft Computing, 96, 106701. https://doi.org/10.1016/j.asoc.2020.106701
  • Lee, Y., & Park, S. (2023). Spectral clustering of weighted variables on multi-omics data. The Korean Journal of Applied Statistics, 36(3), 175–196. https://doi.org/10.5351/KJAS.2023.36.3.175
  • Li, H. (2019). Multivariate time series clustering based on common principal component analysis. Neurocomputing, 349, 239–247. https://doi.org/10.1016/j.neucom.2019.03.060
  • Liao, T. W. (2005). Clustering of time series data–a survey. Pattern Recognition, 38(11), 1857–1874. https://doi.org/10.1016/j.patcog.2005.01.025
  • Liu, B., Liu, C., Zhou, Y., Wang, D., & Dun, Y. (2023). An unsupervised chatter detection method based on AE and merging GMM and k-means. Mechanical Systems and Signal Processing, 186, 109861. https://doi.org/10.1016/j.ymssp.2022.109861
  • Lopes, A. M., Andrade, J. P., & Tenreiro Machado, J. (2016). Multidimensional scaling analysis of virus diseases. Computer Methods and Programs in Biomedicine, 131, 97–110. https://doi.org/10.1016/j.cmpb.2016.03.029
  • Malzer, C., & Baum, M. (2020). A hybrid approach to hierarchical density-based cluster selection. In 2020 IEEE international conference on multisensor fusion and integration for intelligent systems (MFI) (pp. 223–228).
  • Mar-Molinero, C., & Serrano-Cinca, C. (2001). Bank failure: A multidimensional scaling approach. The European Journal of Finance, 7(2), 165–183. https://doi.org/10.1080/13518470122202
  • Mohammadi, Y., Miraftabzadeh, S. M., Bollen, M. H., & Longo, M. (2022). An unsupervised learning schema for seeking patterns in RMS voltage variations at the sub-10-minute time scale. Sustainable Energy, Grids and Networks, 31, 100773. https://doi.org/10.1016/j.segan.2022.100773
  • Neophytou, E., & Molinero, C. M. (2004). Predicting corporate failure in the UK: A multidimensional scaling approach. Journal of Business Finance & Accounting, 31(5-6), 677–710. https://doi.org/10.1111/jbfa.2004.31.issue-5-6
  • Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 14.
  • Petitjean, F., Ketterlin, A., & Gançarski, P. (2011). A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition, 44(3), 678–693. https://doi.org/10.1016/j.patcog.2010.09.013
  • Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326. https://doi.org/10.1126/science.290.5500.2323
  • Sakoe, H., & Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43–49. https://doi.org/10.1109/TASSP.1978.1163055
  • Shaw, C., & King, G. (1992). Using cluster analysis to classify time series. Physica D: Nonlinear Phenomena, 58(1-4), 288–298. https://doi.org/10.1016/0167-2789(92)90117-6
  • Singhal, A., & Seborg, D. E. (2005). Clustering multivariate time-series data. Journal of Chemometrics: A Journal of the Chemometrics Society, 19(8), 427–438. https://doi.org/10.1002/cem.v19:8
  • Szrlee. (2018). DJIA 30 stock time series. Data retrieved from Kaggle. https://www.kaggle.com/datasets/szrlee/stock-time-series-20050101-to-20171231?resource=download.
  • Tavenard, R., Faouzi, J., Vandewiele, G., Divo, F., Androz, G., Holtz, C., Payne, M., Yurchak, R., Rußwurm, M., Kolar, K., & Woods, E. (2020). Tslearn, a machine learning toolkit for time series data. Journal of Machine Learning Research, 21(118), 1–6.
  • Tenenbaum, J. B., Silva, V. d., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319–2323. https://doi.org/10.1126/science.290.5500.2319
  • Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2579–2605.
  • Vlachos, M., Kollios, G., & Gunopulos, D. (2002). Discovering similar multidimensional trajectories. In Proceedings 18th international conference on data engineering (pp. 673–684). IEEE.
  • Yang, K., & Shahabi, C. (2004). A PCA-based similarity measure for multivariate time series. In Proceedings of the 2nd ACM international workshop on multimedia databases (pp. 65–74).
  • You, L., Jiang, H., Hu, J., Chang, C. H., Chen, L., Cui, X., & Zhao, M. (2022). GPU-accelerated faster mean shift with euclidean distance metrics. In 2022 IEEE 46th annual computers, software, and applications conference (COMPSAC) (pp. 211–216). IEEE.
  • Zhang, T., Ramakrishnan, R., & Livny, M. (1996). Birch: An efficient data clustering method for very large databases. ACM Sigmod Record, 25(2), 103–114. https://doi.org/10.1145/235968.233324