Research Article

A two-stage business analytics approach to perform behavioural and geographic customer segmentation using e-commerce delivery data

Pages 1-29 | Received 06 Apr 2022, Accepted 18 Nov 2022, Published online: 08 Dec 2022

ABSTRACT

Customer segmentation is considered the cornerstone of personalisation, targeted advertising, and promotion, helping both researchers and practitioners to better understand customers’ buying behaviour. The pertinent literature mainly exploits one distinct segmentation type, such as behavioural, to segment customers solely under one lens. We develop a two-stage business analytics approach that introduces a combination of geographic and behavioural customer segmentation. Our approach is based on data mining and machine learning techniques. We evaluate the suggested approach using e-commerce home delivery data. First, we segment customers based on the products ordered to identify behavioural customer segments with similar product preferences. Then, we perform geographic segmentation. By applying the approach developed, we also identify challenges that affect the segmentation process and results. The suggested approach can serve as a guide for business analysts on the steps they should perform when analysing similar datasets, while its results may assist third-party logistics (3PL) companies, retailers, and brands in supporting decision making.

1 Introduction

Retailers generate and analyse a vast amount of data daily. This plethora of data can create growth opportunities, drive innovation, and enable better customer relationship management (Monod et al., Citation2022; Zerbino et al., Citation2018). Despite the large datasets gathered, companies face several challenges in utilising them for marketing purposes such as attracting customers or providing smart recommendations (Graef et al., Citation2022) and in exploiting their economic value in the e-commerce retail context (Ngai & Wu, Citation2022). Customer segmentation is used as the basis for business decisions related to product development and services, and is considered the cornerstone for personalisation and smart customer targeting in the retailing sector (Dekimpe, Citation2020). Understanding the customer segments can increase firm performance (Obitade, Citation2021) and, overall, builds on the value creation from data analytics (Hopf et al., Citation2022; Mikalef et al., Citation2020; Saggi & Jain, Citation2018; Shea et al., Citation2019). Therefore, retailers invest in building analytics to identify customer segments and support evidence-based customer segment targeting (Graef et al., Citation2022; Pappas et al., Citation2018; Saggi & Jain, Citation2018; Tiwari et al., Citation2018; Vecchio et al., Citation2018). Due to the advent of e-commerce and logistics during the last decade (Graef et al., Citation2022; Jindal et al., Citation2021; Vakulenko et al., Citation2022), customer segmentation in this sector has become more prominent.

Customer segmentation studies perform demographic, psychographic, behavioural, or geographic segmentation by using customer demographic and psychographic, product, or spatial characteristics respectively. Plenty of studies focus on demographic, psychographic, or behavioural segmentation (e.g. Ferretti & Montibeller, Citation2016; Keenan & Jankowski, Citation2019; Y. Li et al., Citation2017; Pick et al., Citation2017; Griva et al., Citation2021; Park et al., Citation2014; Subramanian et al., Citation2014). These segmentation types are sufficient for physical stores. The growth of online purchases during Covid-19 made evident that spatial/location features should also be considered to segment customers. Apart from segmenting customers based on their product preferences, online retailers were called to segment them based on their place of residence in order to adjust, for example, their stock and product offering there. Recent studies have shown that spatial characteristics can enrich customer behaviour insights, becoming important for smart customer targeting (Ferretti & Montibeller, Citation2016; Keenan & Jankowski, Citation2019; Y. Li et al., Citation2017; Pick et al., Citation2017). Hence, an emerging research stream on customer segmentation proposes exploiting spatial characteristics to perform geographic segmentation in the customer context, but still only a few contributions exist (e.g. Fan & Zhang, Citation2009; Kieu et al., Citation2018).

Existing segmentation studies focus on one specific type of segmentation e.g. demographic, behavioural, or geographic. Since customer behaviour is fluid and changes over time, the more customer characteristics we use, the better we understand our customers (Griva et al., Citation2021). Despite the proven value derived when combining several segmentation types (i.e. demographic, behavioural, geographic; Hunke et al., Citation2021; Kondo & Okubo, Citation2022), studies that combine more than one segmentation type are limited. In particular, there is a notable scarcity of studies combining geographic segmentation with the other available segmentation types. Therefore, the overarching research objective and question of this study is:

RQ1. How can we combine behavioural and geographic customer segmentation to understand customers?

To address this question, our study introduces a two-stage business analytics approach that leverages product and spatial characteristics to jointly perform behavioural and geographic segmentation. Overall, our approach is based on data mining (i.e. clustering) and machine learning (i.e. topic modelling with Latent Dirichlet Allocation, and feature selection techniques). To apply our approach, we used e-commerce home delivery data from various online retailers provided by a third-party logistics (3PL) company.

The paper is organised as follows: Section 2 presents the research background, Section 3 describes our proposed approach, while Section 4 presents the empirical analysis and the results. In closing, we discuss our findings and present the theoretical contributions, practical implications, and limitations in Section 5.

2 Research background

2.1 Customer segmentation studies

Customer segmentation identifies groups of customers with similar characteristics and behaviours and targets them with similar marketing activities (Boone & Roehm, Citation2002). The main customer segmentation types are demographic segmentation which uses demographic characteristics such as age and marital status; behavioural segmentation which uses characteristics related to the purchasing behaviour (e.g. money spent or the products purchased) and the channel used (e.g. online, physical store); psychographic segmentation that uses data such as personality, lifestyle and attitude; and geographic segmentation that uses spatial and location characteristics (Tynan et al., Citation1987; Wu & Yu, Citation2020).

Traditional customer segmentation approaches utilize traditional market research datasets including qualitative data such as customer interviews, and focus groups, and quantitative data such as polls, and questionnaires to derive the customer segments (e.g. Darden & Perreault, Citation1976). Given the plethora of datasets nowadays, contemporary customer segmentation can be also supported by transactional data such as sales, loyalty, website interactions, social media, and customer searches. Thus, various data mining techniques such as artificial neural networks, data classification, and clustering can be used to conduct contemporary segmentation (Fan & Zhang, Citation2009; Griva et al., Citation2018; Ngai et al., Citation2009) with a view to extract patterns and insights about varying business problems (Zhou et al., Citation2020). Given that data characteristics are critical for applying these techniques, a customer segmentation approach can be diversified as follows (Fan & Zhang, Citation2009):

  1. Customer characteristics including demographics, psychographics, channel preferences like web and physical, visit/order behaviour measured by money spent, time spent in the store, last purchase/order, etc. are used to conduct either demographic, psychographic, or behavioural segmentation. For instance, Bhatnagar and Ghose (Citation2004) perform demographic segmentation using characteristics such as age, marital status, gender, income, and education collected through a consumer online survey. Miguéis et al. (Citation2012) segment the shoppers of a European retailing company based on their lifestyle data, performing a psychographic segmentation. More specifically, they conduct a clustering analysis to identify different behavioural segments and correlate them to psychographics such as socialising enjoyment. Taramigkou et al. (Citation2018) also identify psychographic segments by applying clustering to personality characteristics. Another stream of studies exploits recency, frequency, and monetary (RFM) analysis to perform behavioural segmentation. For instance, Tsai and Chiu (Citation2004) apply RFM to sales data, whereas Wang et al. (Citation2020) perform behavioural segmentation based on RFM and other purchasing behaviour characteristics by using grocery order data. Similarly, Aeron et al. (Citation2012) perform behavioural segmentation based on customer lifetime value.

  2. Product characteristics including product category, brand, product type, product attributes like colour, price, size, weight, etc. are considered to perform behavioural segmentation (Griva, Citation2022). Product characteristics studies either focus on the cross- and multi-channel customer purchases (e.g. De Keyser et al., Citation2015; Konuş et al., Citation2008), or focus on single customer visits purchased by one channel. Regarding the latter, Boone and Roehm (Citation2002) use artificial neural networks to segment customers based on the days since the first and last purchase, the number of orders, and the total dollars spent in a retail store. Likewise, studies use product preferences to perform behavioural segmentation; for instance, using sales data to segment customers into groups considering the products purchased from various retailers (Griva et al., Citation2021, Citation2018).

  3. Spatial characteristics such as addresses, city/country of residence, store location, and population density are considered to perform geographic segmentation. Recent literature (e.g. Ferretti & Montibeller, Citation2016; Keenan & Jankowski, Citation2019; Y. Li et al., Citation2017; Pick et al., Citation2017) admits that spatial characteristics can contribute to customer segmentation and enrich the insights extracted. Geographic segmentation has drawn significant attention since modern management information systems have made plenty of spatial and non-spatial historical data available (Keenan & Jankowski, Citation2019). However, only a few approaches perform geographic segmentation in the customer and retailing context. For instance, spatial data are used to geographically segment fast-food companies (Widaningrum et al., Citation2017), bicycle sharing (Westland et al., Citation2019), or the transit market to reveal passengers’ travel behaviours (Kieu et al., Citation2018).

Table 1 below summarizes the segmentation characteristics (i.e. customer, product, spatial), illustrates some examples for each case, and provides a mapping between the characteristics and the type of segmentation that we can conduct based on them.

Table 1. Mapping between segmentation characteristics and segmentation types.

In existing literature, plenty of studies conduct customer segmentation using one out of the three possible segmentation characteristics. However, there is an increasing need to combine several dissimilar datasets and characteristics to approach contemporary customers and increase customer value (Hunke et al., Citation2021). Some studies perform several segmentation types, but in isolation, as they do not combine the results. For instance, Griva et al. (Citation2021) use sales and loyalty data to perform either behavioural or demographic segmentation. Similarly, Ying Liu et al. (Citation2010) perform and compare two different segmentation models: in the first one they use various values-related attributes such as the number of calls, minutes of use, monthly revenue, etc. to perform behavioural segmentation; and in the second they use characteristics such as age, gender, income, etc. to perform demographic segmentation. Few other studies combine various segmentation characteristics. For example, Nakano and Kondo (Citation2018) use purchase scan panel data, social media, and survey data to perform multichannel customer segmentation which combines psychographic and demographic characteristics. Darden and Perreault (Citation1976) utilise interview and questionnaire data to perform initially psychographic segmentation based on lifestyle characteristics, and then to perform behavioural segmentation using product characteristics. Similarly, Kondo and Okubo (Citation2022) use demographics and product characteristics as inputs to segment omnichannel shoppers in the grocery retail environment.

In conclusion, many studies perform several segmentation types but in isolation, while only a limited number combine several segmentation types. Even when they combine several segmentation types, the analysis is usually conducted in one layer/stage, meaning that the various segmentation characteristics are used as inputs in a single segmentation model to provide an ultimate segmentation rather than distinct results which can then be combined. However, many unrelated input variables (e.g. demographic characteristics and product characteristics) may hinder the performance of the data mining algorithms (Kwon & Sim, Citation2013; Raudys & Pikelis, Citation1980) and produce unreliable results. Finally, there is a clear lack of studies combining behavioural and geographic customer segmentation types.

2.2 Research method

We followed the ‘Design Science’ research approach (Hevner et al., Citation2004), which is used widely in IS research (Ju et al., Citation2019) and focuses on the development and assessment of artefacts. Our artefact (Section 3) is a two-stage business analytics approach that utilises data mining (i.e. clustering) and machine learning (i.e. topic modelling with Latent Dirichlet Allocation (LDA), and feature selection techniques) to jointly perform behavioural and geographic segmentation. We evaluated our proposed artefact (Section 4) by applying it in practice to demonstrate its sufficiency to manage the original problem i.e. to perform behavioural and geographic customer segmentation. Specifically, we performed segmentation using product delivery data provided by a Greek third-party logistics (3PL) company that distributes orders from various online retailers to customers’ houses.

In the development of our approach, we followed CRISP-DM, which is a cross-industry standard process for data mining (Shearer, Citation2000). To serve our research purpose, we made the necessary adjustments to the CRISP-DM steps, and our data-mining process included five steps (see Figure 1): (A) Business and Data Understanding, (B) Feature selection, (C) Modelling, (D) Evaluation, and (E) Segmentation. Our contribution and the originality of this approach lie in the steps included in the rectangle marked in red in Figure 1.

Figure 1. Two-stage business analytics approach for behavioural and geographic customer segmentation.

3 Behavioural and geographic customer segmentation approach

Figure 1 depicts the proposed customer segmentation approach. To avoid issues related to including many input variables in a model (i.e. both product and spatial characteristics), which may hinder the performance of the analytics models (e.g. Kwon & Sim, Citation2013; Raudys & Pikelis, Citation1980), our suggested approach consists of two stages that exploit product and spatial characteristics successively to perform behavioural and geographic segmentation.

The input dataset includes customer purchases/orders indicating the products that a customer purchased from one or many retail companies (product characteristics). To support the geographic segmentation, the dataset also includes details on the location to which these products were delivered (spatial characteristics). In general, this data can be provided either by one retailer, e.g. to analyse only its customers, or by a 3PL company that distributes products from several retailers and has visibility of the products delivered, as in our case.

  1. The ‘Behavioral segmentation’ stage extracts behavioural segments by considering the product characteristics i.e. product categories purchased by each customer as indicated by customers’ home deliveries (‘blue’ path of the first rectangle in Figure 1).

  2. The ‘Geographic segmentation’ stage extracts geographic segments by enriching the identified behavioural segments of the previous stage with spatial characteristics (‘grey’ path of the second rectangle in Figure 1).

Below we describe each sub-step of the proposed approach.

3.1 Feature selection

Feature selection derives the features required as input for the clustering algorithm to extract customer segments (Griva et al., Citation2018). The product item descriptions encode the key data elements required for the feature selection in the case of behavioural segmentation. Extracting features from a product item description in a structured way is difficult. Therefore, we first formed product category levels based on these descriptions and then selected the product category levels to be used as features in our analysis. For example, we extracted a product category named ‘scarfs’ and then a subcategory named ‘woman scarfs’ from a product item description such as ‘GH.ITSJ- 6765 Scrf. WMN. red-blue’ ( provides more examples).

Figure 2. 3PL company’s role in product delivery.

As noted above, the product description text encodes the key data elements describing the product. However, it remains challenging to identify the product categories from the initial item description, especially because each retailer uses its own product descriptions and categories. Topic modelling based on LDA can be used to identify the latent topic levels representing the product categories (Asghari et al., Citation2020; Blei et al., Citation2003; Chen et al., Citation2018). By applying LDA, an initial set of product categories was formed, and then we pruned the weakest categories i.e. the categories including only a small number of products, constituting less than 1% of the population. To address the common LDA challenge of ‘exchangeability’ (e.g. words such as ‘women’ sometimes were noted as: ‘woman’, ‘women’, ‘W’, ‘wom’, ‘wmn’), we used text mining based on simple predefined rules to complete any problematic data and improve the quality of the findings (Allahyari et al., Citation2017). Perplexity (Blei et al., Citation2003) in our dataset was unstable. So, we turned our attention towards coherence metrics and used the measure suggested by Arun et al. (Citation2010), which examines the symmetric Kullback–Leibler divergence of the singular value distributions between the Topic-Word and Document-Topic matrices, to determine a reasonable number of topics. For setting up LDA, we followed the propositions of Wallach et al. (Citation2009); thus an asymmetric α Dirichlet prior for documents over topics and a symmetric β Dirichlet prior for topics over words were selected. The α Dirichlet prior was calculated using an optimisation scheme during the Gibbs sampling iterations, whereas we set β = 0.05 for all the words. The role of the asymmetric α Dirichlet prior was to model our belief about how a topic is distributed among the product items (the higher the value, the more probable it is that a document contains a topic), and it allowed us to uncover a meaningful product category hierarchy according to the assignments of documents over topics.
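
The rule-based clean-up of spelling variants mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical rule table (`CANONICAL_VARIANTS`) seeded with the variants listed in the text; the function names are illustrative, and this is not the tooling or the rule set actually used in the study.

```python
import re

# Hypothetical rule table (an assumption for illustration): each canonical
# token maps to the spelling variants that should be rewritten to it.
CANONICAL_VARIANTS = {
    "women": ["woman", "women", "wom", "wmn", "w"],
}


def build_patterns(rules):
    """Compile one case-insensitive, whole-token pattern per canonical form."""
    patterns = []
    for canonical, variants in rules.items():
        # Try longer variants first so that 'women' is tested before 'w'.
        ordered = sorted(variants, key=len, reverse=True)
        alternation = "|".join(re.escape(v) for v in ordered)
        # The lookarounds stop a variant from matching inside a longer word
        # (e.g. the 'w' in 'wool').
        pattern = re.compile(
            rf"(?<![A-Za-z])(?:{alternation})(?![A-Za-z])", re.IGNORECASE
        )
        patterns.append((pattern, canonical))
    return patterns


def normalise(description, patterns):
    """Rewrite every known variant in a product description to its canonical token."""
    for pattern, canonical in patterns:
        description = pattern.sub(canonical, description)
    return description


patterns = build_patterns(CANONICAL_VARIANTS)
print(normalise("GH.ITSJ- 6765 Scrf. WMN. red-blue", patterns))
```

A single-letter variant such as ‘W’ is risky in real data (it may also denote a size or a colour code), so rules of this kind would normally be reviewed against the descriptions of each retailer before being applied.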

The above steps result in a product category tree (see Figure 3 for an example) that is likely to be unbalanced due to the lack of complete product information like textile/material, brand, and colour in many product item descriptions and the high frequency of the best-selling products. Unbalanced product category trees could affect the performance of the clustering and the business results (Cho & Kim, Citation2004; Cho et al., Citation2002), as they introduce a bias towards the best-selling product categories. To balance this tree, we estimated the number of items classified at every product category level of the product taxonomy and the number of customers who purchased an item belonging to this category level. We then pruned the product taxonomy tree in a bottom-up way, eliminating the weakest product category levels (nodes) i.e. those that have been delivered only a few times, then moving to the higher product taxonomy level (parent node), and so on (Griva et al., Citation2021, Citation2018). This way we ended up selecting cross- (product) category levels which were used as input in the clustering. Similarly, we selected the spatial levels for the geographic segmentation. This dataset was already organised in a hierarchical structure e.g. county, region, sub-region 1, sub-region 2, and postal code, and was easier to represent as a tree. Qualitative supervision of the selection process by the researcher and guidance by the industry experts were crucial to merge two or more levels e.g. product categories or spatial levels belonging to the same parent node.
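
The bottom-up selection of cross-category levels can be sketched as a post-order traversal of the taxonomy tree. The sketch below is only an illustration under stated assumptions: the `Node` class, the `min_deliveries` threshold, and the counts in the example are hypothetical, and the study additionally relied on customer counts and expert judgement that a single threshold cannot capture.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a taxonomy tree (product category or spatial level)."""
    name: str
    deliveries: int  # deliveries of items classified under this node
    children: list = field(default_factory=list)


def select_levels(node, min_deliveries):
    """Return the lowest nodes in each branch that meet the support threshold.

    Children below the threshold are dropped. If no child of a node
    qualifies, the node itself becomes the selected (cross-level) feature.
    The caller is assumed to pass a root that already meets the threshold.
    """
    selected = []
    for child in node.children:
        if child.deliveries >= min_deliveries:
            selected.extend(select_levels(child, min_deliveries))
    return selected if selected else [node.name]


# Illustrative counts only; they are not taken from the study's dataset.
tree = Node("accessories", 280, [
    Node("scarfs", 150, [Node("woman scarfs", 60), Node("man scarfs", 40)]),
    Node("gloves", 120),
    Node("belts", 10),
])
print(select_levels(tree, min_deliveries=100))
```

In this toy tree both scarf subcategories fall below the threshold, so the traversal moves up and keeps ‘scarfs’, while ‘belts’ is eliminated. The same function could be applied to a spatial hierarchy by building the tree from counties down to postal codes.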

3.2 Modelling

Even after selecting the cross-spatial levels, we still identified variations in the population, or in other words the delivery volume, contained in each spatial level. Variation means that the dataset is dispersed, so that one spatial level might have many deliveries, and another might have only a few. To identify such variations, we can use several statistical techniques to check the dispersion of the dataset. For instance, a boxplot (A. Li et al.,) can be used to examine the median, identify negative or positive skewness in the dataset, and visually examine the dispersion of the spatial levels. By applying a boxplot technique before the clustering-based segmentation, we can first identify geographic segments based on delivery volume and thereby tackle this issue. Then we can proceed with the clustering.

We utilised clustering with the expectation maximisation (EM) algorithm to conduct behavioural segmentation based on product characteristics. EM was used as we were dealing with categories and EM serves as a partitional algorithm to identify latent classes in categorical data (Dempster et al., Citation1977; Grim, Citation2006). The input clustering dataset is a table, in which each row represents a customer, having as columns all the product categories selected in the feature selection step e.g. ‘scarfs’, ‘gloves’, ‘woman bags’.

To proceed with the geographic segmentation, we created a mapping between the spatial levels derived in the feature selection step and the behavioural segments. In essence, we formed a table/matrix including the identified behavioural segments of the first stage as columns and the spatial levels as rows. Considering the number of home deliveries included at each spatial level, we classified each spatial level and behavioural segment combination into low (L), medium (M), and high (H; Table 2). This matrix was used as the learning dataset of the second clustering model, in which we applied the k-means algorithm. K-means is used as it is one of the most well-known, simple, cost-effective, and easy-to-interpret partitional algorithms for identifying non-overlapping subgroups (Jain et al., Citation1999; Wu et al., Citation2008).

Table 2. Mapping behavioural segments and spatial levels – input dataset for geographic segmentation.
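
As a rough illustration of how such an input matrix could be assembled, the sketch below counts deliveries per spatial level and behavioural segment and bins each count into L, M, or H. The cut-off parameters `low_max` and `medium_max`, the function names, and the sample records are assumptions made for the example; the text does not state how the L/M/H boundaries were chosen.

```python
from collections import Counter

# Ordinal codes so that the L/M/H labels can be fed to a distance-based
# algorithm such as k-means (an encoding assumed here for illustration).
ORDINAL = {"L": 0, "M": 1, "H": 2}


def build_lmh_matrix(deliveries, segments, low_max, medium_max):
    """Build a spatial-level x behavioural-segment matrix of L/M/H labels.

    deliveries: iterable of (spatial_level, behavioural_segment) pairs,
                one pair per home delivery.
    low_max, medium_max: hypothetical cut-offs; counts <= low_max are 'L',
                counts <= medium_max are 'M', and larger counts are 'H'.
    """
    counts = Counter(deliveries)
    levels = sorted({level for level, _ in counts})
    matrix = {}
    for level in levels:
        row = {}
        for segment in segments:
            n = counts[(level, segment)]  # a Counter returns 0 for unseen pairs
            if n <= low_max:
                row[segment] = "L"
            elif n <= medium_max:
                row[segment] = "M"
            else:
                row[segment] = "H"
        matrix[level] = row
    return matrix


def encode_row(row, segments):
    """Turn one L/M/H row into a numeric vector, in the given segment order."""
    return [ORDINAL[row[segment]] for segment in segments]


# Made-up records: five deliveries of segment S1 and two of S2 in region A, etc.
records = [("region A", "S1")] * 5 + [("region A", "S2")] * 2 + [("region B", "S1")]
matrix = build_lmh_matrix(records, ["S1", "S2"], low_max=1, medium_max=3)
print(matrix)
print(encode_row(matrix["region A"], ["S1", "S2"]))
```

Each encoded row then corresponds to one spatial level and could serve as one observation for the second clustering model.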

3.3 Evaluation

We evaluated the product and the geography levels selected, and the segmentation results both from technical and business perspectives as follows.

  • To address the exchangeability issue (e.g. words such as ‘women’ sometimes were noted as: ‘woman’, ‘women’, ‘W’, ‘wom’, ‘wmn’) in the product taxonomy creation, we mainly used text mining to resolve it, and technical evaluation metrics such as precision and recall to evaluate the results.

  • In the case of the unbalanced product taxonomy tree and the unbalanced spatial taxonomy tree, we used a semi-supervised method. We considered the resulting data skewness to conduct the technical assessment and experts’ opinions (i.e. 3PL’s managers with domain expertise) to evaluate our results in business terms. Experts also provided feedback regarding the product and geography taxonomies that resulted in moving the selected features (i.e. products, or spatial levels) up and down the identified taxonomies. This process can increase the consistency of the clusters in the next step of the proposed approach and lead to more interpretable customer segments. To further validate the derived product categories taxonomy, we examined the websites of major retailers and ascertained that their product categories taxonomy is similar to the proposed one.

  • To evaluate the clustering results, we examined the internal validity of the clusters to identify the important clustering components (Jain et al., Citation1999) like the optimal number of clusters and metrics such as separation, similarity, compactness (Y. Liu et al., Citation2013; Zeleny et al., Citation2017). In our case, we used the Silhouette coefficient to determine the optimal number of clusters and then we used Silhouette Index (SI) to examine the clustering internal validation (Griva, Citation2022). We also considered external experts’ opinions. Experts’ business feedback varies from comments regarding the segments to the interpretation of the segments. This feedback is also related to the previous point, as clusters that are not easily interpretable, or omissions in results might indicate a need to revisit the selected product or spatial levels.
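
For readers unfamiliar with the Silhouette Index, a minimal pure-Python sketch of its computation follows. It assumes numeric feature vectors and Euclidean distance, which is a simplification: the study clustered categorical data with EM, and the distance measure it used is not stated. The function name and the toy points are illustrative.

```python
import math


def silhouette_index(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed).

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to the members of any other
    cluster; the coefficient is (b - a) / max(a, b).
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight, well-separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_index(points, [0, 0, 1, 1]))  # close to 1: well separated
print(silhouette_index(points, [0, 1, 0, 1]))  # negative: poor assignment
```

To choose the number of clusters, one would typically run the clustering for several candidate values and keep the one with the highest index, subject to the experts’ view on interpretability.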

4 Approach evaluation – empirical analysis

4.1 Data

We used e-commerce product delivery data provided by a Greek 3PL company that fulfils and distributes various online retailers’ orders to customers’ houses (see Figure 2). The online retailers sell a variety of products including clothing, accessories, home appliances, and electronics.

The dataset covered a three-year period (from 2014 to 2017), four online retailers, and 496,000 home deliveries with 190,000 product items ordered by 74,000 customers. Table 3 presents a sample of the e-commerce home delivery dataset; ‘Product item’ constitutes the product characteristics, and ‘Postal code’, ‘Sub-region 2’, ‘Sub-region 1’, ‘Region’, and ‘County’ the spatial characteristics. The dataset was suitable for our analysis, as it contained information about the product items required to derive the product categories taxonomy, select product category levels, and thus perform the behavioural segmentation. Also, it included the spatial characteristics and geographic taxonomy required to conduct geographic segmentation. Each customer had a unique identifier, which was essential as the customer is the unit of analysis in our study.

We used SQL Management Studio, SQL Server Analysis Services, and R for our data analysis, and RapidMiner for text mining, with its LDA operator for topic modelling. We developed a web interface built on HTML and the D3.js JavaScript library to produce dynamic and interactive data visualizations of the results.

Table 3. Data sample.

4.2 Stage one: Behavioural segmentation

We initially created a product taxonomy tree with four levels (Figure 3) e.g. ‘accessories’ (Level 1), ‘scarfs’ (Level 2), ‘women scarfs’ (Level 3), ‘ST.FB51-8214 Scarf WOM.100% POLYEST’ (Item-level). Figure 3 also presents the number of categories included in each taxonomy level (e.g. level 1 included 18 product categories, and item level included 190,000 products). To select the cross-category levels (features), we considered the number of items classified in each level and the number of customers who purchased from this category level. We also consulted the company’s analysts, who are familiar with the domain. We ended up selecting 112 cross-category levels. Figure 3 presents an example of the cross-level categories selected, where N represents the number of items per tree level, and shaded nodes represent the cross-level categories selected. For instance, from the branch ‘accessories’, we used ‘scarfs’ and ‘gloves’. From the branch ‘bags’, we used ‘man bags’, ‘backpacks’, ‘woman shoulder bags’ and ‘woman clutch bags’ (shaded categories of Figure 3).

Figure 3. Product taxonomy tree.

We applied the EM clustering algorithm considering the SI as a clustering evaluation metric. SI takes values from −1 to 1, and the higher its value, the better separated the clusters are (Ulkhaq & Adyatama, Citation2021). In our case, SI was 0.711, which is considered high, and led to the identification of seven behavioural customer segments (Figure 4). Each customer and their home delivery belonged exclusively to one segment, while each segment contained various product categories that might belong to various segments. Each bubble presents the percentage of deliveries (named as ‘orders’ in Figure 4) contained in this segment. By hovering over each segment, details about the percentage of customers (named as ‘shoppers’ in Figure 4) belonging to this segment, and the average variety of product categories (named as ‘Avg Shopper Variety’ in Figure 4) are presented. We named the behavioural segments considering the product categories each segment contained, as indicated by the ‘frequency of appearance’ and statistical significance metrics for each product category. The ‘frequency of appearance’ metric corresponds to the percentage of customers that belong to each segment and ordered the respective product category, whereas the ‘statistical significance’ metric shows whether the presence of a product category in a segment could be the result of chance.
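
The ‘frequency of appearance’ metric defined above can be computed directly from the segment assignments. The following sketch assumes two hypothetical inputs, a customer-to-segment mapping and a customer-to-categories mapping; the names and the toy data are illustrative and not drawn from the study.

```python
def frequency_of_appearance(customer_segment, customer_categories):
    """Percentage of a segment's customers who ordered each product category.

    customer_segment:    {customer_id: segment_id}
    customer_categories: {customer_id: set of product categories ordered}
    Returns {(segment_id, category): percentage}.
    """
    segment_sizes = {}
    hits = {}
    for customer, segment in customer_segment.items():
        segment_sizes[segment] = segment_sizes.get(segment, 0) + 1
        # A set guarantees each customer is counted once per category.
        for category in customer_categories.get(customer, set()):
            key = (segment, category)
            hits[key] = hits.get(key, 0) + 1
    return {
        key: 100.0 * count / segment_sizes[key[0]]
        for key, count in hits.items()
    }


# Toy data: two customers in segment 1 and one customer in segment 2.
segments = {"c1": 1, "c2": 1, "c3": 2}
categories = {"c1": {"bags", "shoes"}, "c2": {"bags"}, "c3": {"blouses"}}
frequencies = frequency_of_appearance(segments, categories)
print(frequencies[(1, "bags")], frequencies[(1, "shoes")])
```

Categories with a high percentage within a segment are natural candidates for naming it; a statistical test would still be needed to check that the association is not due to chance.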

Figure 4. Behavioural customer segments as communicated to the 3PL company (see Table 4 for details per segment).

For example, most of the categories in Segment 5 (Figure 5) included ‘Home equipment, appliances, and beauty’ products. Following the same way of thinking, Segment 1 mainly contained product categories like bags, wallets, backpacks, women’s night bags, shoes, women’s boots, women’s sandals, jewellery, slippers, packaging, and travelling equipment, and represented customers that purchased ‘women accessories, packaging, and traveling equipment’. Similarly, Segment 2 contained dominant categories such as blouses, dresses, cardigans, women’s pants, skirts, knit clothing, coats, and women’s boots, representing customers who ordered ‘women winter clothing’. Finally, Segment 6 contained blouses, pants, knit clothing, leggings and tights, underwear, swimwear, jackets, pyjamas for both men and women, dresses, and shirts. Hence, these customers ordered clothing from various product categories, probably as a couple or family.

Figure 5. ‘Home equipment, appliances, and beauty’ behavioural segment.

For each segment, we calculated the percentage of orders/home deliveries, the percentage of customers, the product categories variety i.e. the average number of product categories per home delivery, and the average number of ordered items within the 3-year delivery history (Table 4). The descriptive statistics for the resulting segments enhanced our understanding of the behavioural segments.

Table 4. Descriptive statistics of behavioural customer segments.

4.3 Stage two: Geographic segmentation

The spatial taxonomy of the 3PL company had five levels i.e. county, region, sub-region 1, sub-region 2, and postal code (). The lowest spatial level included 1,029 postal codes, whereas the highest included 14 counties. First, we calculated the home deliveries for all the spatial levels. Some spatial levels gathered many home deliveries, and others a few or none, creating a skewed dataset. Due to the skewness of the dataset, we used the Discrete Gaussian Exponential (DGX) distribution (Bi et al., Citation2001) to select the cross-spatial levels (see shaded nodes of ). Following a bottom-up approach that considered the number of home deliveries per spatial level, DGX suggested the spatial levels. When the method did not suggest an existing spatial level, we created an intermediate spatial level by merging existing spatial levels (see dashed nodes of ). Qualitative supervision and intervention by the researcher were required to support the semi-supervised method of selecting spatial levels and merging them when required. Overall, we formulated 316 cross-level spatial categories or regions.
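As background, the DGX distribution is a discretised lognormal, which is why it suits heavily skewed count data such as deliveries per spatial level. The form below follows the usual statement of DGX; the symbols k, μ, σ and A are notation introduced here, and how the fitted distribution was turned into level suggestions in this study is not reproduced.

```latex
p(x = k) = \frac{A(\mu, \sigma)}{k}
           \exp\!\left[ -\frac{(\ln k - \mu)^2}{2\sigma^2} \right],
\qquad k = 1, 2, \ldots
\qquad
A(\mu, \sigma) = \left\{ \sum_{k=1}^{\infty} \frac{1}{k}
           \exp\!\left[ -\frac{(\ln k - \mu)^2}{2\sigma^2} \right] \right\}^{-1}
```

Here k would be a delivery count, μ and σ are the parameters fitted to the observed counts, and A is the normalisation constant that makes the probabilities sum to one.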

Figure 6. Spatial taxonomy.

Still, we identified variations in the number of deliveries contained at each spatial level. We identified these variations using a boxplot. In , we depict the boxplot that presents the delivery volume variation across the spatial levels. Each dot represents a selected spatial level, and the y-axis represents the number of deliveries. Since the median of the boxplot is closer to the bottom, we inferred that the dataset has a positive skewness, while the visual formation of the boxplot indicated a stretched dispersion. Based on this boxplot, we then formulated four geographic segments: the first segment spans from the minimum value to Q1, the second segment is the IQR, the third segment spans from Q3 to the maximum value, and the fourth segment comprises the upper outliers.
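The boxplot-based split into four geographic segments can be sketched as follows. This is an illustrative sketch under stated assumptions: the conventional 1.5 × IQR rule is assumed for the upper outlier fence, the handling of values exactly on Q1 or Q3 is a choice made here, and the function name and sample volumes are hypothetical.

```python
from statistics import quantiles


def boxplot_segments(volumes):
    """Assign each delivery volume to geographic segment 1-4 using boxplot statistics."""
    q1, _median, q3 = quantiles(volumes, n=4, method="inclusive")
    # Assumption: upper outliers lie above Q3 + 1.5 * IQR (the usual boxplot fence).
    upper_fence = q3 + 1.5 * (q3 - q1)

    def segment(volume):
        if volume > upper_fence:
            return 4  # upper outliers
        if volume > q3:
            return 3  # from Q3 up to the upper whisker
        if volume >= q1:
            return 2  # inside the IQR (boundaries included by assumption)
        return 1      # from the minimum up to Q1

    return [segment(v) for v in volumes]


# Made-up delivery volumes for nine regions, with one extreme value.
volumes = [10, 20, 30, 40, 50, 60, 70, 80, 1000]
labels = boxplot_segments(volumes)  # [1, 1, 2, 2, 2, 2, 2, 3, 4]
```

With these toy volumes, Q1 is 30 and Q3 is 70, so the fence sits at 130 and only the region with 1000 deliveries falls into the outlier segment.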

Figure 7. Geographic segmentation results based on delivery volume.

For each geographic segment i.e. 1 to 4 in , we then applied the k-means clustering algorithm. SI was used as an evaluation metric here as well. We identified four clusters in segment 2 and three in segment 3. SI was 0.673 and 0.702, respectively. A, B, C, and D () denote the sub-segments extracted. No clusters were identified in segment 1 and segment 4. Segment 1 included only 81 islands and non-urban regions having fewer than 180 deliveries during the 3-year home delivery history. As such, the company’s managers advised us to focus on the remaining segments, which included a higher number of home deliveries. Segment 4 (dark green) included the boxplot outliers and contained only 17 postal codes, mainly from Athens and Thessaloniki, the two largest cities in Greece. Since the regions contained in this segment had orders from all the identified behavioural customer segments, the clustering algorithm did not identify any sub-segments.
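Choosing the number of k-means clusters by SI can be sketched in plain Python as below. This is a minimal illustration, not the study's implementation: the deterministic farthest-point seeding is an assumption made for reproducibility, the function names are hypothetical, and the two-dimensional toy points stand in for whatever per-region features were actually clustered.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point seeding (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest already-chosen seed.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette_index(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        return 0.0  # convention chosen here: undefined for one cluster
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_label, other in clusters.items() if other_label != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_values):
    """Return the k with the highest SI, together with all scores."""
    scores = {k: silhouette_index(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data: three well-separated groups of regions (made up for this sketch).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, scores = best_k(points, range(2, 5))
```

On this toy input the highest SI is obtained for three clusters. When every candidate k yields a low SI, reporting that no sub-segments were found, as for segments 1 and 4 above, is a reasonable reading.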

Segment 3 included three geographic sub-segments with 60 spatial levels/regions. We depicted these segments with light green in . presents the three sub-segments and the regions that constitute each one of them.

Figure 8. Geographic segmentation results for Segment 3 – spatial view (map of Greece).

presents the participation, rated High (H), Medium (M), or Low (L), of each behavioural segment in each geographic segment. Considering the behavioural segments with high participation in the geographic segments, we can understand the product preferences of the population of this segment. For instance, sub-segment 3A was rated ‘high’ in behavioural segments that contained accessories (‘woman accessories, packaging and traveling’ and ‘women shoes and accessories’ segments) and ‘men and women underwear, swimming and baby clothing’. This sub-segment contained 21 non-urban regions including many islands, which could explain the population’s preference for accessories, travelling, and swimming products.

Table 5. Geographic segmentation results for segment 3 (L = low, M = medium, H = high).

Sub-segment 3B was rated high in the ‘home equipment, appliances and beauty’ and ‘men and women accessories and professional clothing’ product categories. This seems logical as this segment contains 19 regions including mostly postal codes belonging to, or being close to, the Attica region or other large Greek cities such as Thessaloniki and Volos, where most of the Greek population lives. Sub-segment 3C included 20 regions from all over Greece excluding islands and was characterised by a low delivery rate in most of the identified behavioural segments. The ‘Couple clothing’ and ‘women’s winter clothing’ customer segments had medium and high rates, respectively. The low temperatures recorded in some of the regions of this sub-segment could explain the high rate of the latter.

We identified four sub-segments in Segment 2 () including 156 regions. We depicted these segments with light orange in . Sub-segment 2A contained 38 regions, 27 of them in Attica and only 11 distributed across Greece, and was characterised by a high rating in all behavioural segments. On the contrary, sub-segment 2B contained 42 regions from all over Greece, only 2 of them from Attica, and was characterised by low ratings in all behavioural segments. The residents of the regions included in this segment are low-spenders. Sub-segment 2C contained 43 regions from all over Greece and was rated medium in all except the ‘home equipment, appliances, and beauty’ behavioural segment, which was rated low. This geographic segment did not order many home products in comparison to the whole population. Sub-segment 2D contained 33 regions and was characterised by a high rate in the ‘women’s winter clothing’, ‘women’s shoes and accessories’, and ‘women’s accessories, packaging, and travelling’ behavioural segments. This segment mainly included regions in highland Greece, where temperatures are low.

Table 6. Geographic segmentation results for segment 2.

5 Discussion and conclusion

Our study performs behaviourally and geographically informed customer segmentation by exploiting products and spatial characteristics and introduces a two-stage business analytics approach to achieve it. We applied it in the case of online retailing by using e-commerce home delivery data provided by a 3PL company. The 3PL company fulfils and distributes online retailers’ orders and the data provided include both product characteristics i.e. the product categories ordered by each customer, and spatial characteristics i.e. home delivery postal codes and regions. Our results indicated eleven behavioural-geographic segments in Greece. In the first stage of the proposed approach, we segmented customers based on the products ordered and seven distinct behavioural customer segments with similar product preferences were derived. In the second stage, we used the behavioural segments and the spatial characteristics as input to perform geographic segmentation based on the delivery location and volume. The results indicated four geographic segments. Delving deeper into each geographic segment, we further identified seven geographic sub-segments that included customers with similarities in the products ordered based on their ratings in the seven behavioural customer segments. The application of the approach revealed issues and challenges (e.g., skewed data, inaccurate product descriptions, unbalanced product and geographic taxonomy trees) that affect the segmentation process and results.

5.1 Theoretical contribution

In the existing literature, there are various studies that segment customers based on several segmentation types (i.e. demographic, psychographic, behavioural, geographic; e.g. Griva et al., Citation2021; Ying Liu et al., Citation2010). However, this usually happens in isolation, as they do not combine the segmentation results. There are also some studies that combine several segmentation types (e.g. Darden & Perreault, Citation1976; Kondo & Okubo, Citation2022; Nakano & Kondo, Citation2018). Even in these cases, the analysis is usually conducted in one layer/stage, meaning that the various segmentation characteristics are used as inputs in one segmentation model to provide an ultimate segmentation and not distinct results which can then be combined. However, research has indicated that many unrelated input variables or features in a model (e.g. demographic characteristics and product characteristics) may hinder the performance of the data mining algorithms (Kwon & Sim, Citation2013; Raudys & Pikelis, Citation1980), and may lead to unreliable results. At the same time, previous studies performed either geographic (Ferretti & Montibeller, Citation2016; Keenan & Jankowski, Citation2019; Y. Li et al., Citation2017; Pick et al., Citation2017) or behavioural (Griva et al., Citation2021; Park et al., Citation2014; Subramanian et al., Citation2014; Wang et al., Citation2020) customer segmentation; however, there is a lack of studies combining behavioural and geographic customer segmentation types. In this study, we combine behavioural and geographic segmentation; we neither conduct them in isolation nor provide the analysis in one layer. First, we perform the behavioural segmentation and then we combine its results with spatial characteristics to conduct geographic segmentation. This way we interweave behavioural and geographic segmentation as we perform customer segmentation in two stages and provide two different sets of segments which can be combined to better understand the customers. As such we avoid issues related to including many input variables in a model (i.e. both product and spatial characteristics), which may hinder the performance of the analytics models (Kwon & Sim, Citation2013; Raudys & Pikelis, Citation1980).

Further, our study contributes to the segmentation literature by developing a two-stage business analytics approach that utilises data mining (i.e. clustering) and machine learning (i.e. topic modelling with Latent Dirichlet Allocation and feature selection) techniques to jointly perform behavioural and geographic segmentation. This way, the proposed approach responds to recent calls from IS researchers for developing artefacts, approaches, methods, and frameworks that indicate and present precisely the steps required for analysing given datasets to create new methods (e.g. Machine Learning) and extract business knowledge (Abbasi et al., Citation2016; Agarwal & Dhar, Citation2014; Griva et al., Citation2021; Padmanabhan et al., Citation2022).

Additionally, the literature pinpoints that various aspects and factors may affect the segmentation process and the quality of the segmentation results (Cho & Kim, Citation2004; Griva et al., Citation2021, Citation2018; Kim et al., Citation2002). In this respect, our study considers the challenges discussed in the segmentation literature and provides answers to segmentation problematisations by applying our approach to the e-commerce home delivery dataset. Some studies pinpoint factors affecting behavioural segmentation (e.g. Cho & Kim, Citation2004; Griva et al., Citation2021); however, there is a notable scarcity of studies examining factors affecting geographic segmentation results. In line with existing literature, we identified and tackled major issues in behavioural segmentation such as the issues related to inaccurate product descriptions and the lack of a unified product taxonomy. On top of that, we identified key challenges in treating geographies with variations in population and in selecting the appropriate spatial level, which we discuss more thoroughly below.

Challenge I: Selecting the appropriate spatial level (i.e. state, city, region, sub-region, postal code) in the geography hierarchy.

In line with the literature (Cho & Kim, Citation2004; Griva et al., Citation2021, Citation2018; Kim et al., Citation2002), our study confirms that issues related to the product and spatial taxonomy trees, such as unbalanced trees causing data skewness, can affect the analysis results. Very exhaustive or very generic selections of the spatial level may lead to insufficient results and limited evidence for the decision-support process (Trivedi, Citation2011). Selecting appropriate levels in segmentation is an issue that has been mainly raised in data mining studies that deal with product hierarchies (Albadvi & Shahbazi, Citation2009; Cho & Kim, Citation2004; Cil, Citation2012; Griva et al., Citation2021). Although this selection may also hinder the data mining results in the case of geographic segmentation, there is a lack of methods and techniques to aid both business analysts and researchers in correctly determining a spatial level. To overcome this issue, we suggested the utilisation of a hybrid geography segmentation model that considers both geometric and non-geometric criteria to perform the segmentation. In more detail, we processed the initial geographic taxonomy tree and merged, split, or created new spatial levels based on the delivery volumes. By performing the merging or splitting operations and by normalising the deliveries at each spatial level, we selected a spatial level to perform the analysis. To select the spatial levels, we followed a semi-supervised method. In line with existing literature (e.g. Van den Broek et al., Citation2021; Zhang et al., Citation2021), we confirm that involving the human factor i.e. domain experts in the analytics methods is important. In our case, the ‘human’ was in essence able to negotiate with the algorithm by suggesting new spatial levels; the algorithm then re-executed the process and made new suggestions under the supervision of the experts. Without the ‘negotiation’ between the human and the algorithm, the selection of the spatial levels would not be feasible.

Challenge II: Defining the criteria to perform the geographic segmentation

To segment customers based on geographic characteristics, we can use (A) geometric criteria (i.e. group nearby areas); (B) non-geometric criteria (i.e. combine non-nearby locations) based on selected criteria like common behaviours, sales, revenue; or (C) mixed/hybrid models that combine geometric and non-geometric dimensions (Idrees et al., Citation2018). In current literature, researchers usually utilise geometric criteria to perform geographic segmentation (Fan & Zhang, Citation2009; Idrees et al., Citation2018; Mo et al., Citation2010). So, they group solely nearby areas, and they neglect the fact that nearby geographies might have significant differences in behaviours. Our results indicated that the discrimination between customers was amplified by the co-existence of the product and spatial dimensions. The discrimination was related to both the varying delivery volume of the geographic segments and the different customers’ behaviour in the various areas. For example, we observed differences in the extracted geographic segments even in nearby regions belonging to the same county. To tackle such issues, we adopted a hybrid geography segmentation model that used both geometric areas i.e. nearby areas and non-geometric areas i.e. areas that are not close to each other but presented similarities in their population’s behaviour.

Challenge III: Defining the criteria to segment geographies with variations in population.

The population i.e. the number of objects that belong to each spatial level can affect the quality of the analysis (Gunter & Furnham, Citation1992; Pickton & Broderick, Citation2005). For example, the large number of customers, sales, and deliveries can lead to significant differences within some segments (). By creating new spatial levels based on the delivery volume, we formulated a more balanced spatial tree supporting the segmentation of geographies with variations in population. Additionally, techniques such as boxplots were used to visually examine the variations of the spatial levels, and to make a first attempt at geographic segmentation based on delivery volume.

In sum, by applying our approach in the case of online retailing and home delivery, we underlined the key challenges of geographic segmentation that concern segmentation system designers, researchers, and business analysts. This is a significant contribution, since in existing literature there is no discussion of the issues that hinder the performance of geographic segmentation approaches or of the solutions suggested.

Finally, in this study, we used home delivery data for demand-analytics purposes (i.e. to perform behavioural customer segmentation), whereas such datasets have mainly been utilised for supply-side analytics and logistics purposes (e.g. vehicle routing; Liu et al., 2017) and warehouse management (Lee et al., 2018). More specifically, we showed that we can perform behavioural segmentation using delivery data that have solely product item descriptions; on the contrary, existing studies perform behavioural segmentation utilising sales and transactional data (e.g. Cil, Citation2012; Griva et al., Citation2021; Miguéis et al., Citation2012; Park et al., Citation2014).

5.2 Practical implications

The practical value of this study is stressed when considering the operational and customer-oriented business decisions that behavioural and geographic customer segmentation can support. First, let’s focus solely on the practical implications of behavioural segmentation. Our findings indicated that the customer base of the 3PL company and the four online retailers can be segmented into seven behavioural customer segments, and each one contains customers ordering specific products. As such, product, category, and brand managers of the online retailers can use these findings to identify the customer behavioural segments and the personas they need to promote their product to (Hannila et al., Citation2020). This is very important since brands regularly lack information and insights on which customer persona they need to approach. Due to the lack of raw data, they base their targeted marketing on qualitative market research rather than on hard quantitative evidence. For instance, as shown in, if a brand which is selling face care products would like to advertise a new product, or to increase sales, behavioural segment 5 could be a potential segment to target. This means that customers purchasing home equipment and products (e.g. home decoration), are those who will have a high chance of purchasing face care.

At the same time, marketers and advertisers can also use these results to design bundle promotions based on the products that belong to the same behavioural customer segment (Gu et al., Citation2020) e.g. special offers for those who purchase home decoration and face care products together. These strategies and actions are important to build customer relations by retaining existing customers and attracting new ones. As such, a rule-based approach can be easily integrated with contemporary CRM systems (Arora et al., Citation2021).

On the practical value of geographic segmentation, segmenting a whole country i.e. Greece into eleven behavioural-geographic segments can contribute to the design of the 3PL company’s operational and marketing actions. Marketers in the 3PL companies and online retailers can do so by exploiting geographies with common needs and customers’ behaviours for geo-marketing. For instance, sub-segment 3A contains 21 non-urban regions including many islands. In this sub-segment, customers prefer accessories, travelling, and swimming products. So, marketers can create the same targeted advertisements for all the regions contained in this geographic segment, as customer data indicated similar customer behaviours.

Moreover, we identified distinct behaviours in rural and urban regions considering the geographic segments. Indicatively, in sub-segment 3B, which contains mainly areas in Attica (where the capital of Greece is located), customers prefer home equipment, beauty, and professional clothing products, whereas in sub-segment 3C, which contains rural areas, customers prefer winter clothing and casual clothing for couples. These distinct behaviours and the comparison of rural and urban segments in an e-commerce environment are also important for practitioners such as logistics service providers, 3PLs, and retailers (Vakulenko et al., Citation2022). For instance, it is pointless to spend money to run sponsored adverts on social media to advertise beauty products or professional clothing products to customers living in specific rural areas.

Logistics managers can also exploit the extracted behavioural customer segments to improve their operations. For example, they can rearrange the inventory layout of their central warehouses by placing products that are delivered together in nearby aisles to decrease order-picking time, or distribute their inventory across the different warehouses considering the geographical distribution of products. For instance, they can place face care and home decoration products in nearby aisles in their warehouse, as indicated in behavioural segment 5.

These insights and the suggested approach can empower the collaboration between 3PL companies and their B2B customers (i.e. online retailers) and drive the co-creation of business value. The 3PL company can exploit the insights and sell them to retailers. The commercial exploitation of the insights can be conducted by building an Analytics as a Service platform to provide real-time insights to retailers regarding their customer segments and enable anonymised industry wide comparison. Retail marketers and category managers can use these insights to compare for instance, the penetration of the facecare category of brand X in segment 5, versus the penetration of the same facecare category of brand Y and use this information to draw corrective actions.

Although brands are not an obvious stakeholder in this study, the insights may also be beneficial for them to deep dive and view more information about the products. For instance, they can identify which other product categories are purchased with theirs, so that they can cooperate with these brands and design cross-selling actions. In the same spirit, by having a customer-based and geography-based benchmarking, retailers’ marketing managers can build online and offline product catalogues for the specific behavioural segments (e.g. ‘home equipment, appliances, and beauty’) and exploit the geographic segmentation to determine the areas where they will disseminate these catalogues or target advertising.

Closing, on a broader level, this study can also serve as a guide to business analysts to understand the steps that they should perform when analysing similar datasets, the challenges that they will face, and some potential solutions on how to tackle them.

5.3 Limitations and future research

The study contains several limitations from technical, IS and data perspectives that can be part of future research. From a technical perspective, we can use alternative data mining techniques to perform geographic segmentation (e.g. network analytics) (C. Chen et al., Citation2012) and compare their results. In network analytics, locations are treated as graph nodes, and graph mining techniques are used to perform link mining and community detection. By using network analytics and graph mining, we can detect more detailed geographic segments and neighbourhoods, which our study could not identify (e.g. in segment 4 of our analysis, where we were unable to identify any clusters). Other techniques such as mathematical modelling can improve the spatial level selection and the identification of the product levels and make them more robust. Currently, the evaluation of the spatial and product levels is semi-supervised and is based on the human factor e.g. experts’ prior experience, comparison with major retailers’ websites, etc. Future research may either focus on developing more robust technical procedures to eliminate any potential human error, or on assessing the contribution of the human factor and the negotiation with the algorithm in generating optimal results. Finally, dynamic segmentation of the e-commerce data and classification of new orders can also be examined in similar cases (Khalemsky & Gelbard, Citation2020).

In our study, we provide the two-stage segmentation approach as an artefact and we list the challenges identified while evaluating it that could hinder the performance of the segmentation. However, we do not provide specific guidelines and design principles around the design of such systems. From an IS perspective, future research may focus on the design aspect of customer segmentation systems by deriving design principles for such segmentation systems, or on how to visualise the insights generated (Markus et al., Citation2002), or on the adoption of the customer analytics systems (Baig et al., Citation2019; Verma et al., Citation2018) and its evaluation using datasets from more cases. In addition, future studies may focus on the skills required for the stakeholders involved in such analysis. For instance, business analysts and domain experts provide feedback on the semi-supervised methods, marketers convert the insights into actions, designers design the visualisation of the systems, storytellers present the insights to the businesses, engineers handle the databases and the data structures, etc. Concluding, future research could compare this method with other alternatives to examine the value of this two-stage segmentation approach versus others.

From a data perspective, extra datasets from more online retailers and open data sources can be used to enrich our analysis results and enhance the value of the identified segments (e.g. Internet of Things data, and other open datasets such as weather, income, cultural data) (Griva et al., Citation2016). Future research can examine the validity of the extracted segments from a marketing perspective, e.g. by offering real promotions to the customers, measuring their response to the specific actions, and evaluating the impact of these insights on business value.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

Data not available due to legal restrictions

Additional information

Funding

This work was partially funded by the Horizon 2020 [project Transforming Transport, Grant agreement ID: 731932]; Science Foundation Ireland [Science Foundation Ireland grant 13/RC/2094_2] and Action 2: Support for Postdoctoral Researchers – Year 2017–2018 – AUEB.

References

  • Abbasi, A., Sarker, S., & Chiang, R.H.L. (2016). Big data research in information systems: Toward an inclusive research agenda. Journal of the Association for Information Systems, 17(2), 1–32. https://doi.org/10.17705/1jais.00423
  • Aeron, H., Kumar, A., & Moorthy, J. (2012). Data mining framework for customer lifetime value-based segmentation. Journal of Database Marketing and Customer Strategy Management, 19(1), 17–30. https://doi.org/10.1057/dbm.2012.1
  • Agarwal, R., & Dhar, V. (2014). Editorial —big data, data science, and analytics: The opportunity and challenge for IS research. Information Systems Research, 25(3), 443–448. https://doi.org/10.1287/isre.2014.0546
  • Albadvi, A., & Shahbazi, M. (2009). A hybrid recommendation technique based on product category attributes. Expert Systems with Applications, 36(9), 11480–11488. https://doi.org/10.1016/j.eswa.2009.03.046
  • Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E.D., Gutierrez, J.B., & Kochut, K. (2017). A brief survey of text mining: Classification, clustering and extraction techniques. In KDD Bigdas 2017.
  • Arora, L., Singh, P., Bhatt, V., & Sharma, B. (2021). Understanding and managing customer engagement through social customer relationship management. Journal of Decision Systems, 30(2–3), 215–234. https://doi.org/10.1080/12460125.2021.1881272
  • Arun, R., Suresh, V., Veni Madhavan, C.E., & Narasimha Murthy, M.N. (2010). On finding the natural number of topics with latent dirichlet allocation: Some observations. In M. J. Zaki, J. X. Yu, B. Ravindran, & V. Pudi Eds. Advances in knowledge discovery and data mining. PAKDD 2010. Lecture notes in computer science, Vol. 6118, 391–402; Springer. https://doi.org/10.1007/978-3-642-13657-3_43
  • Asghari, M., Sierra-Sosa, D., & Elmaghraby, A.S. (2020). A topic modeling framework for spatio-temporal information management. Information Processing & Management, 57(6), 102340. https://doi.org/10.1016/j.ipm.2020.102340
  • Baig, M.I., Shuib, L., & Yadegaridehkordi, E. (2019). Big data adoption: State of the art and research challenges. Information Processing & Management, 56(6), 102095. https://doi.org/10.1016/j.ipm.2019.102095
  • Bhatnagar, A., & Ghose, S. (2004). A latent class segmentation analysis of e-shoppers. Journal of Business Research, 57(7), 758–767. https://doi.org/10.1016/S0148-2963(02)00357-0
  • Bi, Z., Faloutsos, C., & Korn, F. (2001). The “DGX” distribution for mining massive, skewed data. Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining - KDD ’01, 17–26. https://doi.org/10.1145/502512.502521
  • Blei, D.M., Ng, A.Y., & Jordan, M.I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3(2003), 993–1022. https://doi.org/10.5555/944919.944937
  • Boone, D.S., & Roehm, M. (2002). Retail segmentation using artificial neural networks. International Journal of Research in Marketing, 19(3), 287–301. https://doi.org/10.1016/S0167-8116(02)00080-0
  • Chen, C., Chiang, C., & Storey, S. (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly, 36(4), 1165–1188. https://doi.org/10.2307/41703503
  • Chen, R., Zheng, Y., Xu, W., Liu, M., & Wang, J. (2018). Secondhand seller reputation in online markets: A text analytics framework. Decision Support Systems, 108, 96–106. https://doi.org/10.1016/j.dss.2018.02.008
  • Cho, Y.H., & Kim, J.K. (2004). Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce. Expert Systems with Applications, 26(2), 233–246. https://doi.org/10.1016/S0957-4174(03)00138-6
  • Cho, Y.H., Kim, J.K., & Kim, S.H. (2002). A personalized recommender system based on web usage mining and decision tree induction. Expert Systems with Applications, 23(3), 329–342. https://doi.org/10.1016/S0957-4174(02)00052-0
  • Cil, I. (2012). Consumption universes based supermarket layout through association rule mining and multidimensional scaling. Expert Systems with Applications, 39(10), 8611–8625. https://doi.org/10.1016/j.eswa.2012.01.192
  • Darden, W.R., & Perreault, W.D., Jr. (1976). Identifying interurban shoppers: Multiproduct purchase patterns and segmentation profiles. Journal of Marketing Research, 13(1), 51–60. https://doi.org/10.2307/3150901
  • De Keyser, A., Schepers, J., & Konuş, U. (2015). Multichannel customer segmentation: Does the after-sales channel matter? A replication and extension. International Journal of Research in Marketing, 32(4), 453–456. https://doi.org/10.1016/j.ijresmar.2015.09.005
  • Dekimpe, M.G. (2020). Retailing and retailing research in the age of big data analytics. International Journal of Research in Marketing, 37(1), 3–14. https://doi.org/10.1016/j.ijresmar.2019.09.001
  • Dempster, A.P., Laird, N.M., & Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39. https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
  • Fan, B., & Zhang, P. (2009). Spatially enabled customer segmentation using a data classification method with uncertain predicates. Decision Support Systems, 47(4), 343–353. https://doi.org/10.1016/j.dss.2009.03.002
  • Ferretti, V., & Montibeller, G. (2016). Key challenges and meta-choices in designing and applying multi-criteria spatial decision support systems. Decision Support Systems, 84(2016), 41–52. https://doi.org/10.1016/j.dss.2016.01.005
  • Graef, R., Klier, M., Obermeier, A., & Zolitschka, J.F. (2022). What to buy, Pepper? – Bridging the physical and the digital world with recommendations from humanoid robots. Journal of Decision Systems, 1–27. https://doi.org/10.1080/12460125.2022.2029049
  • Grim, J. (2006). EM cluster analysis for categorical data. Lecture Notes in Computer Science, 4109, 640–648. https://doi.org/10.1007/11815921_70
  • Griva, A. (2022). “I can get no e-satisfaction”. What analytics say? Evidence using satisfaction data from e-commerce. Journal of Retailing and Consumer Services, 66(July 2021), 102954. https://doi.org/10.1016/j.jretconser.2022.102954
  • Griva, A., Bardaki, C., Pramatari, K., & Doukidis, G. (2016). Mapping moving object events into a network of object flows to support decisions. 24th European conference on information systems, ECIS 2016. http://aisel.aisnet.org/ecis2016_rp/112
  • Griva, A., Bardaki, C., Pramatari, K., & Doukidis, G. (2021). Factors affecting customer analytics: evidence from three retail cases. Information Systems Frontiers, 1–24. https://doi.org/10.1007/s10796-020-10098-1
  • Griva, A., Bardaki, C., Pramatari, K., & Papakiriakopoulos, D. (2018). Retail business analytics: Customer visit segmentation using market basket data. Expert Systems with Applications, 100(2018), 1–16. https://doi.org/10.1016/j.eswa.2018.01.029
  • Gu, X., Gao, F., Tan, M., & Peng, P. (2020). Fashion analysis and understanding with artificial intelligence. Information Processing & Management, 57(5), 102276. https://doi.org/10.1016/j.ipm.2020.102276
  • Gunter, B., & Furnham, A. (1992). Consumer profiles: Introduction to psychographics (consumer research & policy series). Cengage Learning EMEA.
  • Hannila, H., Kuula, S., Harkonen, J., & Haapasalo, H. (2020). Digitalisation of a company decision-making system: A concept for data-driven and fact-based product portfolio management. Journal of Decision Systems, 31(3), 258–279. https://doi.org/10.1080/12460125.2020.1829386
  • Hevner, A.R., March, S.T., Park, J., & Ram, S. (2004). Design science in information systems research. MIS Quarterly, 28(1), 75–105. https://doi.org/10.2307/25148625
  • Hopf, K., Weigert, A., & Staake, T. (2022). Value creation from analytics with limited data: A case study on the retailing of durable consumer goods. Journal of Decision Systems, 1–37. https://doi.org/10.1080/12460125.2022.2059172
  • Hunke, F., Heinz, D., & Satzger, G. (2021). Creating customer value from data: Foundations and archetypes of analytics-based services. Electronic Markets. https://doi.org/10.1007/s12525-021-00506-y
  • Idrees, A.M., Ibrahim, M.H., & El Seddawy, A.I. (2018). Applying spatial intelligence for decision support systems. Future Computing and Informatics Journal, 3(2), 384–390. https://doi.org/10.1016/j.fcij.2018.11.001
  • Jain, A.K., Murty, M.N., & Flynn, P.J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323. https://doi.org/10.1145/331499.331504
  • Jindal, R.P., Gauri, D.K., Li, W., & Ma, Y. (2021). Omnichannel battle between Amazon and Walmart: Is the focus on delivery the best strategy? Journal of Business Research, 122(August 2020), 270–280. https://doi.org/10.1016/j.jbusres.2020.08.053
  • Ju, J., Liu, L., & Feng, Y. (2019). Design of an O2O citizen participation ecosystem for sustainable governance. Information Systems Frontiers, 21, 605–620. https://doi.org/10.1007/s10796-019-09910-4
  • Keenan, P.B., & Jankowski, P. (2019). Spatial decision support systems: Three decades on. Decision Support Systems, 116(2019), 64–76. https://doi.org/10.1016/j.dss.2018.10.010
  • Khalemsky, A., & Gelbard, R. (2020). A dynamic classification unit for online segmentation of big data via small data buffers. Decision Support Systems, 128(August 2019), 113157. https://doi.org/10.1016/j.dss.2019.113157
  • Kieu, L.M., Ou, Y., & Cai, C. (2018). Large-scale transit market segmentation with spatial-behavioural features. Transportation Research Part C: Emerging Technologies, 90(2018), 97–113. https://doi.org/10.1016/j.trc.2018.03.003
  • Kim, J.K., Cho, Y.H., Kim, W.J., Kim, J.R., & Suh, J.H. (2002). A personalized recommendation procedure for Internet shopping support. Electronic Commerce Research and Applications, 1(2002), 301–313. https://doi.org/10.1016/S1567-4223(02)00022-4
  • Kondo, F.N., & Okubo, T. (2022). Understanding multi-channel consumer behavior: A comparison between segmentations of multi-channel purchases by product category and overall products. Journal of Retailing and Consumer Services, 64(February 2021), 102792. https://doi.org/10.1016/j.jretconser.2021.102792
  • Konuş, U., Verhoef, P.C., & Neslin, S.A. (2008). Multichannel shopper segments and their covariates. Journal of Retailing, 84(4), 398–413. https://doi.org/10.1016/j.jretai.2008.09.002
  • Kwon, O., & Sim, J.M. (2013). Effects of data set features on the performances of classification algorithms. Expert Systems with Applications, 40(5), 1847–1857. https://doi.org/10.1016/j.eswa.2012.09.017
  • Li, A., Feng, M., Li, Y., & Liu, Z. (2016). Application of outlier mining in insider identification based on boxplot method. Procedia Computer Science, 91(2016), 245–251. https://doi.org/10.1016/j.procs.2016.07.069
  • Liu, Y., Li, Z., Xiong, H., Gao, X., Wu, J., & Wu, S. (2013). Understanding and enhancement of internal clustering validation measures. IEEE Transactions on Cybernetics, 43(3), 982–994. https://doi.org/10.1109/TSMCB.2012.2220543
  • Liu, Y., Ram, S., Lusch, R.F., & Brusco, M. (2010). Multicriterion market segmentation: A new model, implementation, and evaluation. Marketing Science, 29(5), 880–894. http://www.jstor.org/stable/40864671
  • Li, Y., Vo, A., Randhawa, M., & Fick, G. (2017). Designing utilization-based spatial healthcare accessibility decision support systems: A case of a regional health plan. Decision Support Systems, 99(2017), 51–63. https://doi.org/10.1016/j.dss.2017.05.011
  • Markus, M.L., Majchrzak, A., & Gasser, L. (2002). A design theory for systems that support emergent knowledge processes. MIS Quarterly, 26(3), 179–212. https://www.jstor.org/stable/4132330
  • Miguéis, V.L., Camanho, A.S., & Falcão E Cunha, J. (2012). Customer data mining for lifestyle segmentation. Expert Systems with Applications, 39(10), 9359–9366. https://doi.org/10.1016/j.eswa.2012.02.133
  • Mikalef, P., Pappas, I.O., Krogstie, J., & Pavlou, P.A. (2020). Big data and business analytics: A research agenda for realizing business value. Information and Management, 57(1), 103237. https://doi.org/10.1016/j.im.2019.103237
  • Mo, J., Kiang, M.Y., Zou, P., & Li, Y. (2010). A two-stage clustering approach for multi-region segmentation. Expert Systems with Applications, 37(10), 7120–7131. https://doi.org/10.1016/j.eswa.2010.03.003
  • Monod, E., Lissillour, R., Köster, A., & Jiayin, Q. (2022). Does AI control or support? Power shifts after AI system implementation in customer relationship management. Journal of Decision Systems, 1–24. https://doi.org/10.1080/12460125.2022.2066051
  • Nakano, S., & Kondo, F.N. (2018). Customer segmentation with purchase channels and media touchpoints using single source panel data. Journal of Retailing and Consumer Services, 41(2018), 142–152. https://doi.org/10.1016/j.jretconser.2017.11.012
  • Ngai, E.W.T., & Wu, Y. (2022). Machine learning in marketing: A literature review, conceptual framework, and research agenda. Journal of Business Research, 145(March), 35–48. https://doi.org/10.1016/j.jbusres.2022.02.049
  • Ngai, E.W.T., Xiu, L., & Chau, D.C.K. (2009). Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with Applications, 36(2), 2592–2602. https://doi.org/10.1016/j.eswa.2008.02.021
  • Obitade, O.P. (2021). The mediating role of knowledge management and information systems selection management capability on Big Data Analytics quality and firm performance. Journal of Decision Systems, 1–41. https://doi.org/10.1080/12460125.2021.1966162
  • Padmanabhan, B., Fang, X., Sahoo, N., & Burton-Jones, A. (2022). Machine learning in information systems research. MIS Quarterly, 46(1), iii–xix.
  • Pappas, I.O., Mikalef, P., Giannakos, M.N., Krogstie, J., & Lekakos, G. (2018). Big data and business analytics ecosystems: Paving the way towards digital transformation and sustainable societies. Information Systems and E-Business Management, 16(3), 479–491. https://doi.org/10.1007/s10257-018-0377-z
  • Park, C.H., Park, Y.-H., & Schweidel, D.A. (2014). A multi-category customer base analysis. International Journal of Research in Marketing, 31(3), 266–279. https://doi.org/10.1016/j.ijresmar.2013.12.003
  • Pickton, D., & Broderick, A. (2005). Integrated Marketing Communications. Pearson Education Limited.
  • Pick, J.B., Turetken, O., Deokar, A.V., & Sarkar, A. (2017). Location analytics and decision support: Reflections on recent advancements, a research framework, and the path ahead. Decision Support Systems, 99, 1–8. https://doi.org/10.1016/j.dss.2017.05.016
  • Raudys, Š., & Pikelis, V. (1980). On dimensionality, sample size, classification error, and complexity of classification algorithm in pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(3), 242–252. https://doi.org/10.1109/TPAMI.1980.4767011
  • Saggi, M.K., & Jain, S. (2018). A survey towards an integration of big data analytics to big insights for value-creation. Information Processing & Management, 54(5), 758–790. https://doi.org/10.1016/j.ipm.2018.01.010
  • Shea, V.J., Dow, K.E., Chong, A.Y.L., & Ngai, E.W.T. (2019). An examination of the long-term business value of investments in information technology. Information Systems Frontiers, 21(1), 213–227. https://doi.org/10.1007/s10796-017-9735-5
  • Shearer, C. (2000). The CRISP-DM model: The new blueprint for data mining. Journal of Data Warehousing, 5(4), 13–22.
  • Subramanian, U., Raju, J.S., & Zhang, Z.J. (2014, December). The strategic value of high-cost customers. Management Science.
  • Taramigkou, M., Apostolou, D., & Mentzas, G. (2018). Leveraging exploratory search with personality traits and interactional context. Information Processing & Management, 54(4), 609–629. https://doi.org/10.1016/j.ipm.2018.04.001
  • Tiwari, S., Wee, H.M., & Daryanto, Y. (2018). Big data analytics in supply chain management between 2010 and 2016: Insights to industries. Computers & Industrial Engineering, 115(2018), 319–330. https://doi.org/10.1016/j.cie.2017.11.017
  • Trivedi, M. (2011). Regional and categorical patterns in consumer behavior: Revealing trends. Journal of Retailing, 87(1), 18–30. https://doi.org/10.1016/j.jretai.2010.11.002
  • Tsai, C.Y., & Chiu, C.C. (2004). A purchase-based market segmentation methodology. Expert Systems with Applications, 27(2), 265–276. https://doi.org/10.1016/j.eswa.2004.02.005
  • Tynan, A.C., & Drayton, J. (1987). Market segmentation. Journal of Marketing Management, 2(3), 301–335. https://doi.org/10.1080/0267257X.1987.9964020
  • Ulkhaq, M.M., & Adyatama, A. (2021). Clustering countries according to the world happiness report 2019. Engineering and Applied Science Research, 48(2), 137–150. https://doi.org/10.14456/easr.2021.16
  • Vakulenko, Y., Arsenovic, J., Hellström, D., & Shams, P. (2022). Does delivery service differentiation matter? Comparing rural to urban e-consumer satisfaction and retention. Journal of Business Research, 142(January), 476–484. https://doi.org/10.1016/j.jbusres.2021.12.079
  • van den Broek, E., Sergeeva, A., & Huysman, M. (2021). When the machine meets the expert: An ethnography of developing AI for hiring. MIS Quarterly: Management Information Systems, 45(3), 1557–1580. https://doi.org/10.25300/MISQ/2021/16559
  • Vecchio, P.D., Mele, G., Ndou, V., & Secundo, G. (2018). Creating value from social big data: Implications for smart tourism destinations. Information Processing & Management, 54(5), 847–860. https://doi.org/10.1016/j.ipm.2017.10.006
  • Verma, S., Bhattacharyya, S.S., & Kumar, S. (2018). An extension of the technology acceptance model in the big data analytics system implementation environment. Information Processing & Management, 54(5), 791–806. https://doi.org/10.1016/j.ipm.2018.01.004
  • Wallach, H.M., Mimno, D.M., & McCallum, A. (2009). Rethinking LDA: Why priors matter. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, & A. Culotta (Eds.), Advances in neural information processing systems 22 (NIPS 2009) (pp. 1973–1981).
  • Wang, S.C., Tsai, Y.T., & Ciou, Y.S. (2020). A hybrid big data analytical approach for analyzing customer patterns through an integrated supply chain network. Journal of Industrial Information Integration, 20(October), 100177. https://doi.org/10.1016/j.jii.2020.100177
  • Westland, J.C., Mou, J., & Yin, D. (2019). Demand cycles and market segmentation in bicycle sharing. Information Processing & Management, 56(4), 1592–1604. https://doi.org/10.1016/j.ipm.2018.09.006
  • Widaningrum, D.L., Surjandari, I., & Arymurthy, A.M. (2017). Spatial data utilization for location pattern analysis. Procedia Computer Science, 124(2017), 69–76. https://doi.org/10.1016/j.procs.2017.12.131
  • Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., … Steinberg, D. (2008). Top 10 algorithms in data mining. Knowledge and Information Systems, 14. https://doi.org/10.1007/s10115-007-0114-2
  • Wu, I.C., & Yu, H.K. (2020). Sequential analysis and clustering to investigate users’ online shopping behaviors based on need-states. Information Processing & Management, 57(6), 102323. https://doi.org/10.1016/j.ipm.2020.102323
  • Zeleny, J., Burget, R., & Zendulka, J. (2017). Box clustering segmentation: A new method for vision-based web page preprocessing. Information Processing & Management, 53(3), 735–750. https://doi.org/10.1016/j.ipm.2017.02.002
  • Zerbino, P., Aloini, D., Dulmin, R., & Mininno, V. (2018). Big data-enabled customer relationship management: A holistic approach. Information Processing & Management, 54(5), 818–846. https://doi.org/10.1016/j.ipm.2017.10.005
  • Zhang, Z., Yoo, Y., Lyytinen, K., & Lindberg, A. (2021). The unknowability of autonomous tools and the liminal experience of their use. Information Systems Research, 32(4), 1192–1213. https://doi.org/10.1287/isre.2021.1022
  • Zhou, J., Zhai, L., & Pantelous, A.A. (2020). Market segmentation using high-dimensional sparse consumers data. Expert Systems with Applications, 145(2020), 1–17. https://doi.org/10.1016/j.eswa.2019.113136