Exploring Customer Segmentation in E-Commerce using RFM Analysis with Clustering Techniques

Chun-Gee Wong https://orcid.org/0009-0000-4368-2214
Gee-Kok Tong https://orcid.org/0000-0002-5086-9383
Su-Cheng Haw https://orcid.org/0000-0002-7190-0837

Keywords

customer segmentation, RFM analysis, clustering analysis, K-Means Clustering, Hierarchical Clustering

Abstract

The proliferation of big data and the growth of e-commerce have made it harder to extract actionable insights for personalised recommendations and decision-making. As marketing becomes increasingly data-driven, understanding and predicting customer behaviour has become paramount for maintaining a competitive advantage. This study applies business analytics tools, namely Recency, Frequency, and Monetary (RFM) Analysis together with the K-Means and Hierarchical (Agglomerative) Clustering algorithms, to segment customer transactional data. Data normalisation, a critical step for accurate clustering, was performed using both a log transformation and the Power Transformer technique with the Yeo-Johnson method; the latter proved more effective because it handles both positively and negatively skewed data, making the data better suited for analysis. RFM Analysis with Hierarchical Clustering outperformed K-Means Clustering, achieving a Silhouette Score of 0.47 and a Calinski–Harabasz Index of 3787.1, which indicates more accurate identification of customer segments. RFM Analysis alone generated eight clusters, whereas combining RFM Analysis with either Hierarchical Clustering or K-Means produced three similar-sized clusters whose labels were interchanged between the two methods. These metrics highlight the ability of Hierarchical Clustering to identify distinct customer segments and to support tailored marketing strategies. The findings indicate that the RFM-Hierarchical Clustering approach improves segmentation precision and enables more refined and effective marketing strategies.
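
The abstract names the pipeline (RFM features, Yeo-Johnson normalisation, K-Means versus Hierarchical Clustering, Silhouette and Calinski–Harabasz evaluation) without showing it. The following is a minimal sketch, not the authors' code, assuming scikit-learn's PowerTransformer(method="yeo-johnson"), KMeans, and AgglomerativeClustering with Ward linkage, and k = 3. The helper names compute_rfm, normalise_rfm and compare_clusterings, the snapshot date, and the synthetic transactions are all hypothetical; the printed scores come from made-up data and will not reproduce the 0.47 and 3787.1 reported in the abstract.

```python
from datetime import date, timedelta

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score
from sklearn.preprocessing import PowerTransformer


def compute_rfm(transactions, snapshot):
    """Aggregate (customer_id, invoice_date, amount) rows into one R, F, M row per customer.

    Hypothetical simplification: each row is treated as one invoice; line-item data
    (several rows per invoice) would first need grouping by invoice number.
    """
    last_seen, invoices, spend = {}, {}, {}
    for cust, day, amount in transactions:
        last_seen[cust] = max(last_seen.get(cust, day), day)
        invoices[cust] = invoices.get(cust, 0) + 1
        spend[cust] = spend.get(cust, 0.0) + amount
    customers = sorted(last_seen)
    rfm = np.array(
        [[(snapshot - last_seen[c]).days, invoices[c], spend[c]] for c in customers],
        dtype=float,
    )
    return customers, rfm


def normalise_rfm(rfm):
    """Yeo-Johnson power transform, then standardise each column (zero mean, unit variance).

    Unlike a plain log transform, Yeo-Johnson is defined for zero and negative
    inputs, such as net spend after refunds.
    """
    return PowerTransformer(method="yeo-johnson", standardize=True).fit_transform(rfm)


def compare_clusterings(X, k=3):
    """Fit K-Means and Ward agglomerative clustering; return (silhouette, CH) per model."""
    models = {
        "k-means": KMeans(n_clusters=k, n_init=10, random_state=0),
        "hierarchical": AgglomerativeClustering(n_clusters=k, linkage="ward"),
    }
    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        scores[name] = (silhouette_score(X, labels), calinski_harabasz_score(X, labels))
    return scores


# Hypothetical synthetic transactions: three behaviour profiles plus one refund.
rng = np.random.default_rng(42)
snapshot = date(2011, 12, 10)
transactions = []
for cust in range(60):
    profile = cust % 3
    for _ in range([1, 4, 12][profile]):
        days_ago = int(rng.integers(1, [300, 90, 20][profile]))
        amount = float(rng.gamma(2.0, [20.0, 60.0, 150.0][profile]))
        transactions.append((cust, snapshot - timedelta(days=days_ago), amount))
transactions.append((0, snapshot - timedelta(days=200), -900.0))  # refund -> negative net spend

customers, rfm = compute_rfm(transactions, snapshot)
X = normalise_rfm(rfm)
for name, (sil, ch) in compare_clusterings(X).items():
    print(f"{name}: silhouette={sil:.2f}, Calinski-Harabasz={ch:.1f}")
```

Both models are fitted on the same transformed matrix, so any difference in the scores reflects the algorithms rather than the preprocessing. Cluster numbers are arbitrary in both methods, which is why two clusterings can find the same groups under interchanged labels. Ward linkage is an assumption here; the abstract does not state which linkage was used.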

References

Abdulhafedh, A. (2021). Incorporating K-means, hierarchical clustering and PCA in customer segmentation. Journal of City and Development, 3(1), 12–30. ResearchGate. https://www.researchgate.net/publication/349094412_Incorporating_K-means_Hierarchical_Clustering_and_PCA_in_Customer_Segmentation
Anitha, P., & Patil, M. M. (2022). RFM model for customer purchase behavior using K-Means algorithm. Journal of King Saud University – Computer and Information Sciences, 34(5), 1785–1792. https://doi.org/10.1016/j.jksuci.2019.12.011
Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics, 3(1), 1–27. https://doi.org/10.1080/03610927408827101
Chaubey, G., Gavhane, P. R., Bisen, D., & Arjaria, S. K. (2022). Customer purchasing behavior prediction using machine learning classification techniques. Journal of Ambient Intelligence and Humanized Computing. https://doi.org/10.1007/s12652-022-03837-6
Chen, D. (2015). Online retail. UC Irvine Machine Learning Repository. https://doi.org/10.24432/C5BW33
Chen, D., Sain, S. L., & Guo, K. (2012). Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining. Journal of Database Marketing and Customer Strategy Management, 19(3), 197–208. https://doi.org/10.1057/dbm.2012.17
Christy, A. J., Umamakeswari, A., Priyatharsini, L., & Neyaa, A. (2021). RFM ranking – An effective approach to customer segmentation. Journal of King Saud University – Computer and Information Sciences, 33(10), 1251–1257. https://doi.org/10.1016/j.jksuci.2018.09.004
Everitt, B., & Hothorn, T. (2011). An Introduction to Applied Multivariate Analysis with R. Springer. https://www.springer.com/series/6991
Fahrudin, N. F., & Rindiyani, R. (2024). Comparison of k-medoids and k-means algorithms in segmenting customers based on RFM criteria. E3S Web of Conferences, 484. https://doi.org/10.1051/e3sconf/202448402008
Garg, A., Popli, R., & Sarao, B. S. (2021). Growth of digitization and its impact on big data analytics. IOP Conference Series: Materials Science and Engineering, 1022(1). https://doi.org/10.1088/1757-899X/1022/1/012083
Griva, A., Bardaki, C., Pramatari, K., & Papakiriakopoulos, D. (2018). Retail business analytics: Customer visit segmentation using market basket data. Expert Systems with Applications, 100, 1–16. https://doi.org/10.1016/j.eswa.2018.01.029
Hughes, A. M. (1994). Strategic Database Marketing. Probus Publishing.
Idowu, S., & Kattukottai, S. (2019). Customer segmentation based on RFM model using k-means, hierarchical and fuzzy c-means clustering algorithms. ResearchGate. https://doi.org/10.13140/RG.2.2.15379.71201
Kabaskal, İ. (2020). Customer segmentation based on recency frequency monetary model: A case study in e-retailing. Bilişim Teknolojileri Dergisi, 13(1), 47–56. https://doi.org/10.17671/gazibtd.570866
Kadir, M., & Achyar, A. (2019). Customer segmentation on online retail using RFM analysis: Big data case of Bukku.id. European Union Research Library. https://doi.org/10.4108/eai.1-4-2019.2287279
Kassambara, A. (2017). Practical guide to cluster analysis in R: Unsupervised machine learning. https://xsliulab.github.io/Workshop/2021/week10/r-cluster-book.pdf
Kaur, M., & Aggarwal, K. (2022). Big data market trends and its impact on e-commerce industry. ResearchGate. https://www.researchgate.net/publication/359024503_BIG_DATA_MARKET_TRENDS_AND_ITS_IMPACT_ON_E-COMMERCE_INDUSTRY
Kurniawan, F., Umayah, B., Hammad, J., Mardi, S., Nugroho, S., & Hariadi, M. (2018). Market basket analysis to identify customer behaviors by way of transaction data. Knowledge Engineering and Data Science, 1(1), 20–25. https://doi.org/10.17977/um018v1i12018p20-25
Li, X. (2021). Business analytics in e-commerce: A literature review. Journal of Industrial Integration and Management, 6(1), 31–52. https://doi.org/10.1142/S2424862220500207
Mallam, P., Ashu, & Singh, B. (2021). Business intelligence techniques using data analytics: An overview. Proceedings – 2021 International Conference on Computing Sciences, ICCS 2021, 265–267. https://doi.org/10.1109/ICCS54944.2021.00059
Maraghi, M., Amin Adibi, M., & Mehdizadeh, E. (2020). Using RFM model and market basket analysis for segmenting customers and assigning marketing strategies to resulted segments. Journal of Applied Intelligent Systems & Information Sciences. https://doi.org/10.22034/jaisis.2020.102488
Mehmeti, G., & Luga, E. (2021). The influence of situational factors on consumer purchasing behavior – the case of Covid-19. Albanian Journal of Agricultural Sciences, 20(2).
Nair, A. (2023, November 9). RFM analysis for successful customer segmentation. Putler. https://www.putler.com/rfm-analysis/
Raguseo, E., & Vitari, C. (2018). Investments in big data analytics and firm performance: An empirical investigation of direct and mediating effects. International Journal of Production Research, 56(15), 5206–5221. https://doi.org/10.1080/00207543.2018.1427900
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7
Sarkar, M., Aisharyja, Puja, R., & Chowdhury, F. R. (2024). Optimizing marketing strategies with RFM method and k-means clustering-based AI customer segmentation analysis. Journal of Business and Management Studies. https://doi.org/10.32996/jbms.2024.6.2.5
Serwah, A., Khaw, K. W., Cheang, S. P. Y., & Alhamzah, A. (2023). Customer analytics for online retailers using weighted k-means and RFM analysis. Data Analytics and Applied Mathematics, 1–7. https://doi.org/10.15282/daam.v4i1.9171
Sutresno, S. A., Iriani, A., & Sediyono, E. (2018). Metode K-Means Clustering dengan Atribut RFM untuk Mempertahankan Pelanggan [K-Means clustering method with RFM attributes for retaining customers]. JuTISI, 4(3), 433–440. https://journal.maranatha.edu/index.php/jutisi/article/view/1479
Xie, H., Zhang, L., Lim, C. P., Yu, Y., Liu, C., Liu, H., & Walters, J. (2019). Improving K-means clustering with enhanced Firefly algorithms. Applied Soft Computing Journal, 84. https://doi.org/10.1016/j.asoc.2019.105763
Yeo, I.-K., & Johnson, R. A. (2000). A new family of power transformations to improve normality or symmetry. Biometrika, 87(4), 954–959. https://doi.org/10.1093/biomet/87.4.954