Let's say we have a company that sells some products, and we want to know how those products are performing with customers. This article shows how to group customers into segments based on their behaviour using the K-Means algorithm in Python. I hope it helps you through each step of customer segmentation, from data preparation to data aggregation.
The first step is to collect the data. For this case, we will use the Online Retail dataset from the UCI Machine Learning Repository. The dataset contains transactions made between December 1, 2010 and December 9, 2011 at a UK-based online retailer.
After sampling the data, we prepare it for analysis. To segment customers, we can use three metrics: when the customer last made a purchase (recency), how often the customer made a purchase (frequency), and how much the customer paid (monetary value). We will call the result the RFM segmentation.
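The three RFM metrics above can be sketched with pandas as follows. This is a minimal illustration and not necessarily the exact steps used in the article: the transactions are made up, the column names (`CustomerID`, `InvoiceNo`, `InvoiceDate`, `Quantity`, `UnitPrice`) are assumed to match the Online Retail dataset, and the choice of a snapshot date one day after the last transaction is an assumption.

```python
import pandas as pd

# Made-up transactions; column names are assumed to follow the Online Retail dataset.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "C3"],
    "InvoiceDate": pd.to_datetime([
        "2011-11-01", "2011-12-01", "2011-06-15",
        "2011-12-05", "2011-12-08", "2011-12-09",
    ]),
    "Quantity": [2, 1, 3, 1, 2, 1],
    "UnitPrice": [5.0, 20.0, 4.0, 10.0, 10.0, 5.0],
})

# Amount paid per transaction line.
tx["TotalPrice"] = tx["Quantity"] * tx["UnitPrice"]

# Assumed reference date: one day after the most recent transaction.
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

# One row per customer with the three RFM metrics.
rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),  # days since last purchase
    Frequency=("InvoiceNo", "nunique"),                            # number of distinct invoices
    Monetary=("TotalPrice", "sum"),                                # total amount paid
)

print(rfm)
```

A low `Recency` together with high `Frequency` and `Monetary` values points to an active, valuable customer, which is the kind of pattern the clustering step is meant to pick up.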
Once the data is preprocessed, we can focus on modelling. To segment the data, we can use the K-Means algorithm. K-Means is an unsupervised learning algorithm that uses geometric distance to decide which cluster each data point belongs to. Starting from a set of centroids, we calculate the distance from each data point to every centroid, and each data point is assigned to the centroid it is closest to. The centroids are then recomputed as the mean of their assigned points, and the process repeats until the total distance no longer changes significantly from the previous iteration. Implementing K-Means in Python is easy: we can use scikit-learn's KMeans class.
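As a rough sketch of the modelling step, the snippet below fits scikit-learn's `KMeans` to a handful of made-up RFM rows. The data values, the choice of two clusters, and the standardisation with `StandardScaler` are assumptions made for illustration; the preprocessing and number of clusters used in the article may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up RFM rows: [recency in days, frequency, monetary value].
X = np.array([
    [5, 20, 900.0],
    [8, 18, 850.0],
    [3, 22, 1000.0],
    [200, 1, 20.0],
    [220, 2, 35.0],
    [180, 1, 15.0],
])

# Assumed preprocessing: put the three metrics on a comparable scale,
# so that no single metric dominates the distance calculation.
X_scaled = StandardScaler().fit_transform(X)

# n_clusters=2 is an illustrative choice; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = model.fit_predict(X_scaled)

print(labels)                  # cluster index assigned to each customer
print(model.cluster_centers_)  # centroids in the scaled feature space
print(model.inertia_)          # sum of squared distances to the nearest centroid
```

The `inertia_` attribute is the total squared distance that the algorithm minimises, so comparing it across several values of `n_clusters` is one common way to choose how many segments to keep.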
Author: Tellius