Customer Data & Analytics Blog

The Mechanics of Predicting Customer Churn: Part 2

Andrew Malinow, PhD and Mimoza Marko | 2 minute read


Last week’s blog post focused on how to predict customer churn for businesses that have a subscription model, where the definition of “churn” is straightforward – a subscription canceled equals a customer churned.

This post provides an overview of how to predict customer churn for businesses that do not rely on subscriptions. The absence of an explicit churn ‘label’ in the data adds a level of complexity to the analysis: we need to develop a mechanism to define “churn” rather than inherit it directly from an existing data element. To this end, we will leverage information about customers’ transactional behaviors to arrive at a definition of churn that we can use for building our model.

The first and most important step in building a model that will accurately predict a customer’s propensity (likelihood) to churn is to assign each customer a label indicating whether the customer has churned or not, based on their historical transaction data. Since there is no specific field in the data that indicates whether a customer in the database is still a buyer, we need to focus on the customer’s purchasing behavior. The most recent six or twelve months of transactions (depending on how much historical data you have access to) should be left out of the initial analysis and model development process and used later to test, validate and refine the accuracy of the predictions produced by an initial model.
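As a rough illustration of setting aside the most recent transactions, here is a minimal Python sketch. The toy data, the function name `split_by_cutoff` and the 182-day (roughly six-month) window are assumptions made for the example, not part of any particular toolkit.

```python
from datetime import date, timedelta

# Hypothetical toy transactions: (customer_id, purchase_date).
transactions = [
    ("A", date(2023, 1, 10)),
    ("A", date(2023, 3, 1)),
    ("A", date(2023, 9, 15)),
    ("B", date(2023, 2, 5)),
    ("B", date(2023, 4, 20)),
]

def split_by_cutoff(transactions, holdout_days=182):
    """Return (history, holdout, cutoff).

    The cutoff sits `holdout_days` before the latest purchase in the data.
    Purchases on or before the cutoff are used for labeling and modeling;
    later purchases are held back for validation.
    """
    latest = max(purchase_date for _, purchase_date in transactions)
    cutoff = latest - timedelta(days=holdout_days)
    history = [t for t in transactions if t[1] <= cutoff]
    holdout = [t for t in transactions if t[1] > cutoff]
    return history, holdout, cutoff

history, holdout, cutoff = split_by_cutoff(transactions)
```

Splitting by date rather than at random keeps the validation data strictly in the "future" relative to everything the model sees during development.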

The remaining transaction data (purchases) will tell each customer’s “story” – specifically the frequency of purchases and the time interval between purchases. Analyzing this behavior mathematically gives us a definition of churn for that particular customer. Additionally, we can generate new features describing each customer’s buying behavior that might be helpful in predicting when they are likely to churn.
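One possible way to turn each customer’s purchase history into simple behavioral features is sketched below. The feature names (`n_purchases`, `mean_gap_days`, `last_purchase`) and the toy data are illustrative assumptions; a real project would likely derive many more features.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def purchase_features(history):
    """Summarize each customer's purchases from (customer_id, date) pairs."""
    by_customer = defaultdict(list)
    for customer_id, purchase_date in history:
        by_customer[customer_id].append(purchase_date)

    features = {}
    for customer_id, dates in by_customer.items():
        dates.sort()
        # Days between consecutive purchases; empty for one-time buyers.
        gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
        features[customer_id] = {
            "n_purchases": len(dates),
            "mean_gap_days": mean(gaps) if gaps else None,
            "last_purchase": dates[-1],
        }
    return features

# Hypothetical example data.
example_history = [
    ("A", date(2023, 1, 1)),
    ("A", date(2023, 1, 31)),
    ("A", date(2023, 3, 2)),
    ("B", date(2023, 2, 1)),
]
features = purchase_features(example_history)
```

Customers with a single purchase have no interval to measure, so the sketch records `None` for them and leaves the handling of that case to the labeling step.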

An example of such a feature might be the buying frequency (e.g. customer buys once every 45 days). At the end of this process, the specific event that determines the label will be a statement regarding the frequency of purchases (e.g. "a customer has churned if there are no purchases during the last 45 days"). It is important to shift focus from a single company-wide churn definition to a definition at the individual customer level. Doing so results in a higher probability of assigning the right label.
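To make the customer-level definition concrete, the sketch below labels a customer as churned when their silence exceeds their own typical purchase interval. The `tolerance` multiplier and the `fallback_gap_days` value for one-time buyers are assumptions introduced for the example, not rules from this post.

```python
from datetime import date

def has_churned(last_purchase, typical_gap_days, as_of,
                tolerance=1.0, fallback_gap_days=90):
    """Return True if the customer has been silent longer than their own norm.

    `tolerance` and `fallback_gap_days` are illustrative assumptions:
    the first loosens or tightens the rule, the second covers customers
    with no measurable interval (a single purchase).
    """
    gap = typical_gap_days if typical_gap_days is not None else fallback_gap_days
    days_silent = (as_of - last_purchase).days
    return days_silent > gap * tolerance

as_of = date(2023, 6, 30)  # end of the history window (hypothetical)

# Buys every 45 days, last purchase 30 days ago: still active.
frequent_recent = has_churned(date(2023, 5, 31), 45, as_of)
# Buys every 45 days, silent for 90 days: labeled as churned.
frequent_silent = has_churned(date(2023, 4, 1), 45, as_of)
# Buys every 120 days, silent for 90 days: still within their norm.
infrequent_silent = has_churned(date(2023, 4, 1), 120, as_of)
```

The last two calls show why the individual-level definition matters: the same 90 days of silence produces different labels for customers with different buying rhythms.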

Once we have divided the customers into churned and not churned, we can begin training Machine Learning Classification Models (like Logistic Regression, Random Forest, etc.) and follow the same process as for businesses that follow a subscription model. To evaluate the accuracy of the model, we use the transaction data that we set aside in the first step of our analysis, which allows us to see whether the customers we predicted to churn (or not churn) actually made any purchases during that period. Finally, the selected model will give us a measure of the importance of each feature included in it (for example, coefficients in Logistic Regression or importance scores in a Random Forest), which we can use to determine which pieces of information about a customer – either given or derived – are most influential in predicting churn.
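The check against the held-out transactions could look something like the following sketch. It assumes the model’s predictions are available as a dictionary of customer ID to predicted churn flag, and treats a customer as actually churned if they made no purchase in the held-out period; the function name and example data are hypothetical.

```python
def evaluate_against_holdout(predicted_churn, holdout_buyers):
    """Compare predicted churn flags with purchases in the held-out period.

    predicted_churn: dict of customer_id -> bool (True = predicted to churn)
    holdout_buyers:  set of customer_ids with at least one held-out purchase
    """
    tp = fp = fn = tn = 0
    for customer_id, predicted in predicted_churn.items():
        actually_churned = customer_id not in holdout_buyers
        if predicted and actually_churned:
            tp += 1
        elif predicted and not actually_churned:
            fp += 1
        elif not predicted and actually_churned:
            fn += 1
        else:
            tn += 1

    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }

# Hypothetical predictions and held-out buyers.
predicted = {"A": True, "B": True, "C": False, "D": False}
buyers_in_holdout = {"B", "C"}
scores = evaluate_against_holdout(predicted, buyers_in_holdout)
```

Reporting precision and recall alongside accuracy is a design choice worth considering here, since churned customers are often a minority and accuracy alone can look good while most churners are missed.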

The final post in this series will focus on how to ‘tune’ your churn definition after building a preliminary model.


This is part 2 of 3 in the Mechanics of Predicting Customer Churn series. Stay tuned for an explanation of how to tune your individual customer churn model once you have it up and running.

Andrew Malinow, PhD leads the Data Science team at Zylotech, where he leverages his background as a Cognitive Psychologist, statistical expertise and passion for surfacing actionable insights from large, messy data sets. At home he loves to spend time with his wife and 4 kids, doing anything outdoors, and tending to his ever-growing flock of chickens on his farm in Pomfret, CT.

Mimoza Marko is a Data Scientist at Zylotech, where she brings an extensive background in mathematics, statistics and computer science. She is passionate about exploring the mysteries in data. While quality time with family and friends is her favorite part of the day, Mimoza loves painting, hiking, reading and learning about the universe.


Topics: Customer Analytics