Customer Data & Analytics Blog

The Mechanics of Predicting Customer Churn

Andrew Malinow, PhD and Mimoza Marko | 2 minute read


When The Business Follows A Subscription Model

Customer churn is a typical dynamic in any business: for one reason or another, a customer who previously purchased from a company no longer does. However, to surface potential causes of churn that can inform mitigation activities, we need a more operational definition.

Churn can be defined in several different ways. If a business uses a subscription model (e.g., Netflix, Amazon Prime membership), churn can be defined as those customers who have cancelled their subscription. A subscription cancellation typically exists as an explicit field in a database (cancelled_subscription=True), or it may need to be derived in some way. In either case, there is a specific event, either explicitly captured or easily derived from existing data points, that provides the definition of churn.
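To make the "derived" case concrete, here is a minimal sketch of how a churn label might be computed when the database stores a cancellation date rather than an explicit flag. The `Subscription` record, its field names, and the sample customers are all hypothetical, invented for illustration; a real schema will differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Subscription:
    """Hypothetical subscription record; field names are assumptions."""
    customer_id: str
    start_date: date
    # None means the subscription has not been cancelled.
    cancel_date: Optional[date] = None


def is_churned(sub: Subscription, as_of: date) -> bool:
    """Return True if the subscription was cancelled on or before `as_of`."""
    return sub.cancel_date is not None and sub.cancel_date <= as_of


# Made-up example customers, not real data.
subs = [
    Subscription("c1", date(2023, 1, 5)),
    Subscription("c2", date(2023, 2, 1), cancel_date=date(2023, 6, 30)),
    Subscription("c3", date(2023, 3, 10), cancel_date=date(2024, 1, 15)),
]

# Label every customer as of a fixed snapshot date.
labels = {s.customer_id: is_churned(s, as_of=date(2023, 12, 31)) for s in subs}
```

Labeling against a fixed snapshot date keeps the definition reproducible: customer c3 cancels after the snapshot, so as of that date they still count as retained.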

However, if a business does not use a subscription model, the definition of churn must be derived based on a change in the customer’s transactional behavior (purchases) over a certain amount of time.
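One simple way to express such a behavioral definition is an inactivity window: a customer who has purchased before, but not within the last N days, is treated as churned. The sketch below assumes this rule; the 90-day default, the function name, and the sample purchase dates are illustrative assumptions, not a recommendation, and other definitions (for example, a drop in purchase frequency) are equally possible.

```python
from datetime import date, timedelta


def churned_by_inactivity(purchase_dates, as_of, window_days=90):
    """True if the customer bought before `as_of` but not in the last `window_days`.

    The 90-day default is an arbitrary placeholder for illustration.
    """
    past = [d for d in purchase_dates if d <= as_of]
    if not past:
        # No purchases on record: not a churned customer under this definition.
        return False
    return (as_of - max(past)) > timedelta(days=window_days)


# Made-up purchase histories, not real data.
history = {
    "a": [date(2023, 1, 10), date(2023, 2, 15)],
    "b": [date(2023, 11, 20)],
}
snapshot = date(2023, 12, 31)
flags = {cid: churned_by_inactivity(dates, snapshot) for cid, dates in history.items()}
```

Here customer "a" last purchased in February and is flagged as churned, while customer "b" purchased within the window and is not.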

There are various methods that can be used to predict churn. When a subscription model is in use, we have a specific indicator available for our analysis: we can ‘label’ those who have cancelled their subscription, which makes the analysis relatively straightforward. The first step is to take a sample of internal customer data and split it into two groups: those who have churned, and those who have not. A number of Machine Learning models (e.g., Logistic Regression, Random Forest, or Naïve Bayes) can then be ‘trained’ to learn which ‘features’ are most predictive of churn. A feature represents a piece of information that we know about a customer. Examples of features include age, gender, geographic location, and marital status.

A hypothetical analysis might indicate, for example, that for Netflix, geographic location is highly predictive of churn, and that zip codes along the Gulf Coast are most highly correlated with it. A possible explanation could be that weather-related issues negatively impacted streaming services in those areas, causing many people to cancel their subscriptions.
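The training step can be sketched with a tiny Logistic Regression written in plain Python. This is a teaching sketch under stated assumptions, not a production approach: in practice a library implementation would normally be used. The two binary features (`in_gulf_region`, `is_married`) and the 20 hand-made rows are invented toy data, constructed so that region is related to churn and marital status is not.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of churn for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy, hand-made data (not real customers).
# Features: [in_gulf_region, is_married]; label: 1 = churned, 0 = retained.
rows = (
    [([1, 1], 1)] * 4 + [([1, 1], 0)] * 1
    + [([1, 0], 1)] * 4 + [([1, 0], 0)] * 1
    + [([0, 1], 1)] * 1 + [([0, 1], 0)] * 4
    + [([0, 0], 1)] * 1 + [([0, 0], 0)] * 4
)
X = [features for features, _ in rows]
y = [label for _, label in rows]

w, b = train_logistic_regression(X, y)
p_gulf = predict_proba(w, b, [1, 0])   # customer in the Gulf region
p_other = predict_proba(w, b, [0, 0])  # customer elsewhere
```

On this toy data the learned weight for `in_gulf_region` is large and positive while the weight for `is_married` is close to zero, which is the sense in which a trained model can indicate which features are most predictive of churn. With real data, features would need to be scaled and the model evaluated on held-out customers before drawing any conclusions.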


This is part 1 of 3 in the Mechanics of Predicting Customer Churn series. Stay tuned for an examination of how to predict churn based solely on customers’ purchasing behavior, coming next week.

Andrew Malinow, PhD, leads the Data Science team at Zylotech, where he leverages his background as a Cognitive Psychologist, statistical expertise and passion for surfacing actionable insights from large, messy data sets. At home he loves to spend time with his wife and 4 kids, doing anything outdoors, and tending to his ever-growing flock of chickens on his farm in Pomfret, CT.

Mimoza Marko is a Data Scientist at Zylotech, where she brings an extensive background in mathematics, statistics and computer science. She is passionate about exploring the mysteries in data. While quality time with family and friends is her favorite part of the day, Mimoza loves painting, hiking, reading and learning about the universe.
