Traditional customer churn models take in data about the customers (such as contract details, demographics, and CRM data, among others) and output a prediction of each customer’s likelihood to churn. While this information is interesting, in a business setting we are not primarily interested in the likelihood per se, but in learning how to effectively minimize the impact of churn on our business. The “how” is not always straightforward. Intervening actions can be taken, but their outcome remains uncertain, as each customer responds differently.
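A company’s customers can be divided into four subgroups, based on their propensity to churn without intervention and with intervention (often referred to as “treatment” in academia). The four subgroups are (adapted from a 2021 paper by Devriendt et al.):
Sure things: customers who stay whether or not they receive an intervention.
Lost causes: customers who churn whether or not they receive an intervention.
Persuadables: customers who would churn without an intervention but stay when they receive one.
Do-not-disturbs: customers who would stay without an intervention but churn when they receive one.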
The real challenge, and the holy grail in churn prediction, is separating the Persuadables from the Lost causes in order to identify where intervention resources should be targeted. A traditional (naive) churn prediction model should be able to flag both the Lost causes and the Persuadables as customers at high churn risk, but it is unable to separate them further. Making a distinction between the two can be important, as ideally the intervention measures would be offered only to the true Persuadables.
Furthermore, offering intervention measures such as discounts to customers indiscriminately invites them to abuse the system. In the long run, the Sure things will learn to act like Persuadables and request the same benefits (such as discounts) that are meant to encourage the Persuadables to stay, resulting in a loss of otherwise certain revenue for the company.
Uplift models are compared against traditional churn prediction models in detail in a recent paper by Devriendt et al. (2021)¹. The authors define the difference between traditional (classic) churn prediction methods and an uplift model by noting:
“[…] classic methods only differentiate customers who are about to churn from non-churners, while uplift modeling differentiates customers whose targeting will benefit the company from other customers.”
In other words, an uplift model will classify potential churners into Persuadables and Lost causes.
To do this, an uplift model needs data. As much of it as possible. Among the data points, the model needs to know whether a customer has been treated with an intervention (for example, been offered a discount to continue the service) and the final outcome for that customer (i.e. churn / no churn). Given this and a sufficient number of observations, the model will learn to estimate the probability of a given customer churning if an intervention / treatment is not provided AND the corresponding probability if an intervention / treatment is provided. The difference between the two defines the uplift. The higher the uplift, the more an intervention / treatment lowers the customer’s churn probability, and the more effective it is estimated to be. Under this definition, the Persuadables would have a high uplift, the Do-not-disturbs would have a negative uplift, and the Sure things and Lost causes would have an uplift of zero.
Uplift = Probability of churn without intervention – Probability of churn with intervention
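The definition above can be illustrated with a minimal Python sketch. It is not how Kirnu or the cited paper implements uplift modelling; the function names, the 0.05 margin and the 0.5 high-risk threshold are assumptions chosen purely for illustration. The group-level estimate also assumes that the intervention was assigned at random, which rarely holds in practice and is the reason a fitted model is normally needed.

```python
from typing import Iterable, List, Tuple


def uplift(p_churn_untreated: float, p_churn_treated: float) -> float:
    """Uplift = P(churn without intervention) - P(churn with intervention)."""
    return p_churn_untreated - p_churn_treated


def segment(p_churn_untreated: float, p_churn_treated: float,
            margin: float = 0.05, high_risk: float = 0.5) -> str:
    """Map the two churn probabilities to one of the four subgroups.

    `margin` and `high_risk` are illustrative thresholds, not values
    taken from the article or the paper.
    """
    u = uplift(p_churn_untreated, p_churn_treated)
    if u > margin:
        return "Persuadable"        # intervention clearly lowers churn risk
    if u < -margin:
        return "Do-not-disturb"     # intervention clearly raises churn risk
    # Uplift is roughly zero: the intervention makes no real difference.
    if p_churn_untreated >= high_risk:
        return "Lost cause"         # churns either way
    return "Sure thing"             # stays either way


def average_uplift(records: Iterable[Tuple[bool, bool]]) -> float:
    """Average uplift from (treated, churned) observations.

    Compares the churn rate of untreated and treated customers. This is
    only a fair estimate if treatment was assigned at random (assumption).
    """
    rows: List[Tuple[bool, bool]] = list(records)
    treated = [churned for was_treated, churned in rows if was_treated]
    untreated = [churned for was_treated, churned in rows if not was_treated]
    if not treated or not untreated:
        raise ValueError("need both treated and untreated observations")
    rate_treated = sum(treated) / len(treated)
    rate_untreated = sum(untreated) / len(untreated)
    return uplift(rate_untreated, rate_treated)


# Made-up observations: 3 of 4 untreated customers churn, 1 of 4 treated do.
observations = [
    (False, True), (False, True), (False, True), (False, False),
    (True, True), (True, False), (True, False), (True, False),
]
print(average_uplift(observations))  # 0.75 - 0.25 = 0.5
print(segment(0.8, 0.3))             # Persuadable
```

A real uplift model estimates the two probabilities per customer from their features rather than per group, but the subtraction at the end is the same.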
Uplift modelling provides the theoretically optimal framework for churn modelling. Practice, however, can often be different. Two gaps between theory and practice arise: the amount of data required, and the psychology of choosing not to intervene.
Let's begin with the first one: the amount of data. Churn analytics requires a lot of data to begin with. Breaking the data into smaller subsets in order to accurately predict the impact of an individual action on a customer's churn probability increases the requirement manyfold, and it is multiplied again by the number of potential treatments a company has at its disposal. Consider a highly simplified example in which 10% of the data points carry an elevated churn risk, the company has 10 potential treatments available that are used equally often, and a treatment is administered 50% of the time². This means that 10% of the data will answer the question "did a high-risk customer churn or not?", but only one observation in 200 (10% x 50% / 10 treatments = 0.5%) will answer the question "did taking a specific action decrease the customer's churn risk?". Gathering a comprehensive data set with a sufficient number of observations will therefore take significantly longer.
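The arithmetic in this example can be reproduced with a few lines of Python. The function name and the target of 1,000 observations per treatment are hypothetical and only serve to show how quickly the total data requirement grows.

```python
def informative_share(high_risk_share: float, treatment_rate: float,
                      n_treatments: int) -> float:
    """Share of all observations that show one specific treatment applied
    to a high-risk customer, assuming treatments are used equally often."""
    return high_risk_share * treatment_rate / n_treatments


# Figures from the simplified example in the text.
share = informative_share(high_risk_share=0.10, treatment_rate=0.50,
                          n_treatments=10)
one_in = round(1 / share)

# Hypothetical target: 1,000 informative observations per treatment.
target_per_treatment = 1_000
total_needed = round(target_per_treatment / share)

print(f"share of informative observations: {share:.1%}")
print(f"that is one observation in {one_in}")
print(f"total observations needed for the target: {total_needed:,}")
```

Under these assumptions, reaching the illustrative target for a single treatment would take around 200,000 observations in total, compared with 10,000 observations to collect 1,000 high-risk cases for a naive churn model.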
The second challenge arises from psychology. One of the main points of uplift modelling is to distinguish the Lost causes from the Persuadables. Consider, from a customer success manager's point of view, a situation where the model has identified a customer as a Lost cause. In an age where treatments can be administered digitally at a very low marginal cost, doing nothing goes against intuition and takes real mental strength. Doing nothing would require absolute certainty, which brings us back to the first point: while absolute certainty is never possible, getting close to it requires a lot (a deep-learning kind of a lot) of data.
So while uplift modelling provides the theoretically correct framework for churn prevention analysis, "naive" churn prediction must not be overlooked, as it might be more practical in many cases. I would recommend that companies getting started with churn prediction begin with "naive" models and progressively move towards uplift models as they accumulate experience and data.
Kirnu is initialized with traditional churn prediction. However, we provide our users a choice between "naive" churn prediction and uplift models. Making the jump from the former to the latter is as easy as toggling a switch.
If you want to shift to an uplift model, please keep in mind that you need to supply the required treatment data. This can be easily done through our API, which lets you provide additional data to Kirnu.
If you are unsure which would suit you better, get in touch with us at email@example.com, and we will discuss your individual case in more detail.
1 Note that the title of their paper, “Why you should stop predicting customer churn and start using uplift models”, does not mean that customer churn should not be predicted at all, but that traditional churn prediction models should be equipped with the capability to answer which churners should be targeted, not only to estimate who is the next most likely churner.
2 We also assume that actions (treatments) are administered only to customers at an estimated high churn risk.
P.S. we expect to be beta test ready during Spring 2021 and are looking for test users. Drop us a line at firstname.lastname@example.org if you are interested in reducing churn in your SaaS application and getting exclusive access to Kirnu.