Churn prediction is difficult. Before you can do anything to prevent customers leaving, you need to know everything from who’s going to leave and when, to how much it will impact your bottom line. In this post I’m going to explain some techniques for churn prediction and prevention using survival analysis.

Will they, won’t they

The way many data analysts try to model this problem is by thinking in black-and-white terms: churn vs no-churn. It’s really easy to view the problem in this way as it’s a pattern we all know - supervised classification.

But doing so leaves out a lot of the nuance of the churn prediction problem - the risk, the timelines, the cost of a customer leaving.

Nevertheless, let’s start with a classification model and see where we end up.

Our dataset

The dataset we’ll be using is Kaggle’s Telco Customer Churn dataset. It contains a little over 7,000 customer records and includes features such as each customer’s monthly spend with the company, the length of time (in months) that they’ve been a customer, and whether or not they have various internet service add-ons.

Here are the first five rows:

customerID gender SeniorCitizen Partner Dependents tenure PhoneService MultipleLines InternetService OnlineSecurity OnlineBackup DeviceProtection TechSupport StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod MonthlyCharges TotalCharges Churn
7590-VHVEG Female 0 Yes No 1 No No phone service DSL No Yes No No No No Month-to-month Yes Electronic check 29.85 29.85 No
5575-GNVDE Male 0 No No 34 Yes No DSL Yes No Yes No No No One year No Mailed check 56.95 1889.5 No
3668-QPYBK Male 0 No No 2 Yes No DSL Yes Yes No No No No Month-to-month Yes Mailed check 53.85 108.15 Yes
7795-CFOCW Male 0 No No 45 No No phone service DSL Yes No Yes Yes No No One year No Bank transfer (automatic) 42.30 1840.75 No
9237-HQITU Female 0 No No 2 Yes No Fiber optic No No No No No No Month-to-month Yes Electronic check 70.70 151.65 Yes

The first thing you’ll notice is that a lot of the columns are categorical variables rendered as text values (‘Yes’, ‘No’, etc.). Let’s use pd.get_dummies to fix those:

import pandas as pd

# `data` is the raw Telco dataframe loaded from the Kaggle CSV.
# get_dummies one-hot encodes the text columns; numeric columns
# (SeniorCitizen, tenure) pass through unchanged.
dummies = pd.get_dummies(data[[
 'gender',
 'SeniorCitizen',
 'Partner',
 'Dependents',
 'tenure',
 'PhoneService',
 'MultipleLines',
 'InternetService',
 'OnlineSecurity',
 'OnlineBackup',
 'DeviceProtection',
 'TechSupport',
 'StreamingTV',
 'StreamingMovies',
 'Contract',
 'PaperlessBilling',
 'PaymentMethod',
 'Churn'
]])
# Re-attach the two numeric charge columns
data = dummies.join(data[['MonthlyCharges', 'TotalCharges']])

If you’re following along, you’ll also have to fix some missing values in the TotalCharges column; the blank entries have made pandas read the whole column as text instead of numeric:

# Blank strings stand in for missing values: replace them with '0', then convert
data['TotalCharges'] = data['TotalCharges'].replace(' ', '0')
data['TotalCharges'] = pd.to_numeric(data['TotalCharges'])
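
An equivalent one-liner (a sketch, assuming blank strings are the only unparseable values in the column) is to let pandas coerce anything it can’t parse to NaN and then fill:

data['TotalCharges'] = pd.to_numeric(data['TotalCharges'], errors='coerce').fillna(0)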

Now that we have our data in a usable format, let’s visualise it a little:

from matplotlib import pyplot as plt

# Colour each point by whether the customer churned
plt.scatter(
    data['tenure'],
    data['MonthlyCharges'],
    c=data['Churn_Yes'])
plt.xlabel('Customer Tenure (Months)')
plt.ylabel('Monthly Charges')
plt.show()

It’s hard to draw any conclusions from this visualisation which compares the customer’s tenure (the thing we’re trying to improve) with their monthly charges. Let’s move on to training a Logistic Regression model to see if we can use these dummy features to predict churn.

Logistic Regression

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

# The feature columns: everything except the churn dummies (Churn_Yes is the
# target, and keeping Churn_No as a feature would leak the answer)
x_select = [c for c in data.columns if not c.startswith('Churn')]

X_train, X_test, y_train, y_test = train_test_split(data[x_select], data['Churn_Yes'])

clf = LogisticRegression(solver='lbfgs', max_iter=1000)
clf.fit(X_train, y_train)

If we train a model without doing any rebalancing of the classes or weighting of the samples, we can achieve an accuracy of 79.9%. That’s not bad for a first attempt.
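
If you’re following along, here’s a minimal sketch of how to produce that figure and the confusion matrix shown below, using the metrics we imported above:

y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))    # overall accuracy
print(confusion_matrix(y_test, y_pred))  # rows: actual, columns: predicted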

Here’s the confusion matrix for the Logistic Regression model:

What you can see here is that while the overall accuracy is fairly high, the model only identifies 55% of the customers who will actually leave - much closer to random than we’d like.

What’s our goal?

This is the point in the process where most data scientists (myself included) would double down on trying to improve the accuracy of the model by engineering new features, trying different algorithms, balancing the dataset and so on. But it’s good to stop and ask yourself what your end goal is.

It would be great to know every single customer who will churn, but how much insight would that information really bring? How would you know what to focus on to keep a customer, or how much you could spend on keeping them before they turned into a loss?

Let’s try taking a different perspective.

Survival Analysis

What Logistic Regression is doing (under the hood) is assigning a probability to each observation that describes how likely it is to belong to the positive class.

In the case of churn vs no-churn, and any kind of classification, this can seem like a minor annoyance that we have to overcome by selecting a cutoff and rounding the output (<0.5 means 0, >=0.5 means 1). However, if you think about it, this probability is exactly what we want.
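
In scikit-learn that probability is available directly - a quick sketch using the classifier we trained above:

# Probability of the positive class (churn) for each test customer
churn_probs = clf.predict_proba(X_test)[:, 1]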

In any large enough group of customers there are going to be people who share the same attributes / features (that’s the pigeonhole principle at work). Some of those will churn and some of them won’t, and what you’d ideally like to know is the probability of churn for each group. This is what Logistic Regression gives you (before you cutoff and round), but using Logistic Regression in this scenario does have one problem - it’s not clear what time scale it’s predicting for. Enter Survival Analysis.

Survival Analysis is a set of methods used in the life sciences (mostly Epidemiology and Pharma research) to determine the probability of patient cohort survival over time. It’s a very large body of work with a great many intricate and statistically sophisticated tools, but we’ll only be using one of them - The Cox Proportional Hazards Model.

The Cox Proportional Hazards Model

The Cox PH model is a regression-based model that analyses the covariates (features) of a dataset with regard to how long a patient (or customer) survives. It is called the Proportional Hazards model because it describes how each feature proportionally scales the risk relative to the baseline survival rate for a cohort.

Thanks to the very good and well-documented lifelines package in Python, it is simple to get started using the Cox PH model.

Applying the CoxPH Model

In most cases, the first thing you’d have to do to your dataset to get it ready for the Cox regression model is create two new features:

  1. The ‘age’ of an observation (the difference in time between a patient starting a course of medication and the most recent observation of their status, or in our case the time between a customer joining a service and the most recent observation of whether or not they have churned)
  2. The ‘event’ flag (a binary flag representing if an event had occurred, such as death or churn)

Luckily for us, the telco dataset has both of these features already engineered in the tenure and Churn columns.

An important thing to note when it comes to doing any kind of matrix-based regression is that singular matrices will throw an error in Python (as they should). All this means is that when you create dummy variables, you must throw one of the columns away for each categorical feature - keeping every level makes the dummy columns sum to a constant, which is perfect collinearity. (Don’t worry, we’ll still be able to infer the missing category from the remaining variables.)
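
Here’s a minimal sketch of one way to produce the reduced dataset, assuming you reload the raw CSV (the filename is a placeholder) and fix TotalCharges as before. Note that pandas drops the alphabetically first level of each categorical, so the exact columns kept may differ slightly from the table below:

# drop_first=True keeps k-1 dummy columns for each k-level categorical,
# which avoids the singular-matrix problem
raw = pd.read_csv('telco_churn.csv')  # placeholder filename
raw['TotalCharges'] = pd.to_numeric(raw['TotalCharges'].replace(' ', '0'))
data = pd.get_dummies(raw.drop(columns=['customerID']), drop_first=True)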

Here are the first five rows of our dummied and reduced dataset:

SeniorCitizen tenure gender_Female Partner_Yes Dependents_Yes PhoneService_Yes MultipleLines_Yes InternetService_DSL InternetService_Fiber optic OnlineSecurity_Yes OnlineBackup_Yes DeviceProtection_Yes TechSupport_Yes StreamingTV_Yes StreamingMovies_Yes Contract_One year Contract_Two year PaperlessBilling_Yes PaymentMethod_Bank transfer (automatic) PaymentMethod_Credit card (automatic) PaymentMethod_Electronic check MonthlyCharges TotalCharges Churn_Yes
0 1 1 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 1 29.85 29.85 0
0 34 0 0 0 1 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 56.95 1889.50 0
0 2 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 53.85 108.15 1
0 45 0 0 0 0 0 1 0 1 0 1 1 0 0 1 0 0 1 0 0 42.30 1840.75 0
0 2 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 70.70 151.65 1

You can see that gender_Male has disappeared, as have Partner_No, Dependents_No and so on.

Now that we have our dataset in the right format, let’s fit the Cox model:

from lifelines import CoxPHFitter

# Here x_select must include tenure (the duration) and Churn_Yes (the event)
# as well as the features - we assume it's simply every column of the reduced
# dataframe
x_select = list(data.columns)
cph_train, cph_test = train_test_split(data[x_select], test_size=0.2)

cph = CoxPHFitter()
cph.fit(cph_train, duration_col='tenure', event_col='Churn_Yes')

There are a few unique things about the lifelines package which can trip up frequent users of Scikit-Learn. The first is that the column containing the event observations (churn) needs to be in the single dataset that’s passed to the fit call. As such, we can’t split our dataset into four pieces (train and test for X and y) as we would for Logistic Regression; we have to split it in two. For anyone who’s curious, this is very similar to the R syntax, where you specify the relevant columns within a single dataset and the algorithm takes care of removing them from the training data.

In the cph.fit call, you pass in three arguments: the dataset we’ve created using train_test_split, the ‘age’ or duration column (tenure in our case) and the ‘event’ column (Churn_Yes in our case).

The next unique thing about the lifelines package is the .print_summary method that can be used on models (another thing borrowed from R).

Here’s our model summary:

cph.print_summary()

>>>
n=5634, number of events=1483

                                           coef  exp(coef)  se(coef)        z      p  lower 0.95  upper 0.95     
SeniorCitizen                            0.0714     1.0740    0.0634   1.1253 0.2605     -0.0529      0.1957     
gender_Female                            0.0718     1.0745    0.0524   1.3709 0.1704     -0.0309      0.1745     
Partner_Yes                             -0.1787     0.8364    0.0618  -2.8895 0.0039     -0.2998     -0.0575   **
Dependents_Yes                          -0.1287     0.8792    0.0791  -1.6267 0.1038     -0.2837      0.0264     
PhoneService_Yes                         0.9206     2.5109    0.5338   1.7246 0.0846     -0.1257      1.9669    .
MultipleLines_Yes                        0.0901     1.0943    0.1407   0.6404 0.5219     -0.1857      0.3659     
InternetService_DSL                      2.6314    13.8933    0.6706   3.9242 0.0001      1.3172      3.9457  ***
InternetService_Fiber optic              3.7740    43.5561    1.3037   2.8948 0.0038      1.2188      6.3293   **
OnlineSecurity_Yes                      -0.2140     0.8073    0.1489  -1.4374 0.1506     -0.5059      0.0778     
OnlineBackup_Yes                        -0.0292     0.9712    0.1446  -0.2021 0.8399     -0.3126      0.2542     
DeviceProtection_Yes                     0.1206     1.1282    0.1422   0.8481 0.3964     -0.1581      0.3993     
TechSupport_Yes                         -0.0442     0.9567    0.1487  -0.2974 0.7662     -0.3356      0.2472     
StreamingTV_Yes                          0.3632     1.4380    0.2654   1.3686 0.1711     -0.1569      0.8834     
StreamingMovies_Yes                      0.3609     1.4346    0.2658   1.3579 0.1745     -0.1600      0.8819     
Contract_One year                       -1.2910     0.2750    0.1160 -11.1329 0.0000     -1.5183     -1.0637  ***
Contract_Two year                       -3.5343     0.0292    0.2253 -15.6903 0.0000     -3.9758     -3.0928  ***
PaperlessBilling_Yes                     0.1797     1.1968    0.0640   2.8076 0.0050      0.0542      0.3051   **
PaymentMethod_Bank transfer (automatic) -0.5287     0.5894    0.0974  -5.4302 0.0000     -0.7195     -0.3379  ***
PaymentMethod_Credit card (automatic)   -0.5593     0.5716    0.1001  -5.5898 0.0000     -0.7554     -0.3632  ***
PaymentMethod_Electronic check          -0.1584     0.8535    0.0777  -2.0378 0.0416     -0.3107     -0.0060    *
MonthlyCharges                           0.0069     1.0070    0.0258   0.2685 0.7883     -0.0437      0.0576     
TotalCharges                            -0.0017     0.9983    0.0000 -35.1085 0.0000     -0.0017     -0.0016  ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Concordance = 0.929
Likelihood ratio test = 4801.049 on 22 df, p=0.00000

There are a few important things to notice about this output.

  1. We can see the number of observations listed as n=5634 right at the top of the output; next to that we have our number of events (churned customers).
  2. We get the coefficients of our model. These are very important: a positive coefficient means that attribute makes a customer more likely to churn, while a negative one means customers with that feature are less likely to churn. The exp(coef) column is the hazard ratio - Contract_Two year’s exp(coef) of 0.0292, for example, means a two-year contract cuts the churn hazard by roughly 97%.
  3. We get significance codes for our features. A very nice addition!
  4. We get the concordance.
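
If you want to work with these numbers programmatically, the fitted model’s summary is also available as a pandas DataFrame - a quick sketch:

# Hazard ratios, sorted so the most protective features come first
print(cph.summary['exp(coef)'].sort_values())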

Concordance

Similar to how you compare Logistic Regression models using accuracy, you can compare different Cox PH models using concordance.

Simply put, concordance measures how well the model’s risk ranking matches what actually happened: for any pair of customers, the one the model scores as higher risk should be the one who churns sooner. The more pairs the model orders correctly, the higher the concordance - 0.5 is no better than random, 1.0 is perfect.

Our model has a concordance of 0.929 out of 1, so it’s a very good Cox model.
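
That figure comes from the training data; here’s a hedged sketch of checking it on the held-out set with lifelines’ concordance_index utility (we negate the partial hazard because the function expects higher scores to mean longer survival):

from lifelines.utils import concordance_index

c_index = concordance_index(
    cph_test['tenure'],
    -cph.predict_partial_hazard(cph_test),
    cph_test['Churn_Yes'])
print(c_index)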

Plotting the Cox model

Calling the basic .plot function on the model gives us this:

A handy visualisation of the significance and risk effects of the various features.

Something else we can do at this point is investigate how the features impact survival over time, like this:

cph.plot_covariate_groups('TotalCharges', groups=[0,4000])

.plot_covariate_groups is a lifelines method which takes a covariate name as its first argument and a list of values to compare as its second. So here we’re looking at the different survival curves for customers whose TotalCharges are near zero compared to those whose TotalCharges are closer to 4000. Which looks like this:

You can see that customers whose TotalCharges are closer to zero are at a much higher risk of churning (their survival curve drops off faster) than those whose TotalCharges are closer to 4000.

Churn Prediction

We have a good, working model - now what?

The point of this exercise was to get some usable information to help us make decisions about how to intervene to reduce and prevent churn. So let’s make some predictions.

Unfortunately, there’s not much we can do for the customers who have already left, so let’s select only the remaining customers from our dataset:

censored_subjects = data.loc[data['Churn_Yes'] == 0]

In Survival Analysis lingo, a censored observation is one which is yet to have an ‘event’, so we’re selecting all those customers who have yet to churn.

Now to predict their survival curves we use the handy method .predict_survival_function like so:

unconditioned_sf = cph.predict_survival_function(censored_subjects)

You can see that we’ve called these ‘unconditioned’ survival functions. That’s because some of these curves would predict churn before the customer’s current tenure, which we know didn’t happen. We have to condition the predictions on the fact that the customers were still with us when the data was collected - dividing each curve by its value at the customer’s current tenure turns S(t) into S(t | T > current tenure):

conditioned_sf = unconditioned_sf.apply(
    lambda c: (c / c.loc[data.loc[c.name, 'tenure']]).clip(upper=1))

Now we can investigate individual customers and see how the conditioning has affected their survival over the baseline rate:

subject = 12
unconditioned_sf[subject].plot(ls="--", color="#A60628", label="unconditioned")
conditioned_sf[subject].plot(color="#A60628", label="conditioned on $T>58$")
plt.legend()

As you can see, the fact that we know customer 12 is still a customer after 58 months means their survival curve drops more slowly than the baseline curve for similar customers without that condition.

That very handy predict_survival_function method has created a matrix containing a survival probability for each remaining customer at each point in time. What we need to do next is use that to select a single number as a prediction for how long each customer will last, which we can then use to determine intervention value.

Depending on the use case you could choose any percentile, but for our case we’ll use the median.

from lifelines.utils import median_survival_times, qth_survival_times

predictions_50 = median_survival_times(conditioned_sf)
# This is the same, but you can change the fraction to get other %tiles. 
# predictions_50 = qth_survival_times(.50, conditioned_sf)

What that gives us is a single row (in a pandas DataFrame) containing, for each customer, the month (tenure) at which they have a 50% likelihood of churning.

By transposing this row and joining it to our data DataFrame, we can investigate the predicted remaining value each customer holds for the business:

values = predictions_50.T.join(data[['MonthlyCharges','tenure']])
values['RemainingValue'] = values['MonthlyCharges'] * (values[0.5] - values['tenure'])

Here are the first five rows of this new DataFrame:

0.5 MonthlyCharges tenure RemainingValue
70.0 29.80 69 29.8
69.0 19.45 67 38.9
72.0 19.70 70 39.4
69.0 39.90 68 39.9
71.0 20.35 69 40.7

The column named 0.5 is the single row we received from our median_survival_times call. If you selected a different percentile, this column would be named differently.
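
For example, a more conservative cut-off using the qth_survival_times helper we imported above (a sketch - the resulting row would be labelled 0.75, i.e. the month at which each customer’s survival probability first drops to 75%):

predictions_75 = qth_survival_times(.75, conditioned_sf)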

What we see here is that by multiplying MonthlyCharges by the difference between the customer’s median cancellation date and their current tenure, we can get a sense for which customers would inflict the most damage to our top line.
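
To surface the customers with the most revenue at stake, we can simply sort (a quick sketch):

# The ten customers with the most predicted revenue left on the table
print(values.sort_values('RemainingValue', ascending=False).head(10))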

Churn Prevention

Okay, so now we know which customers are the highest risk for churn, but what can we do to keep them?

If we go back to our coefficient chart from earlier we can see that the most significant features which impact survival positively are:

  1. Having a 2 year contract
  2. Having a 1 year contract
  3. Paying by Credit Card
  4. Paying by Bank Transfer

Beyond these four the increases in survival become minimal and the results aren’t significant anyway. So let’s focus on these four.

To understand how much we can spend to keep customers, we need to compare their survival rates with those of similar customers who have each of these four features present:

upgrades = ['PaymentMethod_Credit card (automatic)',
     'PaymentMethod_Bank transfer (automatic)',
     'Contract_One year',
     'Contract_Two year']

results_dict = {}
for customer in values.index:
    actual = data.loc[[customer]]
    change = data.loc[[customer]]
    results_dict[customer] = [cph.predict_median(actual)]
    for upgrade in upgrades:
        # Flip the upgrade flag, predict the new median survival, then flip it back
        change[upgrade] = 1 if list(change[upgrade]) == [0] else 0
        results_dict[customer].append(cph.predict_median(change))
        change[upgrade] = 1 if list(change[upgrade]) == [0] else 0

results_df = pd.DataFrame(results_dict).T
results_df.columns = ['baseline'] + upgrades
actions = values.join(results_df).drop([0.5], axis=1)

What we’re doing here is looping through the customers, changing one feature at a time and storing the predicted median survival for a customer who has that change. What we’re left with at the end is this:

MonthlyCharges tenure RemainingValue baseline PaymentMethod_Credit card (automatic) PaymentMethod_Bank transfer (automatic) Contract_One year Contract_Two year
29.85 1 597.00 21.0 25.0 25.0 32.0 50.0
56.95 34 854.25 47.0 52.0 52.0 37.0 71.0
42.30 45 634.50 59.0 64.0 54.0 49.0 72.0
89.10 22 801.90 26.0 22.0 31.0 37.0 54.0
29.75 10 476.00 25.0 30.0 30.0 35.0 53.0

We can see from this that if we managed to get the first customer to pay by Credit Card, we could increase their predicted survival time by 4 months (25 vs the 21-month baseline), and so on.

This is a great result which really helps us to see how we can move the needle on keeping customers, but let’s go a step further and see what impact that has financially:

actions['CreditCard Diff'] = (
    actions['PaymentMethod_Credit card (automatic)'] - actions['baseline']) * actions['MonthlyCharges']

actions['BankTransfer Diff'] = (
    actions['PaymentMethod_Bank transfer (automatic)'] - actions['baseline']) * actions['MonthlyCharges']

actions['1yrContract Diff'] = (
    actions['Contract_One year'] - actions['baseline']) * actions['MonthlyCharges']

actions['2yrContract Diff'] = (
    actions['Contract_Two year'] - actions['baseline']) * actions['MonthlyCharges']

Which gives us this:

MonthlyCharges tenure RemainingValue baseline PaymentMethod_Credit card (automatic) PaymentMethod_Bank transfer (automatic) Contract_One year Contract_Two year CreditCard Diff BankTransfer Diff 1yrContract Diff 2yrContract Diff
29.85 1 597.00 21.0 25.0 25.0 32.0 50.0 119.40 119.40 328.35 865.65
56.95 34 854.25 47.0 52.0 52.0 37.0 71.0 284.75 284.75 -569.50 1366.80
42.30 45 634.50 59.0 64.0 54.0 49.0 72.0 211.50 -211.50 -423.00 549.90
89.10 22 801.90 26.0 22.0 31.0 37.0 54.0 -356.40 445.50 980.10 2494.80
29.75 10 476.00 25.0 30.0 30.0 35.0 53.0 148.75 148.75 297.50 833.00

Now we can see that moving the customer in the first row to paying by Credit Card could be worth up to £119.40. That’s much more helpful than a simple count of months.

Accuracy and Calibration

Okay, we’re nearly done. We have monetary values that we can use to judge whether a particular churn intervention is feasible as well as solid predictions regarding when a customer will churn. But how accurate is all of this?

We know our Cox model is a good one (92.9% concordance), but what does that mean in real terms? How accurate is it?

When you take a probabilistic view of events like churn (or fraud, or theft) it’s important to check for calibration even more than accuracy. Calibration is the tendency of a model’s predicted probabilities to match the frequencies actually observed.

Think of it like this: a weather forecasting service is well calibrated if, out of all the times it said there was a 40% chance of rain, it actually rained 40% of the time.

In Scikit-Learn, we can use the calibration_curve function to derive these values from the probabilistic predictions and the true (binary) outcomes in our dataset:

import numpy as np
from sklearn.calibration import calibration_curve

plt.figure(figsize=(10, 10))
ax1 = plt.subplot2grid((3, 1), (0, 0), rowspan=2)

ax1.plot([0, 1], [0, 1], "k:", label="Perfectly calibrated")

# Churn probability by month 13 = 1 - survival probability at t=13
probs = 1 - np.array(cph.predict_survival_function(cph_test).loc[13])
actual = cph_test['Churn_Yes']

fraction_of_positives, mean_predicted_value = \
    calibration_curve(actual, probs, n_bins=10)

ax1.plot(mean_predicted_value, fraction_of_positives, "s-", label="CoxPH")

ax1.set_ylabel("Fraction of positives")
ax1.set_ylim([-0.05, 1.05])
ax1.legend(loc="lower right")
ax1.set_title('Calibration plots (reliability curve)')

Which gives us this:

You read this chart by comparing the model’s predicted probabilities against the fraction of positives actually observed in the dataset (in our case the test set).

You can see that it’s pretty close to the diagonal line, which represents perfect calibration. However, our model seems to underpredict risk at the low end (< 50% chance of churn) and slightly overpredict risk at the high end (> 50% chance of churn).

To get a numeric understanding of how far the line is from perfect calibration, we can use brier_score_loss from Scikit-Learn:

from sklearn.metrics import brier_score_loss

brier_score_loss(
    cph_test['Churn_Yes'],
    1 - np.array(cph.predict_survival_function(cph_test).loc[13]),
    pos_label=1)

Those of you with a keen eye may have noticed that I keep indexing at tenure = 13. Because our model works over a range of time periods, we have to inspect the calibration at each step in order to get a feel for accuracy. Let’s do that all in one go:

loss_dict = {}
for i in range(1, 73):
    # Brier score for the churn-by-month-i predictions
    score = brier_score_loss(
        cph_test['Churn_Yes'],
        1 - np.array(cph.predict_survival_function(cph_test).loc[i]),
        pos_label=1)
    loss_dict[i] = [score]

loss_df = pd.DataFrame(loss_dict).T

fig, ax = plt.subplots()
ax.plot(loss_df.index, loss_df)

ax.set(xlabel='Prediction Time', ylabel='Calibration Loss',
       title='Cox PH Model Calibration Loss / Time')
ax.grid()
plt.show()

Which gives us this:

So we can see that our model is pretty well calibrated between 5 and 25 months and then proceeds to get less and less so as we start to predict further out.

The only remaining thing to do to make our analysis more realistic is to account for this poor calibration.

Let’s create upper and lower bounds for the expected return on investment from getting customers to make changes:

loss_df.columns = ['loss']

# For each upgrade, temporarily index on the predicted survival time under
# that upgrade so we can join on loss_df (which is indexed by time) and look
# up the calibration loss at that point, then restore the original index
temp_df = actions.reset_index().set_index('PaymentMethod_Credit card (automatic)').join(loss_df)
temp_df = temp_df.set_index('index')
actions['CreditCard Lower'] = temp_df['CreditCard Diff'] - (temp_df['loss'] * temp_df['CreditCard Diff'])
actions['CreditCard Upper'] = temp_df['CreditCard Diff'] + (temp_df['loss'] * temp_df['CreditCard Diff'])

temp_df = actions.reset_index().set_index('PaymentMethod_Bank transfer (automatic)').join(loss_df)
temp_df = temp_df.set_index('index')
actions['BankTransfer Lower'] = temp_df['BankTransfer Diff'] - (.5 * temp_df['loss'] * temp_df['BankTransfer Diff'])
actions['BankTransfer Upper'] = temp_df['BankTransfer Diff'] + (.5 * temp_df['loss'] * temp_df['BankTransfer Diff'])

temp_df = actions.reset_index().set_index('Contract_One year').join(loss_df)
temp_df = temp_df.set_index('index')
actions['1yrContract Lower'] = temp_df['1yrContract Diff'] - (.5 * temp_df['loss'] * temp_df['1yrContract Diff'])
actions['1yrContract Upper'] = temp_df['1yrContract Diff'] + (.5 * temp_df['loss'] * temp_df['1yrContract Diff'])

temp_df = actions.reset_index().set_index('Contract_Two year').join(loss_df)
temp_df = temp_df.set_index('index')
actions['2yrContract Lower'] = temp_df['2yrContract Diff'] - (.5 * temp_df['loss'] * temp_df['2yrContract Diff'])
actions['2yrContract Upper'] = temp_df['2yrContract Diff'] + (.5 * temp_df['loss'] * temp_df['2yrContract Diff'])

Here I’m discounting the value we came up with before to account for the uncertainty around calibration. For each upgrade, we look up how well calibrated the model is at the survival time that upgrade is predicted to produce, and use that loss to create a lower and upper bound around the estimated return on investment.

That gives us something like this:

CreditCard Lower CreditCard Upper 1yrContract Lower 1yrContract Upper
96.252669 142.547331 292.686671 364.013329
186.674292 382.825708 -499.811962 -639.188038
124.151290 298.848710 -353.556490 -492.443510
-290.361140 -422.438860 860.168050 1100.031950
117.688750 179.811250 262.722163 332.277837
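
As a final sketch (the decision rule here is our own illustration, not a prescription), one way to turn these bounds into a per-customer recommendation is to pick the upgrade whose lower bound on return is highest:

# Recommend the intervention with the best worst-case return
lower_cols = ['CreditCard Lower', 'BankTransfer Lower',
              '1yrContract Lower', '2yrContract Lower']
actions['BestAction'] = actions[lower_cols].idxmax(axis=1)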

Summary

With that we’ve ended our whirlwind tour of Survival Analysis, Concordance and Calibration. What we’ve got for our efforts is an actionable set of data that we can use to keep customers signed up for longer - which is the point of churn prediction!

Thank you for reading!

If you have any questions, please email me.