datacyclist/churn42

churn42

Predicting churn in the telco industry

I was asked to take a closer look at customer churn, and in particular to predict churn for the next quarter (Q3 2021) from the data of the first two quarters. In other words, the task is to identify which customers are most likely to churn in the next quarter. The company can then act proactively to keep those customers, for instance by offering them a better deal.

Attention: this is not production code. It shows how to approach the problem and how a predictive model might solve it, but these are only the first steps towards a bullet-proof model on this dataset.

How to read this: it is mostly a graphical drill-down, with a model at the end. The code is in the /script directory.

descriptive stats

target variable

Churn Rate

  • The churn rate in the data set is around 25%.

predictors

What do the other variables say?

demographics

Dependents

  • Customers without dependents outnumber those with dependents roughly 5 to 2.

Partner

  • Around half of the customers have partners.

SeniorCitizen

  • The share of seniors among the customers seems to be roughly similar to the general population.

gender

  • Customers' gender is quite balanced.

contract features

Contract

  • A lot of the clients have month-to-month (prepaid?) contracts.

DeviceProtection

  • I'm not sure why there is a "no internet service" value here. It could be merged with the "no" value, but I'll leave it for now.
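The merge mentioned above is not applied in this analysis, but it could look roughly like the sketch below. The exact label spelling ("No internet service", "No"), the list of affected columns, and the helper name `merge_no_internet` are assumptions for illustration. They are not taken from the code in /script.

```python
# Columns assumed to carry a "No internet service" level (hypothetical list).
SERVICE_COLUMNS = [
    "DeviceProtection", "OnlineBackup", "OnlineSecurity",
    "TechSupport", "StreamingTV", "StreamingMovies",
]

def merge_no_internet(row, columns=SERVICE_COLUMNS):
    """Return a copy of `row` with "No internet service" recoded to "No"."""
    merged = dict(row)  # leave the caller's row untouched
    for col in columns:
        if merged.get(col) == "No internet service":
            merged[col] = "No"
    return merged

# Made-up example row, not taken from the real data set.
example = {
    "InternetService": "No",
    "DeviceProtection": "No internet service",
    "OnlineBackup": "No internet service",
}
cleaned = merge_no_internet(example)
print(cleaned)
```

The information is not lost by merging, because the InternetService column still records that the customer has no internet service.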

InternetService

  • Almost half of the customers have fiber optic, and about 1,500 of 7,000 have no internet service. Is this an ancient dataset?

PhoneService

  • 90% of the customers have phone service.

OnlineBackup

  • Most customers have no online backup, or no internet service at all.

OnlineSecurity

  • Most customers have no online security, or no internet service at all.

StreamingTV

StreamingMovies

  • Streaming TV and Movies seem to be in the same package, which makes sense.

TechSupport

  • Half of the customers haven't needed tech support and a lot don't have internet service and therefore don't need tech support (?).

payment stuff

PaperlessBilling

  • The majority of customers have paperless billing.

PaymentMethod

  • Around one third of the customers pay by electronic check; the rest are spread evenly across the other options.

MonthlyCharges

  • About 1,400 of 7,000 customers pay up to 25 (currency units) a month; the rest are spread out fairly widely, up to around 120.

TotalCharges

  • Long-tail distribution. A large share of customers (presumably including those with short tenure) have low total charges; the rest stay longer and pay more over time.

duration of contract

tenure

  • A fun distribution, ranging from zero to 72 months.

Correlations with churn

heatmap

Get a quick heatmap: how are the variables related to each other?

correlation heatmap

  • The line with churn=YES shows at least some non-zero patterns. Phew.
  • The "no internet service" values in the different predictors are fully correlated. Makes sense.
  • Monthly/total charges seem to be correlated with the additional services like online security, backup etc. Makes sense, too.

correlations

strongest correlations with churn

This gives an idea of which individual predictors are correlated with churn. The results seem to make sense.

  • two-year contracts -> less churn
  • month-to-month contracts -> more churn
  • the longer the tenure, the less churn (this is kind of obvious and circular, but good to see)
  • having additional contract features goes along with less churn (the graphic shows this the other way around: lacking them goes along with more churn)
  • having a fiber optic service is highly correlated with churn
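One way to arrive at a ranking like the one above is to one-hot encode each categorical predictor and compute the Pearson correlation of every indicator column with a 0/1 churn flag. The sketch below does this for a single column. The rows are made up for illustration, and the helper names `pearson` and `dummy_correlations` are hypothetical. The actual analysis in /script may compute its correlations differently.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def dummy_correlations(rows, column, target):
    """Correlate the 0/1 indicator of each level of `column` with `target`."""
    levels = sorted({row[column] for row in rows})
    return {
        f"{column}={level}": pearson(
            [1 if row[column] == level else 0 for row in rows], target
        )
        for level in levels
    }

# Made-up toy rows, NOT taken from the real data set.
rows = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "One year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]
churn = [1 if row["Churn"] == "Yes" else 0 for row in rows]

correlations = dummy_correlations(rows, "Contract", churn)
for name, value in sorted(correlations.items(), key=lambda kv: kv[1]):
    print(f"{name:26s} {value:+.2f}")
```

In this toy data the month-to-month indicator correlates positively with churn and the two-year indicator negatively, which matches the direction of the findings listed above.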

descriptive with relation to churn

If we split the dataset into churn=yes and churn=no and compare the distribution of each variable between the two subgroups, we get an idea of which features could be changed, and in which direction, to reduce churn. Predictors that are not listed show no difference between the distributions.
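For a categorical predictor, the comparison between the two subgroups boils down to the churn rate per level. A minimal sketch, using made-up rows and a hypothetical helper `churn_rate_by_level` (the level names other than "Electronic check" are invented for the example):

```python
from collections import Counter

def churn_rate_by_level(rows, column, target="Churn", positive="Yes"):
    """Share of churned customers within each level of `column`."""
    totals, churned = Counter(), Counter()
    for row in rows:
        totals[row[column]] += 1
        if row[target] == positive:
            churned[row[column]] += 1
    return {level: churned[level] / totals[level] for level in totals}

# Made-up toy rows, NOT taken from the real data set.
rows = [
    {"PaymentMethod": "Electronic check", "Churn": "Yes"},
    {"PaymentMethod": "Electronic check", "Churn": "Yes"},
    {"PaymentMethod": "Electronic check", "Churn": "No"},
    {"PaymentMethod": "Mailed check", "Churn": "No"},
    {"PaymentMethod": "Mailed check", "Churn": "Yes"},
    {"PaymentMethod": "Credit card", "Churn": "No"},
    {"PaymentMethod": "Credit card", "Churn": "No"},
]

rates = churn_rate_by_level(rows, "PaymentMethod")
for level, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{level:18s} {rate:.0%}")
```

A level whose churn rate sits well above the overall rate is a candidate for one of the measures discussed further down.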

demographics

Dependents

  • People without dependents churn more often. Can we do something here? Give discounts to families?

Partner

  • Customers who have no partners churn more often.

SeniorCitizen

  • Seniors churn more frequently.

contract features

Contract

  • The churn rate is highest among the month-to-month contracts, as expected.
  • Try to upgrade those clients to fixed-duration contracts.

DeviceProtection

  • Customers without DeviceProtection churn more often. And people with internet service stay?

InternetService

  • Customers with Fiber Optic churn way more than those on DSL. Is there any regulatory background here? Monopoly on DSL and competition on fiber?

OnlineBackup

  • Customers without OnlineBackup churn more often. This is probably related to some internet combination offer.

OnlineSecurity

  • Customers without OnlineSecurity churn more often.
  • This is probably related to some internet combination offer that combines the online backup and online security.

StreamingMovies

StreamingTV

  • The main factor here seems to be the no internet service rather than those additional services.

TechSupport

  • Customers without Tech Support churn more often. But I don't know if they never needed tech support because the service is so good.

payment stuff

PaperlessBilling

  • Paperless billing is possibly related to higher churn.
  • But this probably doesn't mean that we should get the customers back towards paper bills.

PaymentMethod

  • Electronic Check is by far the worst payment method when it comes to churn. Try to get customers away from this payment method.

MonthlyCharges

  • Customers with expensive contracts churn more often than those with cheap contracts.

TotalCharges

  • Customers with small Total Charges churn more often.
  • But it could also be the other way around: because they churn they don't run up that many total charges.
  • And on the other end of the spectrum: the more total charges, the less likely to churn.

duration of contract

tenure

  • Short tenure = high churn. Causality can go both ways here, also on the right end of the graphic.

predictive model

model building

  • The data is neither big nor complex, so I'll just go with a GLM here.
  • The principle would be pretty much the same for other models.
    • Take all the predictors above (or maybe a subset, I could refine this further)
    • Train the model to predict churn on a training set.
    • Calculate model accuracy on the test set.
  • The baseline accuracy that I would have to beat is 73% (predicting churn=no all the time).
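The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the code from /script: it assumes the GLM is a binomial one (logistic regression), fits it with plain gradient descent, and runs on synthetic data with two invented predictors. All function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, xi):
    """Predict churn (1) when the fitted probability is at least 0.5."""
    p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
    return 1 if p >= 0.5 else 0

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (assumption): NOT the real telco data set.
random.seed(42)
X, y = [], []
for _ in range(400):
    month_to_month = 1.0 if random.random() < 0.55 else 0.0
    tenure = random.randint(0, 72) / 72  # months, scaled to [0, 1]
    p_churn = sigmoid(-1.5 + 2.0 * month_to_month - 2.5 * tenure)
    X.append([month_to_month, tenure])
    y.append(1 if random.random() < p_churn else 0)

# Train/test split (rows are already in random order).
split = int(0.75 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

w = fit_logistic(X_train, y_train)
model_acc = accuracy(y_test, [predict(w, xi) for xi in X_test])

# Baseline: always predict the majority class of the training set.
majority = 1 if 2 * sum(y_train) > len(y_train) else 0
baseline_acc = accuracy(y_test, [majority] * len(y_test))

print(f"model accuracy:    {model_acc:.3f}")
print(f"baseline accuracy: {baseline_acc:.3f}")
```

Comparing against the majority-class baseline matters because with an imbalanced target a model can look accurate without having learned anything about who churns.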

model summary

  • This fairly simple model yields around 80% accuracy, compared with the 73% baseline.
  • The most important predictors are listed. These and the trained model can be used to identify the customers who are likely to churn and to identify measures that could be taken.

recommendations

Some ideas purely from the data, which probably don't always make sense:

  • Try to get customers away from PaymentMethod=electronic check
  • Upsell them to OnlineSecurity=yes and OnlineBackup=yes
  • Get customers away from fiber optic (good from a retention point of view but certainly not from a technology point of view).
  • Get them away from month-to-month contracts (this is quite logical).
  • Get them to pay higher total charges (maybe by upselling them).

further thoughts

  • In a real context, the subset of customers for which I would predict churn should probably be only those who haven't churned yet.
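A rough sketch of that idea: drop customers who have already churned, score the remaining ones, and hand the highest-risk customers to the retention team. The `customerID` field, the helper `rank_at_risk`, and the scoring function `toy_score` are assumptions for illustration. In practice the score would be the predicted churn probability from the trained model.

```python
def rank_at_risk(customers, score, top_n=3):
    """Return the IDs of the `top_n` not-yet-churned customers with the highest score."""
    active = [c for c in customers if c["Churn"] == "No"]
    ranked = sorted(active, key=score, reverse=True)
    return [c["customerID"] for c in ranked[:top_n]]

def toy_score(customer):
    """Placeholder for the model's predicted churn probability (made-up weights)."""
    risk = 0.6 if customer["Contract"] == "Month-to-month" else 0.1
    if customer["PaymentMethod"] == "Electronic check":
        risk += 0.3
    return risk

# Made-up toy customers, NOT taken from the real data set.
customers = [
    {"customerID": "A", "Contract": "Month-to-month",
     "PaymentMethod": "Electronic check", "Churn": "No"},
    {"customerID": "B", "Contract": "Two year",
     "PaymentMethod": "Credit card", "Churn": "No"},
    {"customerID": "C", "Contract": "Month-to-month",
     "PaymentMethod": "Electronic check", "Churn": "Yes"},
    {"customerID": "D", "Contract": "Month-to-month",
     "PaymentMethod": "Mailed check", "Churn": "No"},
]

at_risk = rank_at_risk(customers, toy_score, top_n=2)
print(at_risk)
```

Customer C has the highest toy score but is excluded, because a retention offer only makes sense for customers who are still there.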
