These instructions assume you already have Jupyter Notebooks installed. If you don't, see the installation resources at https://jupyter.org/install.
To install the required packages, run the following in a notebook cell:
import sys
!{sys.executable} -m pip install fightchurn
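If you are wondering why the command takes that form, I followed these instructions. Calling pip through sys.executable installs the package into the same Python environment that the notebook kernel is running.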
You should see a bunch of outputs about collecting and installing packages. Yours might not look exactly like this, but something along these lines...
After several minutes (depending on your system and internet speed) you should see this at the end of the install...
(You might not see the warning about upgrading pip, but if you do, don't worry about it.)
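If you want to confirm that the install succeeded before moving on, a quick check along these lines may help. This is a minimal sketch that uses only the Python standard library. The helper name package_status is made up for illustration and is not part of fightchurn.

```python
import importlib.util
from importlib import metadata


def package_status(name):
    """Return the installed version of `name`, or None if it cannot be imported."""
    if importlib.util.find_spec(name) is None:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (for example, a stdlib module).
        return "unknown"


# Prints a version string if the package is importable from this kernel, else None.
print(package_status("fightchurn"))
```

If this prints None, the package probably went into a different Python environment than the one your notebook kernel uses.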
Next, make these imports to get ready to run the code...
from fightchurn import run_churn_listing
from fightchurn.run_churn_listing import run_listing
Now you need to set a few environment variables. These are:
- The database name: 'churn' in the example below
- The username for the database
- The password for the database
- A local folder where outputs can be written
run_churn_listing.set_churn_environment('churn','user','password','/path/to/my_churn_output_folder')
This will print a confirmation line like the following, with your own username and database name in place of the ones shown:
Setting Environment Variables user=carl for db=churn
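The output folder is where plots and other results end up, so it can be worth making sure it exists and is writable before you pass it in. The sketch below is one way to do that with the standard library. The helper prepare_output_folder is hypothetical and is not part of fightchurn. I am not asserting anything about whether the package creates the folder for you.

```python
import os
from pathlib import Path


def prepare_output_folder(path):
    """Create the folder if needed and return its absolute path as a string."""
    folder = Path(path).expanduser().resolve()
    folder.mkdir(parents=True, exist_ok=True)
    if not os.access(folder, os.W_OK):
        raise PermissionError(f"Cannot write to {folder}")
    return str(folder)


# Example use (commented out so this cell has no side effects):
# output_dir = prepare_output_folder('/path/to/my_churn_output_folder')
# run_churn_listing.set_churn_environment('churn', 'user', 'password', output_dir)
```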
Next, you need to write some data to the database to run the code against. No data is provided with the code distribution, so you generate it by running a simulation.
New in 2023: There is now a full report describing the inner workings of the churn simulation:
- ChurnSim: A Customer Churn Behavioral Simulation System For Education and Analysis
- It is not necessary to read the report to learn the churn fighting techniques from the book - this is intended for advanced data scientists who want to create their own simulations.
Use the function run_churn_listing.run_standard_simulation to run the basic simulation described in the book.
- You can speed up the simulation by adding the parameter n_parallel=<X>, where <X> is an appropriate number of parallel workers for your machine.
run_churn_listing.run_standard_simulation(n_parallel=5)
You will see output as follows...
This will continue for a while - maybe 15-30 minutes if you are running with a single core.
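If you are unsure what value to use for n_parallel, one rule of thumb is to take the number of logical CPUs and leave one free for the database server and the notebook itself. The sketch below does that with the standard library. The helper suggest_workers and the choice to reserve one core are my own assumptions, not a recommendation from the package.

```python
import os


def suggest_workers(reserve=1):
    """Suggest a worker count: logical CPUs minus `reserve`, but never below 1."""
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cpus - reserve)


n_workers = suggest_workers()
print(f"Suggested n_parallel: {n_workers}")

# Then pass it to the simulation (commented out here because it takes a while):
# run_churn_listing.run_standard_simulation(n_parallel=n_workers)
```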
Looking for an extra challenge? Try running and analyzing the advanced simulation described in the ChurnSim White Paper! To try the new simulation, follow the setup instructions and add the parameter 'crm5' to the run_standard_simulation function call.
- You can speed up the simulation by adding the parameter n_parallel=<X>, where <X> is an appropriate number of parallel workers for your machine.
run_churn_listing.run_standard_simulation('crm5', n_parallel=5)
The CRM simulation will take an hour or so on a typical computer and produce around 30GB of data in your PostgreSQL database. The runtime depends on the degree of parallelism - for a single core it can take 4+ hours.
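Since the CRM simulation writes around 30 GB, you may want to check free disk space first. The sketch below uses shutil.disk_usage from the standard library. The helper names and the 1.5x safety margin are assumptions of mine. The path you check should be on the drive that holds your PostgreSQL data directory (running SHOW data_directory; in psql reports where that is), which is not necessarily the drive your notebook runs from.

```python
import shutil


def free_gb(path):
    """Return free space, in GB (10**9 bytes), on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_room(path, required_gb=30, margin=1.5):
    """True if free space at `path` is at least `required_gb` times `margin`."""
    return free_gb(path) >= required_gb * margin


# "." is only a placeholder; substitute a path on the drive PostgreSQL writes to.
print(f"Free space: {free_gb('.'):.1f} GB, enough for crm5: {has_room('.')}")
```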
See section 4.5 below for information on how to run the book listings against the simulated CRM dataset.
Now you are ready to run some code from the book! To do that, use the run_listing function that you previously imported. For example, the following runs chapter 2, listing 2:
run_listing(2,2)
You should see output like this:
Explaining what you are seeing there is beyond the scope of this README; that's what the book is about! But if you have gotten this far, then you have completed all the setup and you are ready to follow along with the book (or videos, however you are learning the code).
In some parts of the book you might want to run more than one listing at once. To do this, pass a list as the listing argument. For example, to run all four chapter 2 churn calculation listings, try:
run_churn_listing.run_listing(2,[1,2,3,4])
Later in the book, some of the listings have multiple versions with different arguments. The run_listing function also takes a version argument. For example, to run a query and plot the results of the events per day for the first event created by the simulation, try the following:
run_churn_listing.run_listing(chapter=3,listing=[9,10],version=1)
That command should save a plot like this to your output directory:
You can also run multiple versions at once:
run_churn_listing.run_listing(chapter=3,listing=[9,10],version=[2,3])
For more information about what the code listings do, see the book Fighting Churn With Data.
To run a listing from a model other than the social network simulation, add the schema argument to run_listing. For example, to run chapter 2, listing 2 on the CRM simulation described in the ChurnSim White Paper, run:
run_churn_listing.run_listing(2,2,schema='crm5')
Note: At the time of this update, not all listings for the CRM simulation are configured in the package. If you are interested in the advanced simulation you should run the IDE install, and review and edit the listing config described there. Or check back here for updates on running the CRM simulation from the package in future versions.