Churn Prediction Project | Alejandro Pinto

// 01

Process to Follow

A step-by-step pipeline to build a logistic regression model for churn prediction:

Load the data and the packages needed for preprocessing and model building.
Preprocess the data and impute missing values.
Select features based on the EDA conclusions and build a baseline model.
Choose an evaluation metric that fits the business problem.
Build a model using all features and compare it with the baseline.
Use Recursive Feature Elimination (RFE) to find the top features and build a model using the top 10 features.
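The pipeline above can be sketched with scikit-learn as follows. This is a minimal sketch, not the notebook's code. It runs on a synthetic DataFrame that only borrows column names from the data dictionary. The three baseline features and the label rule are hypothetical. Recall on the churn class is an assumed metric choice, on the reasoning that missing a churner is usually costlier than a false alarm. RFE here is scikit-learn's `RFE` class, which refits the model and drops the feature with the smallest absolute coefficient until 10 remain.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset: values are random (hypothetical),
# only the column names follow the data dictionary.
rng = np.random.default_rng(0)
n = 2000
feature_cols = [
    "vintage", "age", "dependents", "customer_nw_category",
    "days_since_last_transaction", "current_balance",
    "previous_month_end_balance", "average_monthly_balance_prevQ",
    "average_monthly_balance_prevQ2", "current_month_credit",
    "previous_month_credit", "current_month_debit",
    "previous_month_debit", "current_month_balance", "previous_month_balance",
]
X = pd.DataFrame(rng.normal(size=(n, len(feature_cols))), columns=feature_cols)

# Hypothetical label rule so the example has some signal to learn.
signal = -1.0 * X["current_month_balance"] + 0.8 * X["days_since_last_transaction"]
y = (signal + rng.normal(scale=1.0, size=n) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def fit_and_score(cols):
    """Fit a scaled logistic regression on `cols` and return test-set recall."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train[cols], y_train)
    # Assumed metric: recall on the churn (positive) class.
    return recall_score(y_test, model.predict(X_test[cols]))


# Baseline: a hypothetical handful of features an EDA might flag.
baseline_cols = ["current_month_balance", "previous_month_balance", "age"]
baseline_recall = fit_and_score(baseline_cols)

# Model using all features.
full_recall = fit_and_score(feature_cols)

# RFE: repeatedly drop the feature with the smallest |coefficient| until 10 remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
selector.fit(StandardScaler().fit_transform(X_train), y_train)
top10 = [c for c, keep in zip(feature_cols, selector.support_) if keep]
rfe_recall = fit_and_score(top10)

print(f"baseline={baseline_recall:.3f} all={full_recall:.3f} rfe_top10={rfe_recall:.3f}")
print("RFE top 10:", top10)
```

Scaling before RFE matters in this sketch because logistic regression coefficients are only comparable across features when the features share a scale.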


// 02

Data Dictionary

The dataset's variables fall into three categories, plus the churn target:

Demographic Information

customer_id: Customer identifier
vintage: Number of days the customer has been with the bank
age: Age of the customer
gender: Gender of the customer
dependents: Number of dependents
occupation: Occupation of the customer
city: City of the customer (anonymised)
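Before modelling, the nominal demographic columns need a numeric encoding. A minimal sketch with pandas, assuming hypothetical rows and category labels (only the column names come from the dictionary above) and mode imputation as the missing-value strategy. `city` is left out because an anonymised code, possibly with many levels, would need its own treatment.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the data dictionary.
demo = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [34, 52, 23, 61],
    "gender": ["Male", "Female", None, "Female"],
    "dependents": [0, 2, None, 1],
    "occupation": ["salaried", "self_employed", "student", "retired"],
})

# customer_id is an identifier, not a predictor, so it is dropped.
features = demo.drop(columns=["customer_id"])

# Assumed strategy: fill missing values with the column mode.
for col in ["gender", "dependents"]:
    features[col] = features[col].fillna(features[col].mode()[0])

# One-hot encode the nominal columns; drop_first removes one redundant dummy
# per column, which avoids perfect collinearity in a logistic regression.
features = pd.get_dummies(features, columns=["gender", "occupation"], drop_first=True)
print(features.columns.tolist())
```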

Customer Bank Relationship

customer_nw_category: Net worth category (3 = Low, 2 = Medium, 1 = High)
branch_code: Branch code of the customer's account
days_since_last_transaction: Number of days since the customer's last credit, within the last year
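Two columns in this group need care during preprocessing. A minimal sketch with hypothetical rows. It assumes, without confirmation from the source, that a missing `days_since_last_transaction` means no credit in the last year, so the value is flagged and filled just beyond the one-year window. The net worth coding is reversed only for readability: for a linear model this flips the coefficient's sign but not the fit.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names come from the data dictionary.
rel = pd.DataFrame({
    "customer_nw_category": [3, 2, 1, 3],
    "branch_code": [101, 205, 101, 330],
    "days_since_last_transaction": [12.0, np.nan, 240.0, np.nan],
})

# Net worth is coded 3=Low, 2=Medium, 1=High; reverse it so larger means wealthier.
rel["net_worth_rank"] = 4 - rel["customer_nw_category"]

# Assumption: missing = no credit in the last year. Keep that fact as a flag,
# then fill with 366, one day past the 1-year window.
rel["no_credit_last_year"] = rel["days_since_last_transaction"].isna().astype(int)
rel["days_since_last_transaction"] = rel["days_since_last_transaction"].fillna(366)

# branch_code is a label, not a quantity, so treat it as categorical.
rel["branch_code"] = rel["branch_code"].astype("category")
print(rel)
```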

Transactional Information

current_balance: Balance as of today
previous_month_end_balance: Balance at the end of the previous month
average_monthly_balance_prevQ: Average monthly balance (AMB) in the previous quarter
average_monthly_balance_prevQ2: Average monthly balance (AMB) in the quarter before the previous one
current_month_credit: Total credit amount in the current month
previous_month_credit: Total credit amount in the previous month
current_month_debit: Total debit amount in the current month
previous_month_debit: Total debit amount in the previous month
current_month_balance: Average balance in the current month
previous_month_balance: Average balance in the previous month
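Monetary columns like these are often heavily right-skewed and can be negative, which can let a few extreme accounts dominate a logistic regression's coefficients. One common option, assumed here and not necessarily what the notebook does, is a signed log transform followed by standardisation. The rows and the helper `signed_log1p` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical rows; only the column names come from the data dictionary.
txn = pd.DataFrame({
    "current_balance": [1500.0, -200.0, 250000.0, 0.0],
    "current_month_credit": [0.5, 1200.0, 90000.0, 0.0],
    "current_month_debit": [300.0, 50.0, 87000.0, 0.0],
})


def signed_log1p(x):
    """log(1 + |x|) with the sign of x restored, so negative balances are handled."""
    return np.sign(x) * np.log1p(np.abs(x))


amount_cols = txn.columns.tolist()
logged = txn[amount_cols].apply(signed_log1p)

# Standardise each column to mean 0 and unit variance.
scaled = pd.DataFrame(StandardScaler().fit_transform(logged), columns=amount_cols)
print(scaled.round(2))
```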

churn (target): 1 if the customer's average balance falls below the minimum balance in the next quarter, else 0
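The target definition is a simple threshold rule. In this sketch `MIN_BALANCE` and the `avg_balance_next_quarter` column are hypothetical, since the bank's actual minimum balance is not given here. The churn rate printed at the end is the kind of class-balance check that informs the choice of evaluation metric.

```python
import pandas as pd

# Hypothetical threshold; the real minimum-balance requirement is not given.
MIN_BALANCE = 1000.0

next_q = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "avg_balance_next_quarter": [1500.0, 800.0, 1000.0],  # hypothetical column
})

# churn = 1 when the average balance falls strictly below the minimum, else 0.
next_q["churn"] = (next_q["avg_balance_next_quarter"] < MIN_BALANCE).astype(int)
print(next_q["churn"].tolist())  # [0, 1, 0]

# Share of churners: a low value signals class imbalance.
print(f"churn rate: {next_q['churn'].mean():.2f}")
```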
