// 01

Process to Follow

A step-by-step pipeline to build a logistic regression model for churn prediction:

  • Load data and packages for model building & preprocessing.
  • Preprocessing & missing value imputation.
  • Select features on the basis of EDA conclusions & build baseline model.
  • Decide evaluation metric based on the business problem.
  • Build model using all features and compare with baseline.
  • Use Recursive Feature Elimination (RFE) to find the top features and build a model using the top 10 features.
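
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration, not the notebook's actual code: the data is synthetic (generated with `make_classification`), the baseline column choice is a placeholder for EDA-driven picks, and ROC AUC is an assumed evaluation metric, since the metric is meant to be chosen from the business problem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed churn table (assumption: 20 numeric
# features and an imbalanced 0/1 target; the real data would be loaded instead).
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.8, 0.2], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)


def fit_and_score(columns):
    """Fit a scaled logistic regression on `columns`; return test ROC AUC."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train[:, columns], y_train)
    proba = model.predict_proba(X_test[:, columns])[:, 1]
    return roc_auc_score(y_test, proba)


# Baseline: a few hand-picked columns (placeholder for EDA conclusions).
baseline_cols = [0, 1, 2]
all_cols = list(range(X.shape[1]))

# RFE refits the model repeatedly, dropping the weakest feature each round
# until 10 remain. Features are scaled first so coefficients are comparable.
scaler = StandardScaler().fit(X_train)
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10)
rfe.fit(scaler.transform(X_train), y_train)
top10_cols = np.flatnonzero(rfe.support_).tolist()

scores = {
    "baseline": fit_and_score(baseline_cols),
    "all_features": fit_and_score(all_cols),
    "rfe_top10": fit_and_score(top10_cols),
}
for name, auc in scores.items():
    print(f"{name}: ROC AUC = {auc:.3f}")
```

Comparing the three scores side by side shows whether the full model improves on the baseline and how much is lost, if anything, by keeping only the 10 features RFE retains.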
// 02

Data Dictionary

The dataset variables are divided into three categories:

Demographic Information

  • customer_id: Customer identifier
  • vintage: Days as customer with the bank
  • age: Age of customer
  • gender: Gender of customer
  • dependents: Number of dependents
  • occupation: Occupation of the customer
  • city: City of customer (anonymised)

Customer Bank Relationship

  • customer_nw_category: Net worth: 3=Low, 2=Medium, 1=High
  • branch_code: Branch code for customer account
  • days_since_last_transaction: Days since last credit in last 1 year

Transactional Information

  • current_balance: Balance as of today
  • previous_month_end_balance: End of month balance, previous month
  • average_monthly_balance_prevQ: AMB in previous quarter
  • average_monthly_balance_prevQ2: AMB in previous-to-previous quarter
  • current_month_credit: Total credit amount current month
  • previous_month_credit: Total credit amount previous month
  • current_month_debit: Total debit amount current month
  • previous_month_debit: Total debit amount previous month
  • current_month_balance: Average balance of current month
  • previous_month_balance: Average balance of previous month
  • churn (target): Average balance falls below minimum next quarter (1/0)
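
As a rough sketch of how a few of these columns might be prepared for modelling, the snippet below imputes missing values and one-hot encodes categoricals with pandas. The rows are made up purely for illustration, the category labels are hypothetical, and the numeric/categorical split (including treating `customer_nw_category` as categorical) and the median/mode imputation are assumptions rather than the notebook's confirmed approach.

```python
import pandas as pd

# Column groups taken from the data dictionary; how each column is treated
# (numeric vs. categorical) is an assumption of this sketch.
NUMERIC = ["vintage", "age", "dependents", "current_balance"]
CATEGORICAL = ["gender", "occupation", "customer_nw_category"]
TARGET = "churn"

# Made-up rows for illustration only; they are not from the real dataset.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "vintage": [1000, 200, 1500, 400],
    "age": [66, 35, None, 42],
    "dependents": [0, None, 0, 2],
    "current_balance": [1000.0, 5000.0, 3000.0, 2000.0],
    "gender": ["Male", "Male", None, "Female"],
    "occupation": ["self_employed", "salaried", "salaried", None],
    "customer_nw_category": [2, 2, 3, 1],
    "churn": [0, 0, 1, 1],
})


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop the identifier, impute missing values, one-hot encode categoricals."""
    out = frame.drop(columns=["customer_id"])  # identifier is not a feature
    for col in NUMERIC:
        out[col] = out[col].fillna(out[col].median())  # median imputation
    for col in CATEGORICAL:
        out[col] = out[col].fillna(out[col].mode().iloc[0])  # most frequent value
    return pd.get_dummies(out, columns=CATEGORICAL, dtype=int)


features = preprocess(df)
X = features.drop(columns=[TARGET])
y = features[TARGET]
print(X.columns.tolist())
```

The resulting `X` and `y` can be passed directly to a scikit-learn estimator such as `LogisticRegression`.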