STATSML 604: Car Loan Propensity Prediction using Logistic Regression
Overview
Predicting whether a customer is likely to take a car loan can significantly improve a bank’s ability to design targeted campaigns, manage credit risk, and optimize resource allocation.
Stage 1: Awareness
The customer realizes a need — perhaps their current car is unreliable, or their lifestyle has changed (family, job move, etc.).
Signals:
Search behavior (Google/search logs)
Browsing car loan info on bank websites
Interactions with car dealerships or vehicle-related services
Stage 2: Research
They begin comparing options, calculating EMIs, and checking eligibility.
Signals:
Clicks on car loan calculators
App logins increase
Engaging with bank agents or chatbot loan FAQs
Increasing balance inquiries
Stage 3: Financial Readiness
The customer evaluates affordability: “Can I afford it?”
Signals:
Growth in monthly net income
Stable or rising credit score
High cash reserves (in savings + checking)
Existing auto loans closed or paid off
Stage 4: Application Intent
They are ready to apply.
Signals:
Clicking on “Apply Now” or loan inquiry forms
Uploading documents to the portal
In this tutorial, we build a simple logistic regression model to classify a customer's car loan propensity, focusing on profile data and bank transaction behavior.
Use Cases
Targeted Campaigns: Focus offers on high-propensity segments
Loan Eligibility Filtering: Pre-qualify candidates automatically
Customer Risk Profiling: Understand financial behavior in depth
Prerequisites
Goals
Build and deploy a classification model that predicts whether a customer is likely to opt for a car loan, based on:
Demographic and behavioral data
Financial account balances
Derived income and loan features
Sample Dataset
Fields & Description
| Field | Type | Description |
|---|---|---|
| customer_id | STRING | Unique identifier |
| age | INT | Customer age |
| gender | STRING | Male / Female / Other |
| marital_status | STRING | Married / Single / Divorced |
| employment_status | STRING | Employed / Self-employed / Retired etc. |
| annual_income | DECIMAL | Yearly income |
| credit_score | INT | Credit bureau score |
| checking_balance | DECIMAL | Balance in checking account |
| savings_balance | DECIMAL | Balance in savings account |
| monthly_debit_volume | DECIMAL | Monthly average spending |
| monthly_credit_volume | DECIMAL | Monthly average income |
| loan_history | ARRAY<STRUCT> | Past loans (type, amount, status) |
| existing_auto_loan | BOOLEAN | Existing car loan |
| owns_vehicle | BOOLEAN | Whether customer owns a car |
| propensity_car_loan | FLOAT | Target: Likelihood (0-1) of taking a loan |
If you explore the sample data, it should look like this:
Derived Features
These features are engineered to improve model performance:
| Feature | Formula |
|---|---|
| savings_to_income_ratio | savings_balance / annual_income |
| debt_to_income_ratio | (monthly_debit_volume * 12) / annual_income |
| avg_monthly_net_income | monthly_credit_volume - monthly_debit_volume |
| loan_count | COUNT(loan_history) |
| previous_auto_loans | COUNT WHERE loan_type = 'Auto' |
| good_credit_flag | credit_score >= 700 |
| high_cash_reserve_flag | checking_balance + savings_balance > 10000 |
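The derived-feature formulas above can be sketched in plain Python for a single customer record. This is a minimal illustration, not the tutorial's own query: the function name `derive_features`, the `loan_type` key inside each `loan_history` entry, the zero-income guard, and the example values are all assumptions made for the sketch.

```python
from typing import Any, Dict


def derive_features(customer: Dict[str, Any]) -> Dict[str, Any]:
    """Compute the derived features for one customer record."""
    income = float(customer["annual_income"])
    # Assumption: a missing or null loan_history means no past loans.
    loans = customer.get("loan_history") or []
    return {
        # Assumption: ratios fall back to 0.0 when annual_income is zero.
        "savings_to_income_ratio": customer["savings_balance"] / income if income else 0.0,
        "debt_to_income_ratio": (customer["monthly_debit_volume"] * 12) / income if income else 0.0,
        "avg_monthly_net_income": customer["monthly_credit_volume"] - customer["monthly_debit_volume"],
        "loan_count": len(loans),
        # Assumption: each loan entry exposes its type under the key "loan_type".
        "previous_auto_loans": sum(1 for loan in loans if loan.get("loan_type") == "Auto"),
        "good_credit_flag": customer["credit_score"] >= 700,
        "high_cash_reserve_flag": customer["checking_balance"] + customer["savings_balance"] > 10000,
    }


# Hypothetical record, for illustration only.
example_customer = {
    "annual_income": 60000,
    "credit_score": 720,
    "checking_balance": 3000,
    "savings_balance": 12000,
    "monthly_debit_volume": 2500,
    "monthly_credit_volume": 5000,
    "loan_history": [
        {"loan_type": "Auto", "amount": 15000, "status": "Closed"},
        {"loan_type": "Personal", "amount": 5000, "status": "Open"},
    ],
}
features = derive_features(example_customer)
print(features)
```

Keeping the flags as booleans mirrors the comparison formulas in the table; they can be cast to 0/1 before being passed to a model.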
This should look like the following:
Auto-Labeling of Training Data
| Condition | Weight | Rationale |
|---|---|---|
| Not having an auto loan | 0.30 | More likely to consider buying |
| Doesn’t own a vehicle | 0.20 | May need a car, hence loan |
| Good credit score | 0.20 | More eligible for credit |
| Income > $60K | 0.10 | Likely to get approved |
| Checking balance > $2K | 0.10 | Has funds for down payment |
| Net income > $2K | 0.10 | Better repayment capacity |
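One way to read the weights above is as an additive score: they sum to 1.0, which matches the 0-1 range of `propensity_car_loan`. The sketch below assumes exactly that (the label is the sum of the weights of the conditions a customer meets). It also assumes that "good credit score" means `credit_score >= 700`, as in the derived features, that "Income" is `annual_income`, and that "Net income" is the monthly figure `monthly_credit_volume - monthly_debit_volume`. The names `LABEL_RULES` and `auto_label` are hypothetical.

```python
# (rule name, weight, predicate) triples; weights are taken from the table above.
LABEL_RULES = [
    ("no_existing_auto_loan", 0.30, lambda c: not c["existing_auto_loan"]),
    ("does_not_own_vehicle", 0.20, lambda c: not c["owns_vehicle"]),
    # Assumption: "good credit" uses the same 700 cutoff as good_credit_flag.
    ("good_credit_score", 0.20, lambda c: c["credit_score"] >= 700),
    ("income_over_60k", 0.10, lambda c: c["annual_income"] > 60000),
    ("checking_over_2k", 0.10, lambda c: c["checking_balance"] > 2000),
    # Assumption: net income is monthly credits minus monthly debits.
    ("net_income_over_2k", 0.10,
     lambda c: c["monthly_credit_volume"] - c["monthly_debit_volume"] > 2000),
]


def auto_label(customer):
    """Sum the weights of all satisfied conditions (assumed additive scoring)."""
    score = sum((weight for _, weight, rule in LABEL_RULES if rule(customer)), 0.0)
    # Round to two decimals to hide floating-point noise in the sum.
    return round(score, 2)


# Hypothetical customer, for illustration only.
example_customer = {
    "existing_auto_loan": False,
    "owns_vehicle": True,
    "credit_score": 720,
    "annual_income": 55000,
    "checking_balance": 2500,
    "monthly_credit_volume": 5000,
    "monthly_debit_volume": 2500,
}
label = auto_label(example_customer)
print(label)
```

If the model needs a binary target, this score would have to be thresholded; the cutoff is a separate modeling choice that the table does not specify.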
Model Definition
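As a rough illustration of what a logistic regression classifier does with features like these, the following is a minimal from-scratch sketch using only the Python standard library. It is not the tutorial's model definition: the function names, the learning rate, the epoch count, the three 0/1 input features, and the toy training rows are all assumptions, and the toy labels are binary rather than the 0-1 propensity score.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, row):
    """Predicted probability of the positive class for one feature row."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def train_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, target in zip(X, y):
            error = predict_proba(weights, bias, row) - target
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy rows (hypothetical): [existing_auto_loan, owns_vehicle, good_credit_flag] as 0/1.
X_train = [[0, 0, 1], [0, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
# Toy binary labels (hypothetical): 1 = likely to take a car loan.
y_train = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(X_train, y_train)
print(predict_proba(weights, bias, [0, 0, 1]))  # customer without an auto loan
print(predict_proba(weights, bias, [1, 1, 0]))  # customer with an auto loan
```

In practice a library implementation with regularization and feature scaling would normally be preferred; the loop above only shows the mechanics of fitting and scoring.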
Model Evaluation
Results
The query should return the following:
| Metric | Value |
|---|---|
| AUC ROC | 0.9362 |
| Accuracy | 0.9361 |
| Precision | 0.9367 |
| Recall | 0.9372 |
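For readers who want to see how these four metrics are defined, the sketch below computes them from true labels and predicted scores in plain Python. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration; they are unrelated to the reported results.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, and recall at a given decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Fall back to 0.0 when a denominator is empty.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores (hypothetical).
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

metrics = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

Accuracy, precision, and recall depend on the chosen threshold, while AUC ROC is threshold-free because it only compares how positives and negatives are ranked.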
Predict on New Customers
The query looks like the following: