
STATSML 604: Car Loan Propensity Prediction using Logistic Regression

Overview

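Predicting whether a customer is likely to take a car loan can significantly improve a bank’s ability to design targeted campaigns, manage credit risk, and optimize resource allocation.

A customer typically moves through four stages before taking a car loan, and each stage produces observable signals.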

Stage 1: Awareness

  • The customer realizes a need — perhaps their current car is unreliable, or their lifestyle has changed (family, job move, etc.).

  • Signals:

    • Search behavior (Google/search logs)

    • Browsing car loan info on bank websites

    • Interactions with car dealerships or vehicle-related services

Stage 2: Research

  • They begin comparing options, calculating EMIs, and checking eligibility.

  • Signals:

    • Clicks on car loan calculators

    • App logins increase

    • Engaging with bank agents or chatbot loan FAQs

    • Increasing balance inquiries

Stage 3: Financial Readiness

  • Evaluating “Can I afford it?”

  • Signals:

    • Growth in monthly net income

    • Stable or rising credit score

    • High cash reserves (in savings + checking)

    • Existing auto loans closed or paid off

Stage 4: Application Intent

  • They are ready to apply.

  • Signals:

    • Clicking on “Apply Now” or loan inquiry forms

    • Uploading documents to the portal

In this tutorial, we build a simple logistic regression model that classifies a customer's car loan propensity, focusing on customer profile and bank transaction behavior.

Use Cases

  • Targeted Campaigns: Focus offers on high-propensity segments

  • Loan Eligibility Filtering: Pre-qualify candidates automatically

  • Customer Risk Profiling: Understand financial behavior in depth

Prerequisites

  • PREP 400: DBVisualizer SQL Editor Setup for Data Distiller

  • STATSML 600: Data Distiller Advanced Statistics & Machine Learning Models

Goals

Build and deploy a classification model that predicts if a customer is likely to opt for a car loan, based on:

  • Demographic and behavioral data

  • Financial account balances

  • Derived income and loan features

Sample Dataset

Fields & Description

| Field Name | Data Type | Description |
|---|---|---|
| customer_id | STRING | Unique identifier |
| age | INT | Customer age |
| gender | STRING | Male / Female / Other |
| marital_status | STRING | Married / Single / Divorced |
| employment_status | STRING | Employed / Self-employed / Retired etc. |
| annual_income | DECIMAL | Yearly income |
| credit_score | INT | Credit bureau score |
| checking_balance | DECIMAL | Balance in checking account |
| savings_balance | DECIMAL | Balance in savings account |
| monthly_debit_volume | DECIMAL | Monthly average spending |
| monthly_credit_volume | DECIMAL | Monthly average income |
| loan_history | ARRAY<STRUCT> | Past loans (type, amount, status) |
| existing_auto_loan | BOOLEAN | Existing car loan |
| owns_vehicle | BOOLEAN | Whether customer owns a car |
| propensity_car_loan | FLOAT | Target: Likelihood (0-1) of taking a loan |

If you explore the sample data, it should look like this:

A simple data exploration query with DBVisualizer

Derived Features

These features are engineered to improve model performance:

| Feature | Formula / Logic |
|---|---|
| savings_to_income_ratio | savings_balance / annual_income |
| debt_to_income_ratio | (monthly_debit_volume * 12) / annual_income |
| avg_monthly_net_income | monthly_credit_volume - monthly_debit_volume |
| loan_count | COUNT(loan_history) |
| previous_auto_loans | COUNT WHERE loan_type = 'Auto' |
| good_credit_flag | credit_score >= 700 |
| high_cash_reserve_flag | checking_balance + savings_balance > 10000 |

This should look like the following:

Derived features along with the base features
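The formulas in the derived-features table can be sketched in plain Python for a single customer record. This is an illustrative sketch, not the tutorial's query: the function name `derive_features`, the dictionary layout of the record, the `loan_type` key inside each `loan_history` entry, and the zero-income guard are all assumptions.

```python
from typing import Any


def derive_features(customer: dict[str, Any]) -> dict[str, Any]:
    """Compute the derived features for one customer record (hypothetical helper)."""
    income = customer["annual_income"]
    # loan_history is assumed to be a list of dicts with a "loan_type" key.
    loans = customer.get("loan_history") or []

    def ratio_to_income(amount: float) -> float:
        # Assumption: return 0.0 instead of failing when income is zero or missing.
        return amount / income if income else 0.0

    return {
        "savings_to_income_ratio": ratio_to_income(customer["savings_balance"]),
        "debt_to_income_ratio": ratio_to_income(customer["monthly_debit_volume"] * 12),
        "avg_monthly_net_income": (
            customer["monthly_credit_volume"] - customer["monthly_debit_volume"]
        ),
        "loan_count": len(loans),
        "previous_auto_loans": sum(1 for loan in loans if loan.get("loan_type") == "Auto"),
        "good_credit_flag": customer["credit_score"] >= 700,
        "high_cash_reserve_flag": (
            customer["checking_balance"] + customer["savings_balance"] > 10000
        ),
    }
```

Keeping the ratios behind one guarded helper means a customer with no recorded income yields 0.0 rather than a division error. Whether that is the right default depends on how the data should be interpreted.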

Auto-Labeling of Training Data

| Feature | Weight | Explanation |
|---|---|---|
| Not having an auto loan | 0.30 | More likely to consider buying |
| Doesn’t own a vehicle | 0.20 | May need a car, hence loan |
| Good credit score | 0.20 | More eligible for credit |
| Income > $60K | 0.10 | Likely to get approved |
| Checking balance > $2K | 0.10 | Has funds for down payment |
| Net income > $2K | 0.10 | Better repayment capacity |

Auto-labeling of training data
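One way to read the weight table is as a rule-based score: each rule a customer satisfies adds its weight, and the six weights sum to 1.0. The sketch below illustrates that reading in plain Python. It makes several assumptions: "good credit" means `credit_score >= 700` (as in the derived features), "net income" means monthly credits minus monthly debits, and the 0.5 cut-off in `to_label` is a hypothetical threshold for turning the score into a binary training label.

```python
# Weights copied from the auto-labeling table; the key names are illustrative.
RULE_WEIGHTS = {
    "no_existing_auto_loan": 0.30,
    "no_vehicle": 0.20,
    "good_credit": 0.20,
    "income_over_60k": 0.10,
    "checking_over_2k": 0.10,
    "net_income_over_2k": 0.10,
}


def propensity_score(c: dict) -> float:
    """Sum the weights of every rule the customer satisfies (score in 0..1)."""
    net_income = c["monthly_credit_volume"] - c["monthly_debit_volume"]
    rules = {
        "no_existing_auto_loan": not c["existing_auto_loan"],
        "no_vehicle": not c["owns_vehicle"],
        "good_credit": c["credit_score"] >= 700,      # assumed threshold
        "income_over_60k": c["annual_income"] > 60000,
        "checking_over_2k": c["checking_balance"] > 2000,
        "net_income_over_2k": net_income > 2000,       # assumed to be monthly
    }
    score = sum((RULE_WEIGHTS[name] for name, hit in rules.items() if hit), 0.0)
    # Round to two decimals to hide floating-point noise from summing weights.
    return round(score, 2)


def to_label(score: float, threshold: float = 0.5) -> int:
    """Convert a score to a binary label; the threshold is a hypothetical choice."""
    return 1 if score >= threshold else 0
```

Under these assumptions, a customer with no auto loan and no vehicle already reaches 0.5 before any financial rule is considered, which shows how strongly the first two rules dominate the label.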

Model Definition
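As an illustration of what a logistic regression classifier learns, here is a minimal sketch in plain Python that fits weights by batch gradient descent on the log loss. It is not the Data Distiller model definition. The function names, learning rate, and epoch count are assumptions chosen for readability, and a real workflow would also scale numeric features and encode categorical ones.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: list[float], bias: float, x: list[float]) -> float:
    """Probability of the positive class for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(
    X: list[list[float]], y: list[int], lr: float = 0.5, epochs: int = 1000
) -> tuple[list[float], float]:
    """Fit weights and bias with batch gradient descent on the mean log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi  # d(loss)/d(logit)
            for j, value in enumerate(xi):
                grad_w[j] += err * value
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

For example, training on a single binary feature such as `good_credit_flag` with labels that follow the flag yields a positive weight, so `predict_proba` returns a probability above 0.5 when the flag is set and below 0.5 when it is not.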

Model Evaluation

Results

The query should return the following:

Query results on the evaluation

| Metric | Value |
|---|---|
| AUC ROC | 0.9362 |
| Accuracy | 0.9361 |
| Precision | 0.9367 |
| Recall | 0.9372 |
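To make the four metrics concrete, the sketch below computes them from true labels, predicted labels, and predicted scores using their standard definitions. It does not reproduce the values in the table, and the function names are illustrative. AUC ROC is computed here as the fraction of (positive, negative) pairs that the model ranks correctly, with ties counted as half.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, and recall from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against empty denominators when a class is never predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC ROC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of examples, which is fine for a small illustration but not for a large evaluation set.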

Predict on New Customers

The query looks like the following:

Predictions on the same dataset
