
ENRICH 400: Net Promoter Scores (NPS) for Enhanced Customer Satisfaction with Data Distiller

Unlock the power of NPS to measure and improve customer loyalty and satisfaction

Last updated 7 months ago

Prerequisites

Here’s the structure of the dataset. It contains 1,000 responses to an NPS survey, enriched with RFM (Recency, Frequency, Monetary) and RFE (Recency, Frequency, Engagement) style attributes.

  • customer_id: Unique identifier for the customer.

  • nps_score: The raw NPS score (0-10 scale).

  • promoter_flag: A binary flag indicating whether the customer is a Promoter (1 for NPS scores of 9-10).

  • passive_flag: A binary flag indicating whether the customer is a Passive (1 for NPS scores of 7-8).

  • detractor_flag: A binary flag indicating whether the customer is a Detractor (1 for NPS scores of 0-6).

  • purchase_frequency: The number of purchases the customer has made in the last 12 months.

  • avg_order_value: The average amount spent by the customer per order.

  • total_spent: The total amount spent by the customer.

  • customer_support_interactions: The number of times the customer interacted with support.

  • marketing_emails_clicked: The number of marketing emails clicked by the customer.

  • account_age_in_days: The number of days since the customer created their account.

  • churn_flag: A binary flag for whether the customer churned (0 for not churned, 1 for churned).

Tip: No matter the structure of your data, as long as you transform it into the flat, canonical schema via Data Distiller, you can apply all of the queries provided below. Alternatively, you can template the queries to suit your specific needs.

Overview

Net Promoter Score (NPS) is a metric used by organizations to measure customer loyalty and satisfaction. It's derived from a single survey question:

"On a scale of 0 to 10, how likely are you to recommend our product or service to a friend or colleague?"

Based on their response, customers are categorized into three groups:

  • Promoters (9-10): Enthusiastic, loyal customers who are likely to recommend your product or service.

  • Passives (7-8): Satisfied but unenthusiastic customers who are vulnerable to competitive offerings.

  • Detractors (0-6): Unhappy customers who could damage your brand through negative word-of-mouth.

Traditional NPS

NPS = %Promoters - %Detractors

The percentage of Promoters (%Promoters) refers to the proportion of customers classified as Promoters out of the total number of respondents, which includes Promoters, Passives, and Detractors. The same applies to the %Detractors. This results in a score ranging from -100 to +100, where:

  • Positive NPS indicates that more customers are promoters than detractors.

  • Negative NPS signals that more customers are detractors, a warning sign of poor customer satisfaction.

In traditional NPS calculations, Passives are excluded from the final score; only Promoters and Detractors contribute to the outcome. Passives have no direct influence on the result except through their inclusion in the total used to compute the Promoter and Detractor percentages. A simpler way to view the formula is that Promoters are assigned +1 point, Passives receive 0 points, and Detractors are assigned -1 point.
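To make the scoring concrete, here is a quick Python sketch (with made-up respondent counts) showing that the percentage form and the +1/0/-1 points form of the formula agree:

```python
def nps(promoters: int, passives: int, detractors: int) -> float:
    """NPS = %Promoters - %Detractors; Passives count only in the denominator."""
    total = promoters + passives + detractors
    return (promoters - detractors) * 100.0 / total

def nps_points(promoters: int, passives: int, detractors: int) -> float:
    """Equivalent view: average of +1/0/-1 points, scaled to -100..100."""
    total = promoters + passives + detractors
    points = promoters * 1 + passives * 0 + detractors * -1
    return points * 100.0 / total

# Hypothetical survey: 400 Promoters, 300 Passives, 300 Detractors
print(nps(400, 300, 300))         # 10.0
print(nps_points(400, 300, 300))  # 10.0 (same result)
```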

NPS Use Cases in Adobe Experience Platform

Segment Customers Based on NPS Categories

By categorizing customers as Promoters, Passives, or Detractors, businesses can create enriched customer segments in Adobe Experience Platform (AEP). Each NPS group reflects different customer sentiments and behaviors, which can then drive personalized marketing or support strategies:

  • Promoters: Can be targeted with loyalty programs, exclusive offers, or referral incentives to amplify their positive impact.

  • Passives: Can be nudged toward becoming Promoters with tailored offers or incentives to increase their engagement and satisfaction.

  • Detractors: Require attention with special customer service offers, surveys for deeper feedback, or even product improvements to mitigate negative sentiment.

Predictive Models for Churn and Retention

NPS can be used as a key indicator in churn prediction models. Customers categorized as Detractors may be more likely to churn, while Promoters are often more loyal.

  • Detractors can trigger workflows for retention efforts, such as sending out discounts or personalized support.

  • Promoters might trigger marketing campaigns focused on advocacy, encouraging them to leave reviews or promote the brand on social media.

Personalized Engagements and Cross-Channel Journeys

You can tailor personalized marketing engagements based on a customer's NPS score across multiple touchpoints.

  • Promoters: Can receive real-time in-app rewards, loyalty program invitations, or be nudged toward higher-tier memberships.

  • Detractors: Might receive customer service interactions or problem-resolution emails right after a low NPS score is recorded.

Using Adobe Journey Optimizer, NPS data can also trigger different customer journeys, ensuring that each customer gets the right message or experience based on their satisfaction levels.

Real-Time Feedback Loops with Data Distiller Derived Attributes

The Real-Time Customer Profile can be updated with each interaction or survey response. By integrating NPS surveys into Data Distiller Derived Attributes, you can ensure that customer sentiment data is always fresh and up-to-date. This allows:

  • Immediate action: When a detractor gives a poor NPS score, this can trigger a workflow for the customer service team to reach out.

  • Continual monitoring: As customer satisfaction improves, so does their NPS, and these updates can be fed back into customer profiles for more refined future engagements.

Data Distiller Audience Enrichment with Behavioral Data

In AEP, NPS data can be combined with other behavioral, transactional, or demographic data to build a fuller customer profile. For example, a Detractor who also has high interaction rates with support may indicate deeper customer service issues. On the other hand, a Promoter who purchases frequently could be offered a loyalty tier upgrade to deepen brand engagement.

Sample Size Considerations for NPS Surveys

In traditional NPS calculations, although we collect responses from three categories—Promoters, Passives, and Detractors—the NPS score itself simplifies the calculation to a binomial structure. This is because the NPS formula only considers Promoters and Detractors, while Passives are excluded from the final calculation (they have a weight of zero). Essentially, the multinomial distribution (with three categories) is approximated as a binomial distribution by treating the survey responses as either Promoters (success) or Detractors (failure), while ignoring Passives. However, note that Passives are still included in the overall sample size, which impacts the precision of the calculation and the confidence intervals.

The binomial distribution describes the probability of achieving a certain number of successes (e.g., Promoters in your survey) in a fixed number of independent trials (e.g., survey responses), where each trial has only two possible outcomes (e.g., Promoter or not Promoter). In this context:

  • Success corresponds to a customer being a Promoter (NPS score of 9 or 10),

  • Failure corresponds to a customer being a Detractor (NPS score of 0 to 6).

The traditional NPS calculation, therefore, simplifies the multinomial survey into a binomial process, focusing on the difference between the proportions of Promoters and Detractors.

To ensure that your NPS surveys are reliable and represent your customer base, you need a statistically significant sample size. The key factors affecting this include:

  1. Confidence Level: Typically set at 95%.

  2. Margin of Error: Often chosen as ±5%.

  3. Customer Base Size: The larger your base, the more responses you need to ensure accuracy. For large bases, around 400-500 responses are generally sufficient.

  4. Segment Diversity: If your customer base includes diverse segments (e.g., regional or demographic groups), it may be necessary to oversample to ensure all groups are represented.

In a large-sample situation, the binomial distribution, which describes the probability of a given number of successes in a fixed number of independent trials, can be approximated by a normal distribution thanks to the Central Limit Theorem. The confidence interval for a proportion p is given by:

p ± Z × √( p(1 - p) / n )

Where:

  • p is the sample proportion,

  • Z is the Z-score associated with the desired confidence level,

  • n is the sample size.

Setting the margin of error E equal to the Z × √(p(1 - p)/n) term lets you solve for the sample size needed to estimate the population proportion with a specified margin of error and confidence level.

Rearranging this to solve for the required sample size n gives the formula:

n = Z² × p × (1 - p) / E²
  • n is the required sample size,

  • Z is the Z-score (1.96 for a 95% confidence level),

  • p is the estimated proportion of Promoters (typically 0.5 if unknown, since that maximizes the required sample size),

  • E is the margin of error (0.05 for ±5%).

Using this formula, we can calculate that approximately 384 responses would be required for a 95% confidence level and a ±5% margin of error.

Tip: The calculation of 384 responses applies to any survey where you're trying to estimate a proportion (such as customer satisfaction, NPS, or any binary outcome like "yes/no" responses).
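The arithmetic above can be sketched in a few lines of Python; note that the raw value is 384.16, which is usually rounded up to the next whole respondent:

```python
import math

def required_sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    """n = Z^2 * p * (1 - p) / E^2, rounded up to a whole number of responses."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# 95% confidence (Z = 1.96), worst-case p = 0.5, margin of error ±5%
print(required_sample_size())  # 385 (raw value 384.16, quoted as ~384 above)
```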

Practical Considerations for a Smaller Customer Base

For a smaller customer base, you can use the finite population correction (FPC) to adjust the sample size:

n_adjusted = n / (1 + (n - 1) / N)

where n is the sample size we computed above and N is the number of customers in your database. For a population of 1,000 customers, the adjusted sample size using the finite population correction is approximately 278 responses. This still provides a 95% confidence level with a ±5% margin of error, but requires fewer responses than the unadjusted sample size because of the smaller population.

Large-scale surveys can be expensive, so consider how many responses are feasible while still achieving statistically reliable results.
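A minimal Python sketch of the finite population correction, reproducing the ~278 figure for a base of 1,000 customers:

```python
import math

def fpc_adjusted_sample_size(n: float, population: int) -> int:
    """Finite population correction: n_adjusted = n / (1 + (n - 1) / N), rounded up."""
    return math.ceil(n / (1 + (n - 1) / population))

raw_n = 1.96 ** 2 * 0.5 * 0.5 / 0.05 ** 2  # 384.16, the unadjusted sample size
print(fpc_adjusted_sample_size(raw_n, 1000))  # 278
```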

Traditional NPS Calculation in Data Distiller

The NPS formula is:

NPS = %Promoters - %Detractors

Let us now compute the NPS for the sample:

WITH nps_categories AS (
    SELECT
        CASE
            WHEN nps_score >= 9 THEN 'Promoter'
            WHEN nps_score BETWEEN 7 AND 8 THEN 'Passive'
            ELSE 'Detractor'
        END AS nps_category
    FROM nps_survey_dataset
)
SELECT
    (COUNT(CASE WHEN nps_category = 'Promoter' THEN 1 END) * 100.0 / COUNT(*)) -
    (COUNT(CASE WHEN nps_category = 'Detractor' THEN 1 END) * 100.0 / COUNT(*)) AS nps_score
FROM nps_categories;

Running this query returns the overall NPS score for the sample.

Generalize to the Population with Binomial Distribution

Let us now generalize this to the entire population. When you generalize the NPS from a sample to the entire population, you are estimating the population NPS based on the sample. Because you are working with only a sample, you need to account for uncertainty, and this is where the confidence interval comes into play.

  • Calculate the Proportions:

    • Pp: the proportion of Promoters in the sample, #Promoters / n

    • Pd: the proportion of Detractors in the sample, #Detractors / n

  • Calculate the Standard Error (SE): The standard error measures the uncertainty in your estimate of a value, in this case the difference between the proportions of Promoters and Detractors in your NPS calculation. It quantifies how much your sample results might vary if you were to take different samples from the same population. The formula for SE accounts for:

    • the variability in the Promoter proportion, given by the term Pp(1 - Pp), and

    • the variability in the Detractor proportion, given by the term Pd(1 - Pd),

    and divides each by the sample size n to reflect that larger samples tend to produce more stable (less variable) estimates. It then adds the two terms together and takes the square root.

SE = √( Pp(1 - Pp)/n + Pd(1 - Pd)/n )
  • Choose Confidence Level: For a 95% confidence level, the Z-score is 1.96. For other confidence levels, use the corresponding Z-score (e.g., 1.64 for 90% confidence).

  • Calculate the Confidence Interval: The confidence interval for NPS is:

NPS ± Z × SE

Where:

  • NPS is your sample NPS score.

  • Z is the Z-score for your chosen confidence level.

  • SE is the standard error.

Copy and execute the following piece of SQL code:

WITH nps_calculation AS (
    SELECT
        COUNT(*) AS total_responses,
        SUM(CASE WHEN nps_score >= 9 THEN 1 ELSE 0 END) AS promoters,
        SUM(CASE WHEN nps_score BETWEEN 0 AND 6 THEN 1 ELSE 0 END) AS detractors
    FROM nps_survey_dataset
),
proportions AS (
    SELECT
        total_responses,
        promoters,
        detractors,
        CAST(promoters AS FLOAT) / total_responses AS proportion_promoters,
        CAST(detractors AS FLOAT) / total_responses AS proportion_detractors
    FROM nps_calculation
),
standard_error_calculation AS (
    SELECT
        total_responses,
        proportion_promoters,
        proportion_detractors,
        (SQRT(
            (proportion_promoters * (1 - proportion_promoters) / total_responses) +
            (proportion_detractors * (1 - proportion_detractors) / total_responses)
        )) AS standard_error
    FROM proportions
)
SELECT
    (proportion_promoters * 100 - proportion_detractors * 100) AS nps,
    standard_error,
    -- Z-score for 95% confidence level is 1.96
    (proportion_promoters * 100 - proportion_detractors * 100) - 1.96 * (standard_error * 100) AS lower_bound_ci,
    (proportion_promoters * 100 - proportion_detractors * 100) + 1.96 * (standard_error * 100) AS upper_bound_ci
FROM standard_error_calculation;

The result will be:

This shows that the NPS score for the population will range from -49 to -41 with 95% confidence. The 95% confidence interval means that if we were to repeat this survey multiple times, in 95 out of 100 cases, the true NPS score for the entire population would fall somewhere within this range. In other words, we're pretty certain that the population’s NPS score is somewhere between -49 and -41, but it could vary slightly if we surveyed everyone.

However, this does not mean that there won’t be outliers or individual scores that fall outside this range. The confidence interval reflects the overall population’s average NPS score, not individual customer responses. It’s possible to have a few extreme responses (either very positive or very negative) that are not captured by this interval, but these outliers won’t significantly shift the average NPS score for the entire population.
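To see how such bounds arise mechanically, here is the same computation in Python with illustrative proportions (chosen for the example; they are not read from the dataset):

```python
import math

def nps_confidence_interval(pp: float, pd: float, n: int, z: float = 1.96):
    """Return (nps, lower, upper) on the -100..100 scale."""
    nps = (pp - pd) * 100
    se = math.sqrt(pp * (1 - pp) / n + pd * (1 - pd) / n)
    margin = z * se * 100
    return nps, nps - margin, nps + margin

# Illustrative: 20% Promoters, 65% Detractors, 1,000 responses
nps, lower, upper = nps_confidence_interval(0.20, 0.65, 1000)
print(round(nps), round(lower), round(upper))  # -45 -49 -41
```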

The term Pp(1 - Pp) represents the variance of a binomial distribution, the distribution for yes/no outcomes (e.g., Promoter or not Promoter). This term quantifies how much variability there is in the proportion of Promoters in your sample.

  • High variability: If the proportions are more balanced (e.g., 50% Promoters and 50% non-Promoters), the value of Pp(1 - Pp) will be larger.

  • Low variability: If almost everyone in your sample falls on one side (e.g., 90% Promoters and only 10% non-Promoters), there is less variability, and the value of Pp(1 - Pp) will be smaller.
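The behavior of the variance term is easy to verify numerically; p(1 - p) peaks at p = 0.5 and shrinks toward the extremes:

```python
def binomial_variance_term(p: float) -> float:
    """Variance term p * (1 - p) of a binomial proportion."""
    return p * (1 - p)

print(binomial_variance_term(0.5))            # 0.25 -> maximum variability
print(round(binomial_variance_term(0.9), 2))  # 0.09 -> much less variability
```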

Generalize to the Population with Normal Distribution Approximation

The normal distribution approximation to the binomial distribution works well when:

  1. The sample size is large enough - a common rule of thumb is that a sample size of 30 or more is often considered enough for the normal distribution, but this is under ideal conditions (symmetrically distributed data). For NPS, where the data can be skewed, you often need larger samples.

  2. The probability of success is not too close to 0 or 1.

Specifically, the approximation is typically considered valid when both:

n × Pp ≥ 5 and n × (1 - Pp) ≥ 5

where:

  • n is the number of trials (or sample size).

  • Pp is the probability of success (e.g., the proportion of Promoters in NPS).

  • 1 - Pp is the probability of failure (e.g., the proportion of non-Promoters).

The following query checks these conditions against the sample:

WITH promoter_calculation AS (
    SELECT
        COUNT(*) AS total_responses,
        SUM(CASE WHEN nps_score >= 9 THEN 1 ELSE 0 END) AS promoter_count
    FROM nps_survey_dataset
),
proportion_calculation AS (
    SELECT
        total_responses,
        promoter_count,
        CAST(promoter_count AS FLOAT) / total_responses AS p,
        CAST(total_responses - promoter_count AS FLOAT) / total_responses AS non_promoter_p
    FROM promoter_calculation
)
SELECT
    total_responses,
    p,
    non_promoter_p,
    total_responses * p AS n_times_p,
    total_responses * (1 - p) AS n_times_1_minus_p,
    CASE
        WHEN total_responses * p >= 5 AND total_responses * (1 - p) >= 5 THEN 'Conditions Met'
        ELSE 'Conditions Not Met'
    END AS condition_check
FROM proportion_calculation;

The results show that the condition is met.

If the NPS distribution is normally distributed, the variance or standard deviation of the NPS scores treated as a continuous variable can be used to calculate the confidence interval. Instead of calculating the proportions of Promoters and Detractors, you would use the sample mean and sample variance of the NPS scores directly.

  1. Calculate the Mean (NPS): The mean NPS score from your sample:

X̄ = (Σ X) / n

Where X represents the individual NPS scores and n is the total number of responses.

  2. Calculate the Variance:

σ² = (1/n) Σ (X - X̄)²
  3. Calculate the Standard Error (SE) using the variance:

SE = σ / √n
  4. Calculate the Confidence Interval (CI):

CI = X̄ ± Z × SE

Where:

  • Z is the Z-score for the desired confidence level (e.g., 1.96 for 95% confidence).

We will now write a SQL query for this situation, where the NPS scores are weighted as follows:

  • Promoters (9–10): +1 point,

  • Passives (7–8): 0 points,

  • Detractors (0–6): -1 point.

We will apply the normal approximation to calculate the NPS and its confidence interval for the given population.

WITH nps_transformation AS (
    SELECT
        -- Assign weights based on NPS score
        CASE
            WHEN nps_score >= 9 THEN 1      -- Promoters (9-10)
            WHEN nps_score BETWEEN 7 AND 8 THEN 0  -- Passives (7-8)
            WHEN nps_score <= 6 THEN -1     -- Detractors (0-6)
        END AS transformed_nps_score
    FROM nps_survey_dataset
),
nps_statistics AS (
    SELECT
        -- Calculate the mean of the transformed NPS scores
        AVG(transformed_nps_score) AS mean_transformed_nps,
        -- Calculate the variance of the transformed NPS scores
        VARIANCE(transformed_nps_score) AS variance_transformed_nps,
        -- Count total responses
        COUNT(*) AS total_responses
    FROM nps_transformation
),
standard_error_calculation AS (
    SELECT
        mean_transformed_nps,
        -- Standard error = sqrt(variance / n)
        SQRT(variance_transformed_nps / total_responses) AS standard_error
    FROM nps_statistics
)
SELECT
    -- Transformed NPS mean, scaled to match traditional binomial NPS (-100 to 100)
    mean_transformed_nps * 100 AS transformed_nps,
    standard_error,
    -- Calculate the 95% confidence interval (Z = 1.96)
    (mean_transformed_nps * 100) - 1.96 * (standard_error * 100) AS lower_bound_ci,
    (mean_transformed_nps * 100) + 1.96 * (standard_error * 100) AS upper_bound_ci
FROM standard_error_calculation;

You can see that the results are quite close to our earlier calculations using the binomial distribution.

What are NPS Ranges Across Industries?

Here is a snapshot of what the NPS scores mean:

| NPS Score Range | Meaning | Customer Sentiment |
| --- | --- | --- |
| -100 to -1 | More Detractors than Promoters. Room for serious concern | Unhappy, likely to leave or speak negatively |
| 0 | Equal number of Promoters and Detractors | Neutral, could go either way |
| 1 to 49 | More Promoters than Detractors. Good, but needs improvement | Generally satisfied, but potential vulnerabilities |
| 50 to 74 | Excellent. Strong customer loyalty | Very satisfied, likely to recommend |
| 75 to 100 | Outstanding. Almost all Promoters, very few Detractors | Extremely satisfied, strong brand advocacy |

Here are typical NPS benchmarks by industry (plenty of such benchmark data is available on the web):

| Industry | Average NPS | Top Performers |
| --- | --- | --- |
| Retail | 30 to 40 | 70+ |
| Telecommunications | 10 to 20 | 50+ |
| Financial Services/Banking | 30 to 45 | 60+ |
| Healthcare | 30 to 40 | 50+ |
| Software & SaaS | 30 to 40 | 70+ |
| E-commerce | 40 to 60 | 75+ |
| B2B Technology | 30 to 50 | 65+ |
| Hospitality (Hotels) | 40 to 60 | 70+ |
| Automotive | 30 to 45 | 65+ |
| Airlines | 35 to 45 | 60+ |
| Utilities | 10 to 20 | 50+ |
| Logistics/Shipping | 10 to 20 | 50+ |
| Television/Internet Providers | 0 to 15 | 40+ |
| Insurance | 20 to 40 | 60+ |

Here is a summary of NPS scores across industries:

  • NPS > 50 is considered excellent across most industries.

  • NPS between 30 and 50 is good, indicating satisfied customers with potential areas for improvement.

  • NPS below 30 signals that there's significant room for improvement, and a negative NPS indicates customer dissatisfaction.

Tip: Compare your NPS to industry averages to get a clearer picture of how you're performing relative to competitors.

Customer Expectations: In industries like telecommunications or utilities, customers generally have lower expectations for service and satisfaction, which results in lower average NPS scores. In contrast, tech and retail sectors often have higher customer expectations, and companies must work harder to earn high NPS scores.

Competition and Product Nature: Some industries, such as e-commerce or SaaS, can easily provide a high-quality, personalized customer experience, leading to higher NPS scores. In contrast, industries like insurance or telecom, which are often seen as commoditized or have more rigid service structures, tend to see lower NPS scores.

Customer Interaction Complexity: Companies in industries that have complex customer interactions, like healthcare or financial services, often have lower NPS scores, since these industries deal with more intricate services that are harder to standardize in terms of customer experience.

Weighted NPS

A weighted NPS is used in situations where an organization wants to emphasize certain customer segments or give different levels of importance to customer feedback. The traditional NPS equally balances Promoters and Detractors, while ignoring Passives, but some business contexts might justify a weighted approach. Some companies do adopt custom variations of NPS for their internal metrics, especially in B2B, enterprise-level organizations, or premium service sectors, where certain customers are significantly more valuable than others. These variations often remain proprietary, tailored to the company’s business model and customer engagement strategy.

Here are some ways in which weighted NPS could be used:

  1. When Certain Groups Are More Critical to the Business:

    • Promoters could be given a higher weight if the business wants to strongly emphasize the importance of customer advocacy and referrals. For example, in industries where word-of-mouth marketing is crucial, the impact of Promoters could be magnified.

    • Detractors could be downweighted if their negative feedback is less concerning for certain business models (e.g., highly niche markets where negative feedback from outliers is less relevant).

  2. When Passives Play a Significant Role:

    • Passives typically do not impact NPS, but in certain industries, satisfied but unenthusiastic customers might still provide value (e.g., they are long-term customers who continue to purchase but don’t actively promote). A weighted NPS could include Passives to account for their steady contribution to revenue.

  3. Customizing NPS for Specific Business Goals:

    • Companies might want to assign different weights to customer segments based on profitability, brand loyalty, or customer lifetime value (CLV). For instance, a high-value segment of Promoters could be weighted more heavily to reflect their overall business impact.

    • A weighted NPS could be used to focus more on customer satisfaction in high-margin products or premium services where Passives may still contribute significantly to profit.

  4. B2B vs. B2C Contexts:

    • In B2B (business-to-business) environments, where relationships with clients tend to be deeper and longer-lasting, a weighted NPS might be useful. For example, Passives (clients who continue using the service without actively recommending it) might be more valuable in a B2B context than in B2C (business-to-consumer), where immediate action from Promoters or Detractors is more critical.

  5. Long-Term Strategy vs. Short-Term Tactics:

    • In some cases, companies may want to emphasize long-term relationships with customers over short-term sales. A weighted NPS could assign more points to Passives or Promoters who may not actively advocate but continue to make purchases, supporting a long-term retention strategy.

  6. Customized NPS in Specific Sectors:

    • Some industries might use a weighted NPS to tailor the metric to the realities of their customer dynamics:

      • Healthcare: The stakes are high, and dissatisfied customers (Detractors) could have outsized impacts, so Promoters might be weighted higher to emphasize positive patient experiences.

      • Luxury Brands: Here, Promoters are especially valuable, so their feedback might be assigned more weight.

Weighted NPS Formula:

Let’s assume the following weights:

  • Promoters: Wp

  • Passives: Wpassive

  • Detractors: Wd

The adjusted NPS formula would then become:

$$\text{Adjusted NPS} = W_p \times P_p + W_{\text{passive}} \times P_{\text{passive}} - W_d \times P_d$$

Where:

  • Pp is the proportion of Promoters,

  • Ppassive is the proportion of Passives,

  • Pd is the proportion of Detractors.
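As a quick arithmetic check, the adjusted formula can be evaluated directly. The weights and proportions below are hypothetical, chosen only to illustrate the calculation (they are not taken from the survey dataset):

```python
# Hypothetical weights and proportions (illustrative only, not the survey data)
W_p, W_passive, W_d = 2.0, 1.0, 3.0      # magnitudes of the +2 / +1 / -3 weights
P_p, P_passive, P_d = 0.20, 0.30, 0.50   # group proportions, summing to 1

# Adjusted NPS = Wp*Pp + Wpassive*Ppassive - Wd*Pd, scaled by 100 like traditional NPS
adjusted_nps = (W_p * P_p + W_passive * P_passive - W_d * P_d) * 100
print(round(adjusted_nps, 2))  # -80.0
```

With half the respondents as Detractors weighted at 3, the score lands well below what the unweighted formula would produce for the same mix.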

The general formula for the standard error of a weighted sum of proportions is:

$$SE = \sqrt{\frac{W_p^2 \times P_p (1 - P_p)}{n} + \frac{W_{\text{passive}}^2 \times P_{\text{passive}} (1 - P_{\text{passive}})}{n} + \frac{W_d^2 \times P_d (1 - P_d)}{n}}$$

Where:

  • n is the total number of survey responses; the other parameters are as defined above.
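The standard error and confidence interval can be checked the same way. Here is a self-contained Python sketch with hypothetical weights, proportions, and sample size (illustrative values, not the survey data):

```python
import math

# Hypothetical weights and proportions (illustrative only, not the survey data)
W_p, W_passive, W_d = 2.0, 1.0, 3.0      # magnitudes of the +2 / +1 / -3 weights
P_p, P_passive, P_d = 0.20, 0.30, 0.50   # group proportions, summing to 1
n = 1000                                  # assumed number of survey responses

# Standard error of the weighted sum of proportions, per the formula above
se = math.sqrt(
    (W_p ** 2 * P_p * (1 - P_p)) / n
    + (W_passive ** 2 * P_passive * (1 - P_passive)) / n
    + (W_d ** 2 * P_d * (1 - P_d)) / n
)

# Point estimate and 95% confidence interval, scaled by 100
adjusted_nps = (W_p * P_p + W_passive * P_passive - W_d * P_d) * 100
lower = adjusted_nps - 1.96 * se * 100
upper = adjusted_nps + 1.96 * se * 100
```

Note that the heavier Detractor weight dominates the error term: squaring the weights means the W_d² component contributes most of the variance.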

Let us assume the scenario where Promoters get +2 points, Passives get +1 point, and Detractors get -3 points:

WITH nps_transformation AS (
    SELECT
        -- Assign weights based on NPS score
        CASE
            WHEN nps_score >= 9 THEN 2      -- Promoters (9-10 get +2 points)
            WHEN nps_score BETWEEN 7 AND 8 THEN 1  -- Passives (7-8 get +1 point)
            WHEN nps_score <= 6 THEN -3     -- Detractors (0-6 get -3 points)
        END AS transformed_nps_score
    FROM nps_survey_dataset
),
nps_statistics AS (
    SELECT
        -- Calculate the mean of the transformed NPS scores
        AVG(transformed_nps_score) AS mean_transformed_nps,
        -- Calculate the variance of the transformed NPS scores
        VARIANCE(transformed_nps_score) AS variance_transformed_nps,
        -- Count total responses
        COUNT(*) AS total_responses
    FROM nps_transformation
),
standard_error_calculation AS (
    SELECT
        mean_transformed_nps,
        -- Standard error = sqrt(variance / n)
        SQRT(variance_transformed_nps / total_responses) AS standard_error
    FROM nps_statistics
)
SELECT
    -- Transformed NPS mean, scaled by 100; note that with weights +2/+1/-3 the
    -- possible range is -300 to +200, wider than the traditional -100 to 100
    mean_transformed_nps * 100 AS transformed_nps,
    standard_error,
    -- Calculate the 95% confidence interval (Z = 1.96)
    (mean_transformed_nps * 100) - 1.96 * (standard_error * 100) AS lower_bound_ci,
    (mean_transformed_nps * 100) + 1.96 * (standard_error * 100) AS upper_bound_ci
FROM standard_error_calculation;
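The SQL pipeline above can be cross-checked in plain Python. This sketch applies the same +2/+1/-3 CASE logic to a small synthetic list of scores (not the actual nps_survey_dataset) and computes the mean, standard error, and 95% confidence interval the same way; VARIANCE in most SQL engines, including Spark SQL, is the sample variance with an n-1 denominator, which is what the sketch uses:

```python
import math

# Synthetic NPS responses on the 0-10 scale (illustrative, not the real dataset)
scores = [10, 9, 8, 7, 6, 5, 3, 2, 9, 4, 1, 8]

def transform(s):
    """Same weighting as the SQL CASE expression."""
    if s >= 9:
        return 2    # Promoter (9-10)
    if s >= 7:
        return 1    # Passive (7-8)
    return -3       # Detractor (0-6)

t = [transform(s) for s in scores]
n = len(t)
mean = sum(t) / n
# Sample variance (n-1 denominator), matching SQL VARIANCE
variance = sum((x - mean) ** 2 for x in t) / (n - 1)
se = math.sqrt(variance / n)

transformed_nps = mean * 100
ci = (transformed_nps - 1.96 * se * 100, transformed_nps + 1.96 * se * 100)
```

For this toy sample (3 Promoters, 3 Passives, 6 Detractors) the transformed NPS is -75, and the small n produces a correspondingly wide confidence interval.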

The results will be:

  • Transformed NPS: -136

    • This indicates that, after applying the custom weights (+2 for Promoters, +1 for Passives, and -3 for Detractors), the overall NPS score is significantly negative, reflecting a majority of Detractors compared to Promoters.

  • Standard Error: 0.0692

    • This represents the uncertainty or variability in the transformed NPS score. A relatively small standard error suggests that the data is not very spread out and the NPS score is stable within the dataset.

  • 95% Confidence Interval:

    • Lower Bound: -149.57

    • Upper Bound: -122.43

    This confidence interval shows that the true value of the transformed NPS is likely to fall between -149.57 and -122.43 with 95% confidence.

Correlation Analysis

We are going to do a pairwise computation of the correlation between the NPS scores (coded +1 for Promoters, 0 for Passives, and -1 for Detractors) and the various customer attributes we have:

Calculate the Pearson correlation between the transformed NPS score and each attribute using the formula:

$$\text{Correlation}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2} \times \sqrt{\sum (Y - \bar{Y})^2}}$$

This gives the pairwise correlation for each attribute against the transformed NPS score. The SQL code will be:

WITH nps_transformation AS (
    SELECT
        -- Assign numerical values to NPS score categories
        CASE
            WHEN nps_score >= 9 THEN 1      -- Promoters (9-10 get +1)
            WHEN nps_score BETWEEN 7 AND 8 THEN 0  -- Passives (7-8 get 0)
            WHEN nps_score <= 6 THEN -1     -- Detractors (0-6 get -1)
        END AS transformed_nps_score,
        purchase_frequency,
        avg_order_value,
        total_spent,
        customer_support_interactions,
        marketing_emails_clicked
    FROM nps_survey_dataset
),
correlation_calculation AS (
    SELECT
        -- Calculate correlation for each attribute using Pearson's formula
        (SUM((transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation)) * (purchase_frequency - (SELECT AVG(purchase_frequency) FROM nps_transformation))) / 
        (SQRT(SUM(POWER(transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation), 2))) * 
         SQRT(SUM(POWER(purchase_frequency - (SELECT AVG(purchase_frequency) FROM nps_transformation), 2))))) AS corr_purchase_frequency,
        
        (SUM((transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation)) * (avg_order_value - (SELECT AVG(avg_order_value) FROM nps_transformation))) / 
        (SQRT(SUM(POWER(transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation), 2))) * 
         SQRT(SUM(POWER(avg_order_value - (SELECT AVG(avg_order_value) FROM nps_transformation), 2))))) AS corr_avg_order_value,

        (SUM((transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation)) * (total_spent - (SELECT AVG(total_spent) FROM nps_transformation))) / 
        (SQRT(SUM(POWER(transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation), 2))) * 
         SQRT(SUM(POWER(total_spent - (SELECT AVG(total_spent) FROM nps_transformation), 2))))) AS corr_total_spent,

        (SUM((transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation)) * (customer_support_interactions - (SELECT AVG(customer_support_interactions) FROM nps_transformation))) / 
        (SQRT(SUM(POWER(transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation), 2))) * 
         SQRT(SUM(POWER(customer_support_interactions - (SELECT AVG(customer_support_interactions) FROM nps_transformation), 2))))) AS corr_customer_support_interactions,

        (SUM((transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation)) * (marketing_emails_clicked - (SELECT AVG(marketing_emails_clicked) FROM nps_transformation))) / 
        (SQRT(SUM(POWER(transformed_nps_score - (SELECT AVG(transformed_nps_score) FROM nps_transformation), 2))) * 
         SQRT(SUM(POWER(marketing_emails_clicked - (SELECT AVG(marketing_emails_clicked) FROM nps_transformation), 2))))) AS corr_marketing_emails_clicked

    FROM nps_transformation
)
SELECT
    corr_purchase_frequency,
    corr_avg_order_value,
    corr_total_spent,
    corr_customer_support_interactions,
    corr_marketing_emails_clicked
FROM correlation_calculation;
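The formula-heavy SQL above can be sanity-checked outside the warehouse with a direct Python implementation of Pearson's formula on synthetic data (not the survey dataset). As an aside, many engines, including Spark SQL, also ship a built-in corr() aggregate that computes the same quantity far more compactly:

```python
import math

def pearson(xs, ys):
    """Pearson correlation, following the formula above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (math.sqrt(sum((x - mx) ** 2 for x in xs))
           * math.sqrt(sum((y - my) ** 2 for y in ys)))
    return num / den

# Synthetic example: NPS category (+1/0/-1) vs. purchase frequency
nps = [1, 1, 0, -1, -1, 0, 1, -1]
purchase_frequency = [12, 9, 5, 2, 3, 6, 10, 1]
r = pearson(nps, purchase_frequency)  # strongly positive for this toy data
```

Python 3.10+ also provides statistics.correlation, which implements the same definition and can serve as an independent check.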

The results are quite depressing:

  • corr_purchase_frequency: 0.00248 shows a very weak positive correlation between the NPS score and the purchase frequency. This means that customers with higher purchase frequency are very slightly more likely to be Promoters, but the relationship is nearly negligible.

  • corr_avg_order_value: 0.02237 indicates a weak positive correlation between the average order value and the NPS score. Customers who spend more on average are marginally more likely to be Promoters, but again, the relationship is very weak.

  • corr_total_spent: 0.0105 shows that the correlation between total spending and the NPS score is also very weakly positive: customers who spend more overall are only slightly more likely to be Promoters.

  • corr_customer_support_interactions: 0.02748 shows a slightly stronger positive correlation between customer support interactions and the NPS score compared to other attributes, but it is still very weak. Customers who have more interactions with customer support are slightly more likely to give a higher NPS score.

  • corr_marketing_emails_clicked: -0.0183 indicates a weak negative correlation between the number of marketing emails clicked and the NPS score. Customers who clicked on more marketing emails are marginally less likely to be Promoters or are more likely to be Detractors. However, the relationship is still quite weak.

These weak correlations imply that none of the customer attributes included in the analysis are strong predictors of whether a customer will be a Promoter, Passive, or Detractor.

You may want to explore other customer attributes or use more sophisticated techniques (like feature importance in machine learning models) to better understand what drives NPS.

NPS Prediction

One objective is to determine whether we can use the available customer attributes to predict the NPS score for customers who did not respond to the survey. However, based on the weak correlations observed in the analysis, building a linear regression model (to generate a continuous NPS value) or a classification model (to categorize customers into NPS groups using linear methods) with the current set of attributes would likely result in a model with low predictive power. Here’s why:

The weak correlations suggest that these attributes do not explain much of the variance in the NPS score, indicating that the relationships between the variables and the NPS score are not strong enough for accurate prediction. To improve the model's performance, it may be necessary to create new features or transform existing ones that better capture the relationships between the data and the NPS. For example:

  • Combining total spent and average order value to create a new feature representing customer value.

  • Investigating interaction effects between variables, such as combining purchase frequency and customer support interactions to discover hidden patterns.
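A minimal Python sketch of the feature-engineering ideas above, using hypothetical records and hypothetical derived-feature names (customer_value, freq_support_interaction), not columns guaranteed to exist in the actual dataset:

```python
# Hypothetical per-customer records (illustrative values only)
customers = [
    {"total_spent": 1200.0, "avg_order_value": 60.0,
     "purchase_frequency": 20, "customer_support_interactions": 2},
    {"total_spent": 300.0, "avg_order_value": 30.0,
     "purchase_frequency": 10, "customer_support_interactions": 5},
]

for c in customers:
    # Composite "customer value": spend level weighted by typical basket size
    c["customer_value"] = c["total_spent"] * c["avg_order_value"] / 100.0
    # Interaction term: frequent buyers who also contact support often
    c["freq_support_interaction"] = (
        c["purchase_frequency"] * c["customer_support_interactions"]
    )
```

Derived features like these can expose non-linear relationships that the raw attributes, taken individually, failed to show in the correlation analysis.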

Furthermore, the weak correlations indicate that the relationships between the attributes and NPS might not be linear. Therefore, using non-linear models such as decision trees, random forests, or gradient boosting machines (GBM) could be more effective. These models are capable of capturing more complex interactions between variables, which could lead to better predictions.

Additionally, the current set of attributes may not provide sufficient information to predict NPS accurately. You may need to incorporate additional or more informative customer attributes that are better predictors of customer satisfaction and loyalty. Attributes such as customer satisfaction scores, net purchase frequency over time, or social media interactions might provide deeper insights into a customer's likelihood of being a Promoter, Passive, or Detractor.

In summary, improving the feature set, exploring non-linear models, and incorporating additional relevant attributes may significantly enhance the model's ability to predict NPS for customers who did not respond to the survey.
