STATSML 604: Data Exploration for Customer AI in Real-Time Customer Data Platform


Prerequisites

  • PREP 500: Ingesting CSV Data into Adobe Experience Platform
  • EXPLORE 200: Exploring Behavioral Data with Data Distiller - A Case Study with Adobe Analytics Data
  • EXPLORE 201: Exploring Web Analytics Data with Data Distiller
  • ENRICH 300: Recency, Frequency, Monetary (RFM) Modeling for Personalization with Data Distiller

Hello World Query

Before we proceed, this is how the demo_data_intelligent_services_demo_midvalues dataset looks from an ingestion point of view:

Note that this dataset has not been enabled for Real-Time Customer Profile, meaning it is not being ingested into the profile.

This is how the schema looks:

Run the following on the Experience Event dataset, ensuring it adheres to either the Adobe Analytics Schema or the Consumer Event Schema. Keep in mind that Customer AI automatically generates features using standard field groups.

SELECT * FROM demo_data_intelligent_services_demo_midvalues;

Your result should look something like this:

To get the JSON structure of the output, use:

SELECT to_json(STRUCT(*)) AS json_output
FROM demo_data_intelligent_services_demo_midvalues
LIMIT 1;

Accessing the Standard Field Groups Used in Customer AI

Customer AI uses standard field groups to automatically generate features such as recency, frequency, and engagement metrics without manual intervention. In addition to these standard field groups, custom events can be incorporated for advanced customization, allowing for more tailored insights. While it is not necessary for the data to include all field groups, having relevant ones significantly enhances model performance.

| Standard Field Group | Event Types | Common Fields | Description |
| --- | --- | --- | --- |
| Commerce | Purchases, Product Views, Checkouts | productListItems.SKU, commerce.order.purchaseID, commerce.purchases.value | Captures transaction-related data for commerce activities. |
| Web | Web Visits, Page Views, Link Clicks | web.webPageDetails.name, web.webInteraction.linkClicks.value | Tracks website interactions and user behaviors online. |
| Application | App Installs, Launches, Feature Usage | application.name, application.installs.value, application.featureUsages.value | Focuses on mobile or desktop application interactions. |
| Search | Search Queries | search.keywords | Logs search behavior and keywords used by customers. |
| Profile Attributes | Customer Demographics, Preferences | person.name, person.gender, person.birthDate | Provides demographic and user profile information. |
| Device | Device Details | device.type, device.operatingSystem.name | Identifies devices used by the customer during interactions. |
| IdentityMap | Identity Resolution | identityMap.ECID.id, identityMap.AAID.id | Links different identifiers for a unified customer view. |
| Experience Event Metadata | Metadata Tracking | timestamp, channel, environment | Provides contextual metadata about customer events. |

You can access the standard fields by executing the following:

SELECT to_json(web) AS web_json
FROM demo_data_intelligent_services_demo_midvalues
LIMIT 1;
SELECT to_json(productListItems) AS productListItems_json
FROM demo_data_intelligent_services_demo_midvalues
LIMIT 1;
SELECT to_json(commerce) AS commerce_json
FROM demo_data_intelligent_services_demo_midvalues
LIMIT 1;
SELECT to_json(application) AS application_json
FROM demo_data_intelligent_services_demo_midvalues
LIMIT 1;
SELECT to_json(search) AS search_json
FROM demo_data_intelligent_services_demo_midvalues
LIMIT 1;

Flattening Standard Fields

First, try running a template query that includes all of the fields:

SELECT
    -- Web Interaction Details
    web.webPageDetails.name AS page_name,
    web.webInteraction.linkClicks.value AS link_clicks,

    -- Commerce Details
    commerce.purchases.value AS purchase_value,
    commerce.order.purchaseID AS purchase_id,
    commerce.checkouts.value AS checkout_value,
    commerce.productListViews.value AS product_list_views,
    commerce.productListOpens.value AS product_list_opens,
    commerce.productListRemovals.value AS product_list_removals,
    commerce.productViews.value AS product_views,
    productListItems.SKU AS product_sku,

    -- Application Details
    application.name AS application_name,
    application.applicationCloses.value AS app_closes,
    application.crashes.value AS app_crashes,
    application.featureUsages.value AS feature_usages,
    application.firstLaunches.value AS first_launches,

    -- Search Information
    search.keywords AS search_keywords,

    -- Event Metadata
    meta.intendedToExtend AS intended_to_extend,

    -- Time Period
    startDate,
    endDate

FROM
    demo_data_intelligent_services_demo_midvalues;

In my case, I get an error. The error message is:

ErrorCode: 42601 ... no viable alternative at input 'commerce.order'

The key part to notice is no viable alternative at input 'commerce.order', which indicates that the commerce.order field does not exist in this dataset.

Another error message you may encounter, which points to the same issue, is:

ErrorCode: 08P01 queryId: 83370942-ffd7-4aa3-9f54-22b1edd06c56 Unknown error encountered. Reason: [[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `meta`.`intendedtoextend` cannot be resolved. Did you mean one of the following? [`demo_data_intelligent_services_demo_midvalues`.`_id`, `demo_data_intelligent_services_demo_midvalues`.`web`, `demo_data_intelligent_services_demo_midvalues`.`media`, `demo_data_intelligent_services_demo_midvalues`.`device`, `demo_data_intelligent_services_demo_midvalues`.`search`].; line 12 pos 2; 'GlobalLimit 50000 +- 'LocalLimit 50000 +- 'Project [web#503298.webpagedetails

This pattern will repeat for each missing column if you keep removing or commenting them out manually.

Let us execute the query after commenting out the missing columns:

SELECT
    -- Web Interaction Details
    web.webPageDetails.name AS page_name,
    web.webInteraction.linkClicks.value AS link_clicks,

    -- Commerce Details
    commerce.purchases.value AS purchase_value,
    -- commerce.order.purchaseID AS purchase_id,  -- COMMENTED OUT (missing column)
    commerce.checkouts.value AS checkout_value,
    commerce.productListViews.value AS product_list_views,
    commerce.productListOpens.value AS product_list_opens,
    commerce.productListRemovals.value AS product_list_removals,
    commerce.productViews.value AS product_views,
    productListItems.SKU AS product_sku,

    -- Application Details
    -- application.name AS application_name,
    -- application.applicationCloses.value AS app_closes,
    -- application.crashes.value AS app_crashes,
    -- application.featureUsages.value AS feature_usages,
    -- application.firstLaunches.value AS first_launches,

    -- Search Information
    search.keywords AS search_keywords

    -- Event Metadata
    -- meta.intendedToExtend AS intended_to_extend,

    -- Time Period
    -- startDate,
    -- endDate

FROM demo_data_intelligent_services_demo_midvalues;

Observe that the comma after search_keywords was removed because it is now the last column in the SELECT list.

Data Quality Score

The Data Quality Score (DQS) is a composite metric designed to measure how reliable, complete, and consistent data is within a dataset. The goal is to quantify data quality so that issues can be identified and improvements can be tracked over time.

We evaluated data quality based on three core dimensions:

Completeness

  • The proportion of non-null (non-missing) values in the dataset. Missing data can skew analyses, leading to biased insights.

  • Formula:

$$\text{Completeness (\%)} = \left(1 - \frac{\text{Null Values}}{\text{Total Records}}\right) \times 100$$

Uniqueness

  • The proportion of distinct (unique) values relative to the total number of records. Ensures data is free from duplicates, which can distort aggregations or counts.

  • Formula:

$$\text{Uniqueness (\%)} = \frac{\text{Distinct Values}}{\text{Total Records}} \times 100$$

Validity

  • Measures if the data conforms to expected formats, ranges, or patterns. Invalid data (e.g., negative prices, malformed dates) can break business rules.

  • Formula:

$$\text{Validity (\%)} = \frac{\text{Valid Records}}{\text{Total Records}} \times 100$$

We average the data quality metrics to provide a balanced view, ensuring that no single metric dominates the overall score unless explicitly weighted. This approach maintains fairness across different dimensions of data quality. However, flexible weighting can be applied when necessary. In certain contexts, such as financial data, specific dimensions like validity might carry more weight due to their critical importance in ensuring data accuracy and compliance.
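Concretely, the per-field score is the simple average of whichever dimensions apply to that field; a weighted variant (the weights shown below are purely illustrative) generalizes this:

$$\text{DQS} = \frac{\text{Completeness} + \text{Uniqueness} + \text{Validity}}{3}$$

$$\text{DQS}_{w} = w_{C}\,\text{Completeness} + w_{U}\,\text{Uniqueness} + w_{V}\,\text{Validity}, \qquad w_{C} + w_{U} + w_{V} = 1$$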

Here is the query that you should execute:

WITH data_quality AS (
    SELECT
        -- Web Interaction Details (Completeness & Uniqueness)
        (1 - (COUNT(CASE WHEN web.webPageDetails.name IS NULL THEN 1 END) / COUNT(*))) * 100 AS page_name_completeness,
        (COUNT(DISTINCT web.webPageDetails.name) / COUNT(*)) * 100 AS page_name_uniqueness,

        (1 - (COUNT(CASE WHEN web.webInteraction.linkClicks.value IS NULL THEN 1 END) / COUNT(*))) * 100 AS link_clicks_completeness,
        (COUNT(DISTINCT web.webInteraction.linkClicks.value) / COUNT(*)) * 100 AS link_clicks_uniqueness,

        -- Commerce Details (Completeness, Uniqueness, Validity)
        (1 - (COUNT(CASE WHEN commerce.purchases.value IS NULL THEN 1 END) / COUNT(*))) * 100 AS purchase_value_completeness,
        (COUNT(DISTINCT commerce.purchases.value) / COUNT(*)) * 100 AS purchase_value_uniqueness,
        (COUNT(CASE WHEN commerce.purchases.value >= 0 THEN 1 END) / COUNT(*)) * 100 AS purchase_value_validity,

        -- Commented Section for commerce.order
        -- (1 - (COUNT(CASE WHEN commerce.order.purchaseID IS NULL THEN 1 END) / COUNT(*))) * 100 AS purchase_id_completeness,
        -- (COUNT(DISTINCT commerce.order.purchaseID) / COUNT(*)) * 100 AS purchase_id_uniqueness,

        (1 - (COUNT(CASE WHEN commerce.checkouts.value IS NULL THEN 1 END) / COUNT(*))) * 100 AS checkout_value_completeness,
        (COUNT(DISTINCT commerce.checkouts.value) / COUNT(*)) * 100 AS checkout_value_uniqueness,

        (1 - (COUNT(CASE WHEN commerce.productListViews.value IS NULL THEN 1 END) / COUNT(*))) * 100 AS product_list_views_completeness,
        (COUNT(DISTINCT commerce.productListViews.value) / COUNT(*)) * 100 AS product_list_views_uniqueness,

        (1 - (COUNT(CASE WHEN commerce.productListOpens.value IS NULL THEN 1 END) / COUNT(*))) * 100 AS product_list_opens_completeness,
        (COUNT(DISTINCT commerce.productListOpens.value) / COUNT(*)) * 100 AS product_list_opens_uniqueness,

        (1 - (COUNT(CASE WHEN commerce.productListRemovals.value IS NULL THEN 1 END) / COUNT(*))) * 100 AS product_list_removals_completeness,
        (COUNT(DISTINCT commerce.productListRemovals.value) / COUNT(*)) * 100 AS product_list_removals_uniqueness,

        (1 - (COUNT(CASE WHEN commerce.productViews.value IS NULL THEN 1 END) / COUNT(*))) * 100 AS product_views_completeness,
        (COUNT(DISTINCT commerce.productViews.value) / COUNT(*)) * 100 AS product_views_uniqueness,

        (1 - (COUNT(CASE WHEN productListItems.SKU IS NULL THEN 1 END) / COUNT(*))) * 100 AS product_sku_completeness,
        (COUNT(DISTINCT productListItems.SKU) / COUNT(*)) * 100 AS product_sku_uniqueness,
        (COUNT(CASE WHEN SIZE(productListItems.SKU) > 0 THEN 1 END) / COUNT(*)) * 100 AS product_sku_validity,

        -- Search Information
        (1 - (COUNT(CASE WHEN search.keywords IS NULL THEN 1 END) / COUNT(*))) * 100 AS search_keywords_completeness,
        (COUNT(DISTINCT search.keywords) / COUNT(*)) * 100 AS search_keywords_uniqueness
    FROM demo_data_intelligent_services_demo_midvalues
)

SELECT 'page_name' AS column_name, (page_name_completeness + page_name_uniqueness) / 2 AS data_quality_score FROM data_quality
UNION ALL
SELECT 'link_clicks', (link_clicks_completeness + link_clicks_uniqueness) / 2 FROM data_quality
UNION ALL
SELECT 'purchase_value', (purchase_value_completeness + purchase_value_uniqueness + purchase_value_validity) / 3 FROM data_quality
UNION ALL
SELECT 'checkout_value', (checkout_value_completeness + checkout_value_uniqueness) / 2 FROM data_quality
UNION ALL
SELECT 'product_list_views', (product_list_views_completeness + product_list_views_uniqueness) / 2 FROM data_quality
UNION ALL
SELECT 'product_list_opens', (product_list_opens_completeness + product_list_opens_uniqueness) / 2 FROM data_quality
UNION ALL
SELECT 'product_list_removals', (product_list_removals_completeness + product_list_removals_uniqueness) / 2 FROM data_quality
UNION ALL
SELECT 'product_views', (product_views_completeness + product_views_uniqueness) / 2 FROM data_quality
UNION ALL
SELECT 'product_sku', (product_sku_completeness + product_sku_uniqueness + product_sku_validity) / 3 FROM data_quality
UNION ALL
SELECT 'search_keywords', (search_keywords_completeness + search_keywords_uniqueness) / 2 FROM data_quality;

The results are:
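If, as noted earlier, validity should carry more weight for financially sensitive fields, the same building blocks can be reweighted. Here is a minimal sketch for purchase_value alone, using illustrative weights of 25/25/50 for completeness, uniqueness, and validity (the weights are assumptions, not recommendations):

WITH purchase_quality AS (
    SELECT
        -- Same completeness, uniqueness, and validity expressions as in the query above
        (1 - (COUNT(CASE WHEN commerce.purchases.value IS NULL THEN 1 END) / COUNT(*))) * 100 AS completeness,
        (COUNT(DISTINCT commerce.purchases.value) / COUNT(*)) * 100 AS uniqueness,
        (COUNT(CASE WHEN commerce.purchases.value >= 0 THEN 1 END) / COUNT(*)) * 100 AS validity
    FROM demo_data_intelligent_services_demo_midvalues
)
SELECT
    'purchase_value_weighted' AS column_name,
    -- Validity dominates at 50%; completeness and uniqueness share the remainder
    0.25 * completeness + 0.25 * uniqueness + 0.50 * validity AS data_quality_score
FROM purchase_quality;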

Recency, Frequency & Monetary Modeling

RFM modeling is a powerful customer segmentation technique that helps businesses understand and predict customer behavior based on three key metrics:

  • Recency (R): How recently a customer performed an action (e.g., last purchase, last visit).

  • Frequency (F): How often the customer performs the action within a specific timeframe.

  • Monetary (M): How much the customer has spent over a period of time.

In traditional marketing and customer analytics, these metrics help identify high-value customers, predict churn, and personalize marketing strategies.
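As a quick illustration of Recency and Frequency on this dataset, here is a minimal sketch (illustrative only; ENRICH 300 covers RFM modeling in depth, and the ECID extraction mirrors the identityMap pattern used later in this tutorial):

SELECT
    identityMap.ECID.id AS ecid,
    -- Recency: days elapsed since the most recent event
    DATEDIFF(CURRENT_DATE, MAX(TO_DATE(timestamp))) AS recency_days,
    -- Frequency: number of events observed for this ECID
    COUNT(*) AS frequency
FROM demo_data_intelligent_services_demo_midvalues
GROUP BY identityMap.ECID.id;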

Survival Analysis Principles and Propensity Modeling

In Customer AI, we're tasked with predicting the propensity of an event occurring within the next N days, such as a customer making a purchase or engaging with a product. At first glance, this might seem like a straightforward classification problem—did the customer convert or not? However, the underlying mechanics of how we compute this propensity are deeply influenced by survival analysis principles, even if we're not explicitly running survival models.

Survival analysis is fundamentally about estimating the probability that an event has not occurred yet by a certain time, represented by the survival function S(t). In the context of Customer AI, when we calculate the propensity to convert in the next N days, we're essentially working with 1−S(N)—the probability that the customer will convert within that time frame. This is where the illusion comes into play: while we might not explicitly model S(t), the features we engineer, such as Recency (R) and Frequency (F), are designed to behave as proxies that capture the dynamics of time-to-event data, just like survival analysis would.

Recency (R) acts as an implicit measure of the time since the last event, closely tied to the hazard function h(t) in survival analysis, which represents the instantaneous risk of an event occurring at time t. The more recent the engagement, the higher the implied hazard or conversion risk. Similarly, Frequency (F) reflects the accumulated risk over time, akin to the cumulative hazard function H(t). Customers with frequent engagements are treated as having a higher cumulative risk of conversion because their repeated actions signal strong intent.

By feeding R and F into machine learning models like XGBoost, we are essentially embedding these survival-based risk factors into the model’s decision-making process. The model learns to associate recent, frequent behaviors with higher propensities to convert, mimicking the effects of survival functions without explicitly modeling them. This approach allows us to handle large-scale behavioral data efficiently while still leveraging the time-sensitive nature of customer actions, which is the core strength of survival analysis. In essence, we’re creating an illusion of survival modeling—using its principles to shape our features and predictions, even though we’re technically solving a classification problem.
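For reference, the standard survival-analysis quantities mentioned above are related as follows (textbook identities, not Customer AI internals):

$$S(t) = \Pr(T > t), \qquad h(t) = -\frac{d}{dt}\ln S(t), \qquad H(t) = \int_0^t h(u)\,du = -\ln S(t)$$

$$\text{Propensity to convert within } N \text{ days} = 1 - S(N) = 1 - e^{-H(N)}$$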

A Note on Monetary Value (M)

While Monetary (M) is a critical component of traditional RFM (Recency, Frequency, Monetary) modeling, it is not used natively in Customer AI. This is because Customer AI is designed to predict future customer behavior, such as conversions or churn, with a strong emphasis on engagement patterns rather than historical spending. Behavioral signals like Recency (R) and Frequency (F) are more dynamic and time-sensitive, making them better aligned with predictive models that rely on survival analysis principles. Additionally, monetary data often suffers from inconsistency across platforms, especially when customers engage through multiple channels, making it less reliable for direct inclusion in propensity models.

However, if businesses wish to incorporate Monetary (M) into Customer AI for advanced segmentation, it can be added as a Profile Attribute. This approach is particularly useful for use cases like lifetime value (LTV) prediction or revenue-based customer segmentation, where understanding the financial impact of customer behavior is critical. By complementing the existing propensity models with monetary data, organizations can gain deeper insights into not just who is likely to convert, but also which customers are likely to bring the most value. This dual-layer analysis helps in optimizing marketing strategies, resource allocation, and personalized customer engagement.

Attribute Assessment for RFM

The table below shows which attributes are suitable for each RFM dimension, based on their Data Quality Scores:

| Field | Data Quality Score (%) | Recency Suitability | Frequency Suitability | Monetary Suitability |
| --- | --- | --- | --- | --- |
| page_name | 39.25 | ✅ Strong (Tracks last page viewed) | ✅ Strong (Counts page visits) | ❌ Not Applicable |
| product_sku | 38.29 | ✅ Strong (Last product interaction) | ✅ Strong (Product interaction counts) | ❌ Not Applicable |
| product_views | 3.22 | ⚠️ Moderate (Recent view may be missing) | ⚠️ Moderate (Some views might be missed) | ❌ Not Applicable |
| link_clicks | 3.12 | ⚠️ Moderate (Recent clicks tracked inconsistently) | ⚠️ Moderate (Click counts may be incomplete) | ❌ Not Applicable |
| product_list_views | 1.94 | ❌ Weak (Incomplete last view tracking) | ❌ Weak (Inconsistent counts) | ❌ Not Applicable |
| product_list_opens | 1.82 | ❌ Weak (Missing last open data) | ❌ Weak (Sparse event tracking) | ❌ Not Applicable |
| checkout_value | 1.22 | ❌ Poor (Sparse events, weak recency) | ❌ Poor (Low frequency, missing events) | ✅ Applicable (Tracks transaction amounts) |
| search_keywords | 1.09 | ❌ Weak (Limited search tracking) | ❌ Weak (Few search event records) | ❌ Not Applicable |
| purchase_value | 0.54 | ❌ Very Poor (Critical purchase data missing) | ❌ Very Poor (Few transactions captured) | ✅ Applicable (Key for monetary analysis) |
| product_list_removals | 0.11 | ❌ Extremely Poor (Unreliable recency data) | ❌ Extremely Poor (Event counts unreliable) | ❌ Not Applicable |

Monetary Value as a Profile Attribute

To calculate the Monetary (M) value and add it to the Real-Time Customer Profile, we do the following.

Based on the JSON structure, the relevant fields are:

  • commerce.productListViews.value

  • commerce.productListRemovals.value

  • commerce.order.purchaseID

  • productListItems.SKU

For Monetary (M), we will consider the commerce.order section, focusing on:

  • purchaseID (to identify transactions)

  • productListItems.SKU (to track purchased items)

  • commerce.purchases.value (if available) or aggregate values from transactions.

-- Step 1: Extract relevant transaction data
CREATE OR REPLACE VIEW order_data AS
SELECT 
    identityMap.ECID.id AS ecid,
    commerce.order.purchaseID AS purchase_id,
    productListItems.SKU AS sku,
    commerce.purchases.value AS purchase_value,
    TO_DATE(_acp_system_metadata.timestamp) AS purchase_date
FROM demo_data_intelligent_services_demo_midvalues
WHERE commerce.order.purchaseID IS NOT NULL;

-- Step 2: Aggregate the total monetary value per user
CREATE OR REPLACE VIEW monetary_aggregation AS
SELECT 
    ecid,
    SUM(CASE 
            WHEN purchase_value IS NOT NULL THEN purchase_value 
            ELSE 0 
        END) AS total_monetary_value
FROM order_data
GROUP BY ecid;

-- Step 3: Create the profile table to store monetary value
CREATE TABLE IF NOT EXISTS adls_profile_monetary (
    ecid TEXT PRIMARY IDENTITY NAMESPACE 'ECID',
    total_monetary_value DECIMAL(18, 2)
)
WITH (LABEL = 'PROFILE');

-- Step 4: Insert aggregated data into the profile table
INSERT INTO adls_profile_monetary
SELECT 
    STRUCT(
        ecid,
        total_monetary_value
    ) AS profile_data
FROM monetary_aggregation; 

Example Workflow

Download the commerce_data.csv file (89 KB) and ingest it into Adobe Experience Platform by following the CSV ingestion steps (see PREP 500: Ingesting CSV Data into Adobe Experience Platform):

-- Step 1: Extract relevant transaction data
CREATE OR REPLACE VIEW order_data AS
SELECT 
    ECID AS ecid,
    purchaseID AS purchase_id,
    SKU AS sku,
    purchase_value AS purchase_value,
    TO_DATE(timestamp) AS purchase_date
FROM commerce_data
WHERE purchaseID IS NOT NULL;

-- Step 2: Aggregate the total monetary value per user
CREATE OR REPLACE VIEW monetary_aggregation AS
SELECT 
    ecid,
    SUM(CASE 
            WHEN purchase_value IS NOT NULL THEN purchase_value 
            ELSE 0 
        END) AS total_monetary_value
FROM order_data
GROUP BY ecid;

-- Step 3: Create the profile table to store monetary value
CREATE TABLE IF NOT EXISTS adls_profile_monetary (
    ecid TEXT PRIMARY IDENTITY NAMESPACE 'ECID',
    total_monetary_value DECIMAL(18, 2)
)
WITH (LABEL = 'PROFILE');

-- Step 4: Insert aggregated data into the profile table
INSERT INTO adls_profile_monetary
SELECT 
    STRUCT(
        ecid,
        total_monetary_value
    ) AS profile_data
FROM monetary_aggregation;

In Step 3, the SQL code creates a table named adls_profile_monetary to store the aggregated monetary values for each customer. The ecid (Experience Cloud ID) serves as the primary identifier, ensuring each customer's data remains unique within the 'ECID' namespace. This is critical in Adobe Experience Platform, where identity resolution and profile unification rely on consistent identifiers. The total_monetary_value column captures the cumulative spending of each customer, formatted as a decimal to handle currency values accurately. The WITH (LABEL = 'PROFILE') clause designates the table as part of the Real-Time Customer Profile, enabling seamless integration with audience segmentation, personalization, and activation workflows.

In Step 4, the aggregated data from the monetary_aggregation view is inserted into the newly created profile table. The STRUCT function packages the ecid and its corresponding total_monetary_value into a structured format compatible with profile-based systems. This approach ensures that monetary values are not just stored but are readily available for real-time analytics and targeting. By centralizing this data at the profile level, marketers can effortlessly identify high-value customers, create personalized offers, and drive data-driven marketing strategies based on customers' historical spending behavior.
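To sanity-check the load, you can query the newly created profile-enabled dataset directly, for example:

SELECT *
FROM adls_profile_monetary
LIMIT 10;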

Customer AI Setup
