Adobe Data Distiller Guide
  • Adobe Data Distiller Guide
  • What is Data Distiller?
  • UNIT 1: GETTING STARTED
    • PREP 100: Why was Data Distiller Built?
    • PREP 200: Data Distiller Use Case & Capability Matrix Guide
    • PREP 300: Adobe Experience Platform & Data Distiller Primers
    • PREP 301: Leveraging Data Loops for Real-Time Personalization
    • PREP 302: Key Topics Overview: Architecture, MDM, Personas
    • PREP 303: What is Data Distiller Business Intelligence?
    • PREP 304: The Human Element in Customer Experience Management
    • PREP 305: Driving Transformation in Customer Experience: Leadership Lessons Inspired by Lee Iacocca
    • PREP 400: DBVisualizer SQL Editor Setup for Data Distiller
  • PREP 500: Ingesting CSV Data into Adobe Experience Platform
  • PREP 501: Ingesting JSON Test Data into Adobe Experience Platform
  • PREP 600: Rules vs. AI with Data Distiller: When to Apply, When to Rely, Let ROI Decide
  • Prep 601: Breaking Down B2B Data Silos: Transform Marketing, Sales & Customer Success into a Revenue
  • Unit 2: DATA DISTILLER DATA EXPLORATION
    • EXPLORE 100: Data Lake Overview
    • EXPLORE 101: Exploring Ingested Batches in a Dataset with Data Distiller
    • EXPLORE 200: Exploring Behavioral Data with Data Distiller - A Case Study with Adobe Analytics Data
    • EXPLORE 201: Exploring Web Analytics Data with Data Distiller
    • EXPLORE 202: Exploring Product Analytics with Data Distiller
    • EXPLORE 300: Exploring Adobe Journey Optimizer System Datasets with Data Distiller
    • EXPLORE 400: Exploring Offer Decisioning Datasets with Data Distiller
    • EXPLORE 500: Incremental Data Extraction with Data Distiller Cursors
  • UNIT 3: DATA DISTILLER ETL (EXTRACT, TRANSFORM, LOAD)
    • ETL 200: Chaining of Data Distiller Jobs
    • ETL 300: Incremental Processing Using Checkpoint Tables in Data Distiller
    • [DRAFT]ETL 400: Attribute-Level Change Detection in Profile Snapshot Data
  • UNIT 4: DATA DISTILLER DATA ENRICHMENT
    • ENRICH 100: Real-Time Customer Profile Overview
    • ENRICH 101: Behavior-Based Personalization with Data Distiller: A Movie Genre Case Study
    • ENRICH 200: Decile-Based Audiences with Data Distiller
    • ENRICH 300: Recency, Frequency, Monetary (RFM) Modeling for Personalization with Data Distiller
    • ENRICH 400: Net Promoter Scores (NPS) for Enhanced Customer Satisfaction with Data Distiller
  • Unit 5: DATA DISTILLER IDENTITY RESOLUTION
    • IDR 100: Identity Graph Overview
    • IDR 200: Extracting Identity Graph from Profile Attribute Snapshot Data with Data Distiller
    • IDR 300: Understanding and Mitigating Profile Collapse in Identity Resolution with Data Distiller
    • IDR 301: Using Levenshtein Distance for Fuzzy Matching in Identity Resolution with Data Distiller
    • IDR 302: Algorithmic Approaches to B2B Contacts - Unifying and Standardizing Across Sales Orgs
  • Unit 6: DATA DISTILLER AUDIENCES
    • DDA 100: Audiences Overview
    • DDA 200: Build Data Distiller Audiences on Data Lake Using SQL
    • DDA 300: Audience Overlaps with Data Distiller
  • Unit 7: DATA DISTILLER BUSINESS INTELLIGENCE
    • BI 100: Data Distiller Business Intelligence: A Complete Feature Overview
    • BI 200: Create Your First Data Model in the Data Distiller Warehouse for Dashboarding
    • BI 300: Dashboard Authoring with Data Distiller Query Pro Mode
    • BI 400: Subscription Analytics for Growth-Focused Products using Data Distiller
    • BI 500: Optimizing Omnichannel Marketing Spend Using Marginal Return Analysis
  • Unit 8: DATA DISTILLER STATISTICS & MACHINE LEARNING
    • STATSML 100: Python & JupyterLab Setup for Data Distiller
    • STATSML 101: Learn Basic Python Online
    • STATSML 200: Unlock Dataset Metadata Insights via Adobe Experience Platform APIs and Python
    • STATSML 201: Securing Data Distiller Access with Robust IP Whitelisting
    • STATSML 300: AI & Machine Learning: Basic Concepts for Data Distiller Users
    • STATSML 301: A Concept Course on Language Models
    • STATSML 302: A Concept Course on Feature Engineering Techniques for Machine Learning
    • STATSML 400: Data Distiller Basic Statistics Functions
    • STATSML 500: Generative SQL with Microsoft GitHub Copilot, Visual Studio Code and Data Distiller
    • STATSML 600: Data Distiller Advanced Statistics & Machine Learning Models
    • STATSML 601: Building a Period-to-Period Customer Retention Model Using Logistics Regression
    • STATSML 602: Techniques for Bot Detection in Data Distiller
    • STATSML 603: Predicting Customer Conversion Scores Using Random Forest in Data Distiller
    • STATSML 604: Car Loan Propensity Prediction using Logistic Regression
    • STATSML 700: Sentiment-Aware Product Review Search with Retrieval Augmented Generation (RAG)
    • STATSML 800: Turbocharging Insights with Data Distiller: A Hypercube Approach to Big Data Analytics
  • UNIT 9: DATA DISTILLER ACTIVATION & DATA EXPORT
    • ACT 100: Dataset Activation with Data Distiller
    • ACT 200: Dataset Activation: Anonymization, Masking & Differential Privacy Techniques
    • ACT 300: Functions and Techniques for Handling Sensitive Data with Data Distiller
    • ACT 400: AES Data Encryption & Decryption with Data Distiller
  • UNIT 9: DATA DISTILLER FUNCTIONS & EXTENSIONS
    • FUNC 300: Privacy Functions in Data Distiller
    • FUNC 400: Statistics Functions in Data Distiller
    • FUNC 500: Lambda Functions in Data Distiller: Exploring Similarity Joins
    • FUNC 600: Advanced Statistics & Machine Learning Functions
  • About the Authors
Powered by GitBook
On this page
  1. UNIT 3: DATA DISTILLER ETL (EXTRACT, TRANSFORM, LOAD)

[DRAFT]ETL 400: Attribute-Level Change Detection in Profile Snapshot Data

Step-by-Step Tracking of Changes:

  1. Daily Snapshot of Data: Every day, you take a snapshot of customer attributes, segment memberships, and identity maps to track changes. You can compute the CRC32 checksum for each table to detect changes.

  2. Tracking Changes in Customer Attributes:

    You can create a query to compare the current day’s data with the previous day’s data and identify any changes in customer attributes:

    SELECT 
      a.customer_id,
      CASE WHEN CRC32(CONCAT(a.first_name, a.last_name, a.email, a.phone_number, a.age, a.location, a.purchase_history, a.loyalty_status)) 
           != CRC32(CONCAT(b.first_name, b.last_name, b.email, b.phone_number, b.age, b.location, b.purchase_history, b.loyalty_status)) 
           THEN 'Changed' ELSE 'No Change' END AS attribute_change_status
    FROM customer_attributes a
    LEFT JOIN customer_attributes b
    ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
    WHERE a.date_recorded = CURRENT_DATE();

    This query compares the CRC32 checksum of the concatenated customer attributes for each customer and checks for differences between today’s and yesterday’s data.

  3. Tracking Changes in Segment Membership:

    Similarly, you can track changes in segment membership by comparing the Boolean flags of different segments:

    SELECT 
      a.customer_id,
      CASE WHEN CRC32(CONCAT(a.high_spenders, a.frequent_buyers, a.location_based_promo)) 
           != CRC32(CONCAT(b.high_spenders, b.frequent_buyers, b.location_based_promo)) 
           THEN 'Segment Changed' ELSE 'No Change' END AS segment_change_status
    FROM customer_segments a
    LEFT JOIN customer_segments b
    ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
    WHERE a.date_recorded = CURRENT_DATE();

    This query tracks changes in segment membership, comparing today’s segment flags to the previous day.

  4. Tracking Changes in Identity Map:

    Finally, to track changes in the identity map (email, phone, and loyalty ID):

    SELECT 
      a.customer_id,
      CASE WHEN CRC32(CONCAT(a.email_id, a.phone_id, a.loyalty_id)) 
           != CRC32(CONCAT(b.email_id, b.phone_id, b.loyalty_id)) 
           THEN 'Identity Changed' ELSE 'No Change' END AS identity_change_status
    FROM identity_map a
    LEFT JOIN identity_map b
    ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
    WHERE a.date_recorded = CURRENT_DATE();

    This query compares the identity information between today and the previous day, tracking any updates in the identity map.

Summary Table of Changes:

You can combine these queries into a single view or summary table that shows the daily changes per customer, across attributes, segments, and identity mappings:

SELECT 
  a.customer_id,
  attr_changes.attribute_change_status,
  seg_changes.segment_change_status,
  id_changes.identity_change_status
FROM customer_attributes a
JOIN (
  SELECT 
    customer_id,
    CASE WHEN CRC32(CONCAT(a.first_name, a.last_name, a.email, a.phone_number, a.age, a.location, a.purchase_history, a.loyalty_status)) 
         != CRC32(CONCAT(b.first_name, b.last_name, b.email, b.phone_number, b.age, b.location, b.purchase_history, b.loyalty_status)) 
         THEN 'Changed' ELSE 'No Change' END AS attribute_change_status
  FROM customer_attributes a
  LEFT JOIN customer_attributes b
  ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
  WHERE a.date_recorded = CURRENT_DATE()
) attr_changes ON a.customer_id = attr_changes.customer_id
JOIN (
  SELECT 
    customer_id,
    CASE WHEN CRC32(CONCAT(a.high_spenders, a.frequent_buyers, a.location_based_promo)) 
         != CRC32(CONCAT(b.high_spenders, b.frequent_buyers, b.location_based_promo)) 
         THEN 'Segment Changed' ELSE 'No Change' END AS segment_change_status
  FROM customer_segments a
  LEFT JOIN customer_segments b
  ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
  WHERE a.date_recorded = CURRENT_DATE()
) seg_changes ON a.customer_id = seg_changes.customer_id
JOIN (
  SELECT 
    customer_id,
    CASE WHEN CRC32(CONCAT(a.email_id, a.phone_id, a.loyalty_id)) 
         != CRC32(CONCAT(b.email_id, b.phone_id, b.loyalty_id)) 
         THEN 'Identity Changed' ELSE 'No Change' END AS identity_change_status
  FROM identity_map a
  LEFT JOIN identity_map b
  ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
  WHERE a.date_recorded = CURRENT_DATE()
) id_changes ON a.customer_id = id_changes.customer_id;

Outcome:

This combined query tracks changes in customer attributes, segment memberships, and identity map on a daily basis. You can use this for reporting, auditing customer data, or triggering specific actions (like personalized marketing updates) based on the detected changes.

Last updated 7 months ago

Page cover image