Page cover image

[DRAFT]ETL 400: Attribute-Level Change Detection in Profile Snapshot Data

Step-by-Step Tracking of Changes:

  1. Daily Snapshot of Data: Every day, you take a snapshot of customer attributes, segment memberships, and identity maps to track changes. You can compute the CRC32 checksum for each table to detect changes.

  2. Tracking Changes in Customer Attributes:

    You can create a query to compare the current day’s data with the previous day’s data and identify any changes in customer attributes:

    SELECT 
      a.customer_id,
      CASE WHEN CRC32(CONCAT(a.first_name, a.last_name, a.email, a.phone_number, a.age, a.location, a.purchase_history, a.loyalty_status)) 
           != CRC32(CONCAT(b.first_name, b.last_name, b.email, b.phone_number, b.age, b.location, b.purchase_history, b.loyalty_status)) 
           THEN 'Changed' ELSE 'No Change' END AS attribute_change_status
    FROM customer_attributes a
    LEFT JOIN customer_attributes b
    ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
    WHERE a.date_recorded = CURRENT_DATE();

    This query compares the CRC32 checksum of the concatenated customer attributes for each customer and checks for differences between today’s and yesterday’s data.

  3. Tracking Changes in Segment Membership:

    Similarly, you can track changes in segment membership by comparing the Boolean flags of different segments:

    SELECT 
      a.customer_id,
      CASE WHEN CRC32(CONCAT(a.high_spenders, a.frequent_buyers, a.location_based_promo)) 
           != CRC32(CONCAT(b.high_spenders, b.frequent_buyers, b.location_based_promo)) 
           THEN 'Segment Changed' ELSE 'No Change' END AS segment_change_status
    FROM customer_segments a
    LEFT JOIN customer_segments b
    ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
    WHERE a.date_recorded = CURRENT_DATE();

    This query tracks changes in segment membership, comparing today’s segment flags to the previous day.

  4. Tracking Changes in Identity Map:

    Finally, to track changes in the identity map (email, phone, and loyalty ID):

    SELECT 
      a.customer_id,
      CASE WHEN CRC32(CONCAT(a.email_id, a.phone_id, a.loyalty_id)) 
           != CRC32(CONCAT(b.email_id, b.phone_id, b.loyalty_id)) 
           THEN 'Identity Changed' ELSE 'No Change' END AS identity_change_status
    FROM identity_map a
    LEFT JOIN identity_map b
    ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
    WHERE a.date_recorded = CURRENT_DATE();

    This query compares the identity information between today and the previous day, tracking any updates in the identity map.

Summary Table of Changes:

You can combine these queries into a single view or summary table that shows the daily changes per customer, across attributes, segments, and identity mappings:

SELECT 
  a.customer_id,
  attr_changes.attribute_change_status,
  seg_changes.segment_change_status,
  id_changes.identity_change_status
FROM customer_attributes a
JOIN (
  SELECT 
    customer_id,
    CASE WHEN CRC32(CONCAT(a.first_name, a.last_name, a.email, a.phone_number, a.age, a.location, a.purchase_history, a.loyalty_status)) 
         != CRC32(CONCAT(b.first_name, b.last_name, b.email, b.phone_number, b.age, b.location, b.purchase_history, b.loyalty_status)) 
         THEN 'Changed' ELSE 'No Change' END AS attribute_change_status
  FROM customer_attributes a
  LEFT JOIN customer_attributes b
  ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
  WHERE a.date_recorded = CURRENT_DATE()
) attr_changes ON a.customer_id = attr_changes.customer_id
JOIN (
  SELECT 
    customer_id,
    CASE WHEN CRC32(CONCAT(a.high_spenders, a.frequent_buyers, a.location_based_promo)) 
         != CRC32(CONCAT(b.high_spenders, b.frequent_buyers, b.location_based_promo)) 
         THEN 'Segment Changed' ELSE 'No Change' END AS segment_change_status
  FROM customer_segments a
  LEFT JOIN customer_segments b
  ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
  WHERE a.date_recorded = CURRENT_DATE()
) seg_changes ON a.customer_id = seg_changes.customer_id
JOIN (
  SELECT 
    customer_id,
    CASE WHEN CRC32(CONCAT(a.email_id, a.phone_id, a.loyalty_id)) 
         != CRC32(CONCAT(b.email_id, b.phone_id, b.loyalty_id)) 
         THEN 'Identity Changed' ELSE 'No Change' END AS identity_change_status
  FROM identity_map a
  LEFT JOIN identity_map b
  ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
  WHERE a.date_recorded = CURRENT_DATE()
) id_changes ON a.customer_id = id_changes.customer_id;

Outcome:

This combined query tracks changes in customer attributes, segment memberships, and identity map on a daily basis. You can use this for reporting, auditing customer data, or triggering specific actions (like personalized marketing updates) based on the detected changes.

Last updated