[DRAFT]ETL 400: Attribute-Level Change Detection in Profile Snapshot Data
Step-by-Step Tracking of Changes:
Daily Snapshot of Data: Every day, you take a snapshot of customer attributes, segment memberships, and identity maps to track changes. You can compute the CRC32 checksum for each table to detect changes.
Tracking Changes in Customer Attributes:
You can create a query to compare the current day’s data with the previous day’s data and identify any changes in customer attributes:
SELECT a.customer_id, CASE WHEN CRC32(CONCAT(a.first_name, a.last_name, a.email, a.phone_number, a.age, a.location, a.purchase_history, a.loyalty_status)) != CRC32(CONCAT(b.first_name, b.last_name, b.email, b.phone_number, b.age, b.location, b.purchase_history, b.loyalty_status)) THEN 'Changed' ELSE 'No Change' END AS attribute_change_status FROM customer_attributes a LEFT JOIN customer_attributes b ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1) WHERE a.date_recorded = CURRENT_DATE();
This query compares the CRC32 checksum of the concatenated customer attributes for each customer and checks for differences between today’s and yesterday’s data.
Tracking Changes in Segment Membership:
Similarly, you can track changes in segment membership by comparing the Boolean flags of different segments:
SELECT a.customer_id, CASE WHEN CRC32(CONCAT(a.high_spenders, a.frequent_buyers, a.location_based_promo)) != CRC32(CONCAT(b.high_spenders, b.frequent_buyers, b.location_based_promo)) THEN 'Segment Changed' ELSE 'No Change' END AS segment_change_status FROM customer_segments a LEFT JOIN customer_segments b ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1) WHERE a.date_recorded = CURRENT_DATE();
This query tracks changes in segment membership, comparing today’s segment flags to the previous day.
Tracking Changes in Identity Map:
Finally, to track changes in the identity map (email, phone, and loyalty ID):
SELECT a.customer_id, CASE WHEN CRC32(CONCAT(a.email_id, a.phone_id, a.loyalty_id)) != CRC32(CONCAT(b.email_id, b.phone_id, b.loyalty_id)) THEN 'Identity Changed' ELSE 'No Change' END AS identity_change_status FROM identity_map a LEFT JOIN identity_map b ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1) WHERE a.date_recorded = CURRENT_DATE();
This query compares the identity information between today and the previous day, tracking any updates in the identity map.
Summary Table of Changes:
You can combine these queries into a single view or summary table that shows the daily changes per customer, across attributes, segments, and identity mappings:
SELECT
a.customer_id,
attr_changes.attribute_change_status,
seg_changes.segment_change_status,
id_changes.identity_change_status
FROM customer_attributes a
JOIN (
SELECT
customer_id,
CASE WHEN CRC32(CONCAT(a.first_name, a.last_name, a.email, a.phone_number, a.age, a.location, a.purchase_history, a.loyalty_status))
!= CRC32(CONCAT(b.first_name, b.last_name, b.email, b.phone_number, b.age, b.location, b.purchase_history, b.loyalty_status))
THEN 'Changed' ELSE 'No Change' END AS attribute_change_status
FROM customer_attributes a
LEFT JOIN customer_attributes b
ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
WHERE a.date_recorded = CURRENT_DATE()
) attr_changes ON a.customer_id = attr_changes.customer_id
JOIN (
SELECT
customer_id,
CASE WHEN CRC32(CONCAT(a.high_spenders, a.frequent_buyers, a.location_based_promo))
!= CRC32(CONCAT(b.high_spenders, b.frequent_buyers, b.location_based_promo))
THEN 'Segment Changed' ELSE 'No Change' END AS segment_change_status
FROM customer_segments a
LEFT JOIN customer_segments b
ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
WHERE a.date_recorded = CURRENT_DATE()
) seg_changes ON a.customer_id = seg_changes.customer_id
JOIN (
SELECT
customer_id,
CASE WHEN CRC32(CONCAT(a.email_id, a.phone_id, a.loyalty_id))
!= CRC32(CONCAT(b.email_id, b.phone_id, b.loyalty_id))
THEN 'Identity Changed' ELSE 'No Change' END AS identity_change_status
FROM identity_map a
LEFT JOIN identity_map b
ON a.customer_id = b.customer_id AND b.date_recorded = DATE_SUB(a.date_recorded, 1)
WHERE a.date_recorded = CURRENT_DATE()
) id_changes ON a.customer_id = id_changes.customer_id;
Outcome:
This combined query tracks changes in customer attributes, segment memberships, and identity map on a daily basis. You can use this for reporting, auditing customer data, or triggering specific actions (like personalized marketing updates) based on the detected changes.
Last updated