Adobe Data Distiller Guide

PREP 300: Adobe Experience Platform & Data Distiller Primers

Last updated 7 months ago

Adobe Experience Platform Primer

In this section, we'll delve into the fundamental concepts of Adobe Experience Platform. Data enters Adobe Experience Platform through edge, streaming, and batch ingestion. Regardless of the ingestion mode, all data must land in a dataset within the Platform Data Lake. Ingestible data falls into three categories: attributes, events, and lookups. The Real-Time Customer Profile operates with two concurrent stores: a Profile Store and an Identity Store. The Profile Store ingests and partitions data based on the storage key, which is the primary identity. Meanwhile, the Identity Store continuously looks for associations among the identities in each ingested record, including the primary identity, and uses them to construct the identity graph. Together with the historical data in the Platform Data Lake, these two stores open avenues for advanced modeling techniques such as RFM (Recency, Frequency, Monetary) analysis.
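
The identity-graph idea can be illustrated with a toy union-find (disjoint-set) structure: identities that co-occur in one ingested record are linked, and linked identities collapse into a single cluster. This is a simplified sketch for intuition only, not how the Identity Store is actually implemented; the class name `IdentityGraph` and the `namespace:value` identity strings are hypothetical, and Profile Store partitioning is not modeled.

```python
from collections import defaultdict


class IdentityGraph:
    """Toy identity graph (hypothetical): identities in the same record are linked."""

    def __init__(self):
        self.parent = {}  # identity -> parent identity in the disjoint-set forest

    def _find(self, identity):
        # Register unseen identities as their own root.
        self.parent.setdefault(identity, identity)
        root = identity
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[identity] != root:
            next_identity = self.parent[identity]
            self.parent[identity] = root
            identity = next_identity
        return root

    def ingest_record(self, identities):
        """Link every identity found in one ingested record."""
        ids = list(identities)
        for identity in ids:
            self._find(identity)  # registers single-identity records too
        for other in ids[1:]:
            root_a, root_b = self._find(ids[0]), self._find(other)
            if root_a != root_b:
                self.parent[root_b] = root_a

    def clusters(self):
        """Return each connected group of identities as a sorted list."""
        groups = defaultdict(set)
        for identity in list(self.parent):
            groups[self._find(identity)].add(identity)
        return sorted(sorted(group) for group in groups.values())


graph = IdentityGraph()
graph.ingest_record(["ECID:111", "email:ann@example.com"])
graph.ingest_record(["email:ann@example.com", "CRM:42"])
graph.ingest_record(["ECID:999"])
print(graph.clusters())
# [['CRM:42', 'ECID:111', 'email:ann@example.com'], ['ECID:999']]
```

The second record never mentions `ECID:111`, yet `CRM:42` ends up in its cluster because `email:ann@example.com` bridges the two records.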

Data Distiller Primer

Adobe Experience Platform excels in ingesting data from diverse sources. However, marketers face a significant challenge in extracting actionable insights to enhance their understanding of customers. Data Distiller addresses this challenge by providing the flexibility to query data using standard SQL in the Query Editor.

A valuable addition to this capability is the Data Distiller package, which bundles a subset of the functionality available in Adobe Experience Platform. Designed specifically for post-ingestion data preparation, Data Distiller handles key tasks such as cleaning, shaping, and manipulating data. It executes batch queries in Query Service, preparing data for use in Real-Time Customer Profile and other applications.

With Data Distiller, you can join any datasets within the data lake and capture the query results as a new dataset. This new dataset is versatile and can serve purposes such as reporting, machine learning, or ingestion into Adobe Experience Platform-based applications like Real-Time Customer Profile, Adobe Journey Optimizer, and Customer Journey Analytics.
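
The join-and-persist pattern can be sketched with Python's built-in `sqlite3` module as a local stand-in for the data lake. The table and column names are hypothetical; the point is the general `CREATE TABLE ... AS SELECT` shape, which writes query results out as a new table instead of only returning them. Data Distiller's own SQL dialect, dataset naming, and job scheduling are not modeled here.

```python
import sqlite3

# In-memory SQLite database standing in for the data lake (assumption: local stand-in only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY, email TEXT);
    CREATE TABLE purchases (customer_id TEXT, amount REAL);
    INSERT INTO customers VALUES ('c1', 'ann@example.com'), ('c2', 'bob@example.com');
    INSERT INTO purchases VALUES ('c1', 20.0), ('c1', 30.0), ('c2', 15.0);
""")

# Join two datasets and persist the result as a new table.
conn.execute("""
    CREATE TABLE customer_spend AS
    SELECT c.customer_id, c.email, SUM(p.amount) AS total_spend
    FROM customers AS c
    JOIN purchases AS p ON p.customer_id = c.customer_id
    GROUP BY c.customer_id, c.email
""")

# The new table can now be queried like any other dataset.
rows = conn.execute(
    "SELECT customer_id, email, total_spend FROM customer_spend ORDER BY customer_id"
).fetchall()
print(rows)  # [('c1', 'ann@example.com', 50.0), ('c2', 'bob@example.com', 15.0)]
```

A plain `SELECT` against the same tables would behave like an ad hoc query: it returns rows but leaves nothing behind.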

There are three primary use cases for Data Distiller, summarized in Figure 1, and the list continues to expand every few releases.

Next, let us get familiar with a few key terms that will be used throughout this book:

  • AEP: This is the shorthand for Adobe Experience Platform.

  • Adobe Experience Platform Data Lake: This denotes the data lake store housed within the Adobe Experience Platform governance boundary. Irrespective of the ingestion mode, all data is directed to the Adobe Experience Platform Data Lake. Currently, Data Distiller both reads datasets from and writes datasets to this lake. Additionally, Data Distiller has its own accelerated store designed for business intelligence reporting, which also supports reading and writing datasets. The Adobe Experience Platform Data Lake contains datasets that can be attributes, events, or lookups, and each dataset must have an associated schema.

  • Query Service: This is a broad set of SQL capabilities in Adobe Experience Platform. Some of these capabilities are included in the packaging of apps such as Adobe Journey Optimizer, but most are packaged in Data Distiller. It is referred to as a service because its entire foundation is built on a service-oriented architecture.

  • Derived Attributes: In Data Distiller, derived attributes are calculated from other attributes within a dataset or table, and they are stored in a custom dataset called a Derived Dataset. They are computed by applying expressions or mathematical functions to existing attributes or events within the same table, or through joins with other tables. For example, you might calculate Customer Lifetime Value (CLTV) from the last 5 years of transactions for each customer.

  • Audiences: Audiences are built on top of attributes, events, and derived attributes, where the derived attributes can encode metrics such as Customer Lifetime Value (CLTV) or transaction count. Audiences can encompass 1st-, 2nd-, or 3rd-party data and may combine data from multiple sources associated with the same person.

  • Ad hoc queries: Ad hoc queries are SQL queries used to explore ingested datasets, primarily for verification, validation, and experimentation. Crucially, these queries DO NOT write data back into the Adobe Experience Platform Data Lake.

  • Batch queries: Batch queries are SQL queries used for post-ingestion processing of ingested datasets. They clean, shape, manipulate, and enrich data, and write the results back to the Platform Data Lake. Batch queries can be scheduled, managed, and monitored as batch jobs.

  • Accelerated Store: This is the reporting layer within Data Distiller. SQL queries executed against it support interactive dashboards and BI workflows, and the results are cached for faster response times. Within the Data Distiller offering, customers can use the accelerated store to build insights data models efficiently, including the one employed for RFM analysis in this lab. Directly within the user interface, users can build a lightweight BI-style dashboard to visualize key performance indicators (KPIs). External BI tools, such as Power BI, can also be connected for more flexibility in data visualization and analysis.

  • Derived Datasets: The Derived Datasets feature can be used to clean, shape, and manipulate specific data from the Adobe Experience Platform Data Lake to generate custom datasets. These datasets can be refreshed on a regular cadence to enrich the Real-Time Customer Profile. With derived datasets, you can compute complex distribution-based measures such as deciles, percentiles, or quartiles, as well as simpler ones such as maximum values, counts, and means. These datasets can be tailored to individual users or business entities, associated directly with identities such as email addresses, device IDs, and phone numbers, or indirectly with user or business profiles.
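
The derived-attribute and derived-dataset ideas above can be made concrete with a small Python sketch that rolls raw transactions up into one row per customer. It assumes a hypothetical `(customer_id, txn_date, amount)` transaction shape and treats CLTV as simply the total spend over the last five years; real CLTV models are usually more involved, and in Data Distiller this roll-up would be written in SQL rather than Python.

```python
from collections import defaultdict
from datetime import date


def derive_customer_attributes(transactions, as_of, years=5):
    """Build one derived-attribute row per customer from raw transactions.

    transactions: iterable of (customer_id, txn_date, amount) tuples (hypothetical shape).
    Only transactions within the last `years` years up to `as_of` are counted.
    """
    # Window start: same calendar day `years` earlier (Feb 29 falls back to Feb 28).
    try:
        window_start = as_of.replace(year=as_of.year - years)
    except ValueError:
        window_start = as_of.replace(year=as_of.year - years, day=28)

    amounts = defaultdict(list)
    for customer_id, txn_date, amount in transactions:
        if window_start <= txn_date <= as_of:
            amounts[customer_id].append(amount)

    derived = {}
    for customer_id, values in amounts.items():
        derived[customer_id] = {
            "cltv_5y": sum(values),  # simplifying assumption: CLTV = total spend in window
            "txn_count": len(values),
            "max_amount": max(values),
            "mean_amount": sum(values) / len(values),
        }
    return derived


txns = [
    ("c1", date(2023, 3, 1), 120.0),
    ("c1", date(2024, 6, 15), 80.0),
    ("c1", date(2015, 1, 1), 500.0),  # outside the 5-year window, ignored
    ("c2", date(2024, 1, 10), 40.0),
]
print(derive_customer_attributes(txns, as_of=date(2024, 12, 31)))
```

Each output row is keyed by a customer identifier, which is what lets a derived dataset of this shape be associated with a profile.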

Why use Derived Datasets?

Derived Datasets play a crucial role in various data analysis and enrichment scenarios, especially when analyzing data on the Adobe Experience Platform Data Lake. Furthermore, they can be marked for use in the Real-Time Customer Profile and applied in downstream use cases such as audience targeting. Potential use cases include:

  • Identifying the bottom 10% of subscribers based on channel viewership to target specific audience segments for new subscription packages.

  • Identifying the top 10% of flyers based on total miles traveled and "Flyer" status to target them for new credit card offers.

  • Analyzing subscription churn rates.

  • Identifying the top 1% of household income in a region and tracking the number of individuals moving out of that income bracket over a specified period.
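
Percentile cuts such as "top 10% of flyers" are typically computed by ranking and bucketing, in the way SQL's `NTILE(10)` window function splits ordered rows into ten near-equal groups. The Python sketch below mimics those `NTILE` semantics (bucket sizes differ by at most one, and the earliest buckets receive the extra rows); the member fields and the `"Flyer"` status filter are hypothetical stand-ins for the second use case.

```python
def ntile(ordered_keys, buckets):
    """Assign 1-based bucket numbers like SQL NTILE: sizes differ by at most one,
    and the earliest buckets receive the extra rows."""
    base, extra = divmod(len(ordered_keys), buckets)
    assignment = {}
    start = 0
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        for key in ordered_keys[start:start + size]:
            assignment[key] = bucket
        start += size
    return assignment


def top_decile_flyers(members):
    """Return member IDs in the top 10% by total miles among 'Flyer'-status members.

    members: list of dicts with hypothetical keys 'member_id', 'total_miles', 'status'.
    """
    flyers = [m for m in members if m["status"] == "Flyer"]
    ranked = sorted(flyers, key=lambda m: m["total_miles"], reverse=True)
    deciles = ntile([m["member_id"] for m in ranked], 10)
    return [member_id for member_id, decile in deciles.items() if decile == 1]


members = [
    {"member_id": f"m{i:02d}", "total_miles": 1_000 * i, "status": "Flyer"}
    for i in range(1, 21)
]
# Highest mileage overall, but excluded by the status filter.
members.append({"member_id": "x99", "total_miles": 999_999, "status": "Basic"})
print(top_decile_flyers(members))  # ['m20', 'm19']
```

Keeping decile 10 instead of decile 1 (or sorting ascending) gives the bottom-10% cut described in the first use case.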

Why use Data Distiller Customizable Insights Dashboards?

Dashboards provide a dynamic, interactive way to review RFM (Recency, Frequency, Monetary) marketing analysis, surfacing insights and trends at a glance. This enables businesses to quickly identify valuable customer audiences and adjust their marketing strategies accordingly, maximizing both engagement and ROI.
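
The RFM analysis that such dashboards visualize starts from a per-customer table of recency, frequency, and monetary values, which are then bucketed into scores. Below is a minimal Python sketch that assumes a hypothetical `(customer_id, txn_date, amount)` transaction shape and uses simple rank-based scoring. This is one common convention among several; the RFM model used elsewhere in this guide may score differently.

```python
from datetime import date


def rfm_table(transactions, as_of):
    """Aggregate (customer_id, txn_date, amount) tuples into raw R, F, M values."""
    stats = {}
    for customer_id, txn_date, amount in transactions:
        last, freq, money = stats.get(customer_id, (date.min, 0, 0.0))
        stats[customer_id] = (max(last, txn_date), freq + 1, money + amount)
    return {
        cid: {"recency_days": (as_of - last).days, "frequency": freq, "monetary": money}
        for cid, (last, freq, money) in stats.items()
    }


def rank_score(metric, higher_is_better, levels):
    """Map each customer to a 1..levels score by rank (ties keep input order)."""
    # Order worst-to-best so the best customers land in the top score.
    ordered = sorted(metric, key=metric.get, reverse=not higher_is_better)
    n = len(ordered)
    return {cid: 1 + (rank * levels) // n for rank, cid in enumerate(ordered)}


def rfm_scores(transactions, as_of, levels=5):
    table = rfm_table(transactions, as_of)
    r = rank_score({c: v["recency_days"] for c, v in table.items()}, False, levels)
    f = rank_score({c: v["frequency"] for c, v in table.items()}, True, levels)
    m = rank_score({c: v["monetary"] for c, v in table.items()}, True, levels)
    # Concatenate into an "RFM cell" label such as "332".
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in table}


txns = [
    ("a", date(2024, 12, 20), 100.0),
    ("a", date(2024, 11, 1), 50.0),
    ("a", date(2024, 10, 1), 25.0),
    ("b", date(2024, 6, 1), 20.0),
    ("c", date(2023, 1, 1), 500.0),
    ("c", date(2022, 6, 1), 300.0),
]
print(rfm_scores(txns, as_of=date(2024, 12, 31), levels=3))
# {'a': '332', 'b': '211', 'c': '123'}
```

With the default `levels=5` the scores become quintile-style; `levels=3` just keeps the toy example readable. Because scoring is rank-based, tied customers can receive different scores, which a production model would usually handle explicitly.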

Basic Architecture of the Adobe Experience Platform.
Figure 1: Data Distiller Use Cases Marketecture Diagram.