Adobe Data Distiller Guide

[DRAFT] BI 101: What is a Data Distiller Data Model?

Last updated 8 months ago

Introduction

Many users ask me what a database, a schema, and a data model mean in the context of Data Distiller. This article aims to demystify these concepts.

A data model is essentially a schema (star, snowflake, or anything you can imagine) that establishes relationships (mostly lookups) between fields across various datasets. Databases contain tables, but when schemas are present, they sit directly underneath the database. This combination of a database and a schema is given a special name in the world of Data Distiller: a "data model". If you are building a guided-mode dashboard on a table in the Adobe Experience Platform UI, Data Distiller data models auto-detect the relationships with other tables and perform the lookups for you.

In the future, Data Distiller data models will go beyond logical partitioning of your insights data and easy chart authoring to also support access control at this level.

Prerequisites

You need a basic understanding of the Data Distiller architecture, specifically the access patterns of the 3 query engines and the 2 data stores.

Key Concepts

  1. Database: Think of a database fundamentally as a collection of tables and other associated code.

    • Tables: These are the foundational units of storage. Data Distiller operates across two stores:

      • Data Lake: The tables can be relational or non-relational, i.e., JSON-like semi-structured.

      • Accelerated Store: The tables can only be relational, as they support reporting or "look up a value" style use cases.

    • Views: Think of a view as a virtual table based on the result set of a query. Remember, a view behaves functionally like a table, but it stores no data; it is nothing but a saved query. Views are great because you do not incur the cost of materializing the table. Views are not great when the result set is so large that your query times out, i.e., you are better off materializing the results.
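To make the table-vs.-view trade-off concrete, here is a minimal SQL sketch; the table and field names are hypothetical, not from any real dataset:

```sql
-- A view: no data is stored; the underlying query runs
-- each time the view is referenced.
CREATE VIEW high_value_customers AS
  SELECT customer_id, SUM(order_total) AS lifetime_value
  FROM   orders
  GROUP  BY customer_id
  HAVING SUM(order_total) > 1000;

-- A materialized table: the result set is computed once and stored,
-- which is preferable when the query is too heavy to run on every read.
CREATE TABLE high_value_customers_tbl AS
  SELECT customer_id, SUM(order_total) AS lifetime_value
  FROM   orders
  GROUP  BY customer_id
  HAVING SUM(order_total) > 1000;
```

The view keeps results always fresh at the cost of recomputation; the table trades freshness for predictable read performance.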

Remember, the ability of a query engine to query tables across databases depends on whether the query engine supports it.

Pro Tip: In Data Distiller, you can write federated queries against tables in databases across both the Data Lake and the Accelerated Store. This means you can use tables from these two stores in a single query.
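As a sketch of what such a federated query could look like, assume a behavioral table on the Data Lake and a lookup table in an Accelerated Store data model (all table, schema, and column names here are hypothetical):

```sql
-- Join a Data Lake table with an Accelerated Store lookup table
-- in a single federated query.
SELECT w.visitor_id,
       w.page_name,
       s.segment_name
FROM   web_events w                      -- resides on the Data Lake
JOIN   insights_db.lookups.segments s    -- resides in the Accelerated Store
  ON   w.segment_id = s.segment_id;
```

The fully qualified name on the Accelerated Store side is what lets the engine resolve which store and data model the table lives in.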

  2. Schema: Before you add these tables or views to the database, you may want to logically arrange them. Think of schemas as containers for tables sitting right underneath the database.

Remember, the ability of a query engine to query tables across schemas depends on whether the query engine supports it. As mentioned above, in Data Distiller you can write federated queries against tables in databases across both the Data Lake and the Accelerated Store. It does not matter which schema and database are used.

  3. Data Model: The collection of tables under each schema in a database is called a data model. In essence, it is a collection of tables where the data model itself is addressed as <Database Name>.<Schema Name>, and each table as <Database Name>.<Schema Name>.<Table Name>.

Here is another analogy for a data model. Think of a neighborhood as the database. Each lane is a schema. Each of the houses on that lane is a table. Remember, there could be houses that look exactly the same in different neighborhoods, or even on a different lane in the same neighborhood. I need a way to differentiate a house within the context of a neighborhood and a lane. When I say data model, I am asking for the address of the lane, which is <Neighborhood, Lane Name>. To get to a house, I need <Neighborhood, Lane Name, House Name>.
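In SQL terms, the neighborhood/lane/house addressing above corresponds to qualified names. The following is an illustrative sketch of how such a hierarchy might be set up; the object names are invented, and the WITH clause options shown are assumptions about Accelerated Store provisioning, so check the Data Distiller documentation for the exact syntax:

```sql
-- Neighborhood: a database (here, assumed to be provisioned
-- in the Accelerated Store for reporting use cases)
CREATE DATABASE insights_db WITH (TYPE = QSACCEL, ACCOUNT = acp_query_batch);

-- Lane: a schema underneath the database; database + schema
-- together form the "data model"
CREATE SCHEMA insights_db.marketing_model;

-- House: a table addressed as <Database>.<Schema>.<Table>
CREATE TABLE insights_db.marketing_model.campaign_summary AS
  SELECT campaign_id, COUNT(*) AS impressions
  FROM   ad_events
  GROUP  BY campaign_id;
```

Once created, the table is unambiguous: two tables named campaign_summary can coexist as long as their database/schema addresses differ, exactly like identical houses on different lanes.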

But Why Do We Need Data Models?

The simplest answer: for the exact same reasons we need drives and folders on our computers. The idea is organization: to make sure related tables live together, names do not collide, and you can find what you need.

I Do Not Care About Data Models. What Are My Options?

Your position could be: "I like clutter, and my use cases do not need the organization being described here."

There are two key data models that are
