[DRAFT] Insights 101: What is a Data Distiller Data Model?
Introduction
Users often ask me what a database, a schema, and a data model mean when using Data Distiller. In this article, I aim to demystify these concepts.
A data model is nothing but a schema (star, snowflake, or anything else you can imagine) that establishes relationships, mostly lookups, between fields across various datasets. Databases contain tables, but when there are schemas, we place them right underneath the database. In the world of Data Distiller, this combination of a database and a schema is given a special name: a "data model". If you are building a guided-mode dashboard on a table in the Adobe Experience Platform UI, Data Distiller data models autodetect the relationships with other tables and do the lookups for you.
In the future, Data Distiller data models will go beyond logical partitioning of your insights data and easy chart authoring to also support security access at the level of this construct.
Prerequisites
You need a basic understanding of the Data Distiller architecture, specifically the access patterns of the 3 query engines and the 2 data stores.
Key Concepts
Database: Think of a database fundamentally as a collection of tables and other associated code.
Tables: These are the foundational units of storage. Data Distiller operates across two stores:
Data Lake: The tables can be relational or non-relational, i.e., JSON-like and semi-structured.
Accelerated Store: The tables can only be relational because they support reporting or "look up a value" style use cases.
Views: Think of a view as a virtual table that is based on the result set of a query. Remember, a view functionally behaves like a table, but there is no data in it. It is nothing but a query. Views are great because you do not incur the cost of materializing the table. Views are not great when the result set is so large that your query times out; in that case, you are better off materializing the results.
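To make the view versus materialized table distinction concrete, here is a minimal sketch. It uses SQLite through Python's standard library purely as a stand-in engine, so the syntax is not Data Distiller's, and the table and view names (orders, revenue_by_country, revenue_by_country_mat) are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for a query engine; this is not Data Distiller syntax.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "US", 20.0), (2, "US", 30.0), (3, "DE", 15.0)],
)

# A view stores only the query text, not the result rows.
conn.execute(
    "CREATE VIEW revenue_by_country AS "
    "SELECT country, SUM(amount) AS revenue FROM orders GROUP BY country"
)

# Materializing copies the current result set into a real table.
conn.execute(
    "CREATE TABLE revenue_by_country_mat AS SELECT * FROM revenue_by_country"
)

# New data arrives after both objects were created.
conn.execute("INSERT INTO orders VALUES (4, 'DE', 5.0)")

view_rows = conn.execute(
    "SELECT country, revenue FROM revenue_by_country ORDER BY country"
).fetchall()
mat_rows = conn.execute(
    "SELECT country, revenue FROM revenue_by_country_mat ORDER BY country"
).fetchall()

print(view_rows)  # recomputed from orders on every read, so it sees the new row
print(mat_rows)   # snapshot taken before the last insert
```

The view pays the query cost on every read but never goes stale; the materialized table is cheap to read but has to be rebuilt to pick up new data.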
Remember, the ability of a query engine to query tables across databases depends on whether the query engine supports it.
Pro Tip: In Data Distiller, you can write federated queries against tables in databases across both the Data Lake and the Accelerated Store. This means that you can use tables from these two stores in a single query.
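The idea of one query spanning two stores can be sketched with an analogy. In the example below, two separate SQLite databases play the roles of the two stores, and a single query joins across them. This is only an illustration of the concept: the database alias (accel) and the table names (events, customer_lookup) are hypothetical, and real Data Distiller queries will look different.

```python
import sqlite3

# Assumption: two SQLite databases stand in for the two stores.
conn = sqlite3.connect(":memory:")                   # plays the "data lake" (named main)
conn.execute("ATTACH DATABASE ':memory:' AS accel")  # plays the "accelerated store"

conn.execute("CREATE TABLE main.events (customer_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO main.events VALUES (?, ?)",
    [(1, "home"), (2, "cart"), (1, "checkout")],
)

conn.execute("CREATE TABLE accel.customer_lookup (customer_id INTEGER, segment TEXT)")
conn.executemany(
    "INSERT INTO accel.customer_lookup VALUES (?, ?)",
    [(1, "loyal"), (2, "new")],
)

# One query, two databases: each table is addressed by its database-qualified name.
rows = conn.execute(
    """
    SELECT e.page, c.segment
    FROM main.events AS e
    JOIN accel.customer_lookup AS c ON c.customer_id = e.customer_id
    ORDER BY e.page
    """
).fetchall()

print(rows)
```

Note how the qualified names (main.events, accel.customer_lookup) are what let the engine tell the two databases apart within one statement.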
Schema: Before you add these tables or views to the database, you may want to logically arrange them. Think of schemas as containers for tables sitting right underneath the database.
Remember, the ability of a query engine to query tables across schemas depends on whether the query engine supports it. As mentioned above, in Data Distiller, you can write federated queries against tables in databases across both the Data Lake and the Accelerated Store. It does not matter which schema and database were used.
Data Model: The collection of tables under a schema under a database is called a data model. In essence, it is a collection of tables with an address of <Database Name.Schema Name>; an individual table within it is addressed as <Database Name.Schema Name.Table Name>.
Here is an analogy for a data model: Think of a neighborhood as the database. Each lane is a schema. Each of the houses on that lane is a table. Remember, there could be houses that look exactly the same in different neighborhoods, or even on a different lane in the same neighborhood. I need a way to differentiate a house within the context of a neighborhood and a lane. When I say data model, I am asking for the address of the lane, which is <Neighborhood, Lane Name>. To get to a house, I need <Neighborhood, Lane Name, House Name>.
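The neighborhood analogy can also be sketched as a small nested lookup. The snippet below is a toy model of the addressing scheme only, not how Data Distiller stores its catalog, and every name in it (marketing_db, campaign_insights, web_insights, orders) is hypothetical.

```python
# Toy catalog: database (neighborhood) -> schema (lane) -> table (house).
# All names are hypothetical and exist only to illustrate addressing.
catalog = {
    "marketing_db": {
        "campaign_insights": {
            "orders": "orders table used for campaign reporting",
        },
        "web_insights": {
            "orders": "orders table used for web reporting",
        },
    },
}


def data_model(database: str, schema: str) -> dict:
    """Return the data model: every table under <database>.<schema>."""
    return catalog[database][schema]


def table(address: str) -> str:
    """Resolve a fully qualified <database>.<schema>.<table> address."""
    database, schema, table_name = address.split(".")
    return data_model(database, schema)[table_name]


# Two tables share the name "orders" but live in different data models.
print(sorted(data_model("marketing_db", "campaign_insights")))
print(table("marketing_db.campaign_insights.orders"))
print(table("marketing_db.web_insights.orders"))
```

The table name alone is ambiguous here; only the full three-part address picks out exactly one table.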
But Why Do We Need Data Models?
The simplest answer: for the same reasons that we need drives and folders on our computers. The idea is to make sure you are
I Do Not Care About Data Models. What Are My Options?
Your use case could be: I like clutter and my use cases do not need the organization that is being talked about here.
There are two key data models that are