EXPLORE 200: Exploring Behavioral Data - Case Study with Adobe Analytics Data
Prerequisites
You need to make sure you complete this module that ingests Adobe Analytics web data into the Platform:
And of course, you should have:
Scenario
We are going to ingest LUMA data into our test environment. This is a fictitious online store created by Adobe
The fastest way to understand what is happening on the website is to check the Products tab. There are 3 categories of products for different (and all) personas. You can browse them. You authenticate yourself and also can add items to a cart. The data that we are ingesting into the Platform is the test website traffic data that conforms to the Adobe Analytics schema.
We need to run some analytical queries on this dataset.
Exploratory 1-Dimensional Queries
The goal of this exercise is to explore every column of the dataset individually so that we get a deep understanding of the columns. Once we understand each column, we can then build 2-dimensional and even n-dimensional queries.
Let us first retrieve all the results:
You can see that there are complex nested objects. Instead of going into the XDM schemas, we can query the data in place by using to_json.
Let us dig into the web JSON object (or XDM field group):
Explore web
Object
web
ObjectLet us dig one level deeper into webPageDetails.
We will use the dot notation to access any field in the hierarchy.
We can apply to_json
again:
pageViews
is an object. Let us access the elements of that array
You will get the following:
We can access the value by:
And you will get:
Explore marketing
Object
marketing
ObjectLet us work on the marketing
object:
The results show information about campaigns:
Explore channel
Object
channel
ObjectIf you execute the following code:
You will observe that there is duplication of data across these fields. marketing
object truly has a campaign name while the other fields are present in the channel
object.
Let us extract the channel type that is in the type field of the channel object as it has values such as search, email, and social.
The code for this will be:
The result will be:
Note the usage of the regular expression that is extracting the last word in the _type
field that looks like _type":"https:
//ns.adobe.com/xdm/channel-types/XXX"
regexp_extract(channel._type, '[^/]+$', 0)
: This is the main part of the query where you use theregexp_extract
function to perform regular expression extraction.channel._type
: This specifies the JSON field"_type"
inside thechannel
JSON object.'[^/]+$'
: This is a regular expression pattern. Let's break it down:[^/]
: This part matches any character except a forward slash ("/").+
: This indicates that the previous pattern ([^/]
) should occur one or more times consecutively.$
: This anchors the pattern to the end of the string.
0
: This argument specifies the group index to return. In this case,0
means that the entire match (the matched string) will be returned.
Explore ProductListItems
Array Object
ProductListItems
Array ObjectLet us access the ProductListItems
array:
Hint: A single page view for Add to Cart event will have multiple product items.
To access the first elements of this array, use the following:
Arrays offer themselves to even more interesting SQL queries. Arrays can be exploded i.e. each element of the array can be put into a separate row of a new table and other columns/fields will be duplicated:
Hint: You can also unnest
function instead of explode
.
Explore commerce
Object
commerce
ObjectLet us now explore the commerce
object:
commerce
object shows some commerce-related actions such as checkouts that thewebPageDetails
object does not have.
Let us reformat this object so that we can extract the commerce event types such as productViews
, productListAdds
and checkouts
as strings. I want to do this because I want to use GROUP BY on these event types later on. The fact that some of them are populated while some are not indicates that this is a nested structure and we will have no choice but to look at the commerce object itself in the XDM schema.
First, let us extract these fields as strings:
The results are:
Note: The syntax of commerce.checkouts.`value`==1.
Here value
has got two single opening quotation marks. This is to avoid conflict of value as a RESERVED keyword. The same will apply for commerce.`order`.* as
well.
But our string-based approach has a serious flaw. If you check the field group commerce,
you will see a lot of commerce event types. There is no guarantee that we will only see the 3 eveent types that we identified above:
To extract an arbitrary field name of the structs present in the commerce
object, we will use:
The result will be:
Note the following:
json_to_keys extracts
the top-level keys of the JSON objects present in commerce.to_json
converts the JSON object to a string.commerce_event_type[0]
extracts the first and only element of this array.Note that different structs in the commerce object have different values. Page view type structs will have a value equal to 1 while purchase type structs will have purchase totals. This extraction only works for extracting the commerce event types but does not extrapolate to the metadata of those events.
Alternatively, we could have simplified this query by avoiding the outer SELECT query by simply doing the following which will help us later:
Explore endUserIDs
Object
endUserIDs
ObjectLet us also check the endUserIDs
We can extract the email addresses
by using:
We can extract the mcids
by using
The results are:
Create a Semi-Flat View of the Adobe Analytics Data
Let us take the queries that we built and put them all together to create a SQL query that creates a somewhat flat structure of the data i.e. we will not expand ProductListItems.
Note the following:
We have assembled all the 1-dimensional
SELECT
queries into a view.The view is semi-flat because
ProductListItems
is not flattened i.e. put into separate rows or columns.We use a CREATE TEMP TABLE to store this view instead of materializing the view immediately because we want TEMP tables to be cached in Data Distiller for fast exploration.
Tip: If you want fast exploration of data in ad hoc query engine, just create a TEMP TABLE with the data that you want to explore. Remember that these temp tables are wiped after the user session ends as the cache is ephemeral.
Warning: If you are on and off, DBVisualizer will disconnect from Data Distiller. In that case, it will complain that the temp table does not exist because your session needs to be reestablished. In such situations where you cannot maintain the connectivity for long periods of time, you are better off just using CREATE TABLE
which will materialize the data onto the Data Lake.
If you decide to use CREATE TABLE:
Regardless of what you use, the results of the query look like:
With this view, you are now set to do any kind of analysis. The methodology shown above can be applied to any schemas we get from the Adobe Apps.
Appendix: Adobe App Schemas to Explore
The skills that you have learned in this module should set you up for success with any complex dataset that you will come across in the Adobe ecosystem.
Adobe Journey Optimizer
If you are interested in Adobe Journey Optimizer, you should explore this module:
Adobe Commerce
You should also explore Adobe Commerce with Adobe Experience Platform integration. Specifically, you need to be aware of this:
The Adobe Commerce field groups (JSON fields as a group) are very similar to that of Adobe Analytics schema.
There are some field groups that are unique to Adobe Commerce because of the nature of the storefront setup.
There can always be custom events that are unique to an industry or implementation.
Adobe Campaign
You can bring Campaign V8 delivery logs into the Adobe Experience Platform.
Marketo Engage
You can bring in all of these datasets from Marketo:
You can also bring in custom activity data as well.
Appendix: Array Operations
These are the array functions supported in Data Distiller:
size()
to determine the number of elements in a list (array)The bracket
[]
notation to access specific elements in arraystransform()
to apply a transformation to all elements in an arrayexplode()
to transform elements in a list into single rowsposexplode()
to transform elements in a list into single rows along with a column for the index the element had in the original listarray_contains()
to determine if an array contains a specific elementarray_distinct()
to remove duplicates from an arrayarray_except()
to subtract to arraysarray_intersect()
to determine the intersection (overlapping elements) of two arraysarray_union()
to determine the union of two arrays without duplicatesarray_join()
to concatenate the elements of an array using a delimiterarray_max()
to get the largest element from an arrayarray_min()
to get the smallest element from an arrayarray_position()
to a specific element from an array counting starting with 1array_remove()
to remove a specific element from an arrayarray_repeat()
to repeat the elements of an array a specific number of timesarray_sort()
to sort an arrayarrays_overlap()
to check if two arrays have at least one common element
Last updated