PRE 107: Ingesting JSON Test Data into Adobe Experience Platform
In this tutorial, we will learn how to ingest test data, especially nested JSON data, into the Platform. You will need this data in order to work through the Data Distiller modules.
Prerequisites
You need to set up DBVisualizer:
PRE 105: DBVisualizer SQL Editor Setup
You will also need to download this JSON file. Extract the zip and copy the JSON file over:
Scenario
We are going to ingest Luma data into our test environment. Luma is a fictitious online store created by Adobe.
The fastest way to understand what is happening on the website is to check the Products tab. There are three categories of products for different (and all) personas, and you can browse them. You can authenticate yourself and also add items to a cart. The data that we are ingesting into the Platform is test website traffic data that conforms to the Adobe Analytics schema.
Unlike the Movie Genre Targeting example, where we simply dropped in a CSV file and the data popped out as a dataset, we cannot do the same with JSON files: we need to specify a nested schema so that the system understands the structure of the data.
Setup Azure Storage Explorer
We will be using an interesting technique to ingest this data, one that will also form the basis for simulating batch ingestion. Download Azure Storage Explorer from this link. Make sure you download the right version for your OS and install it.
We will be using Azure Storage Explorer as a local file browser to upload files into AEP's Data Landing Zone: Azure-based blob storage that sits outside AEP's governance boundary. Data in the Landing Zone has a time-to-live (TTL) of 7 days, and the zone serves as a staging area where teams can push data asynchronously prior to ingestion. It is also a fantastic tool for testing the ingestion of test data into AEP.
In the Azure Storage Explorer, open up the Connect Dialog by clicking the plug icon and then click on the ADLSGen2 container or directory option:
Choose the connection type as Shared SAS URL. Keep in mind that if multiple users have access to the Landing Zone URL, they can all write over each other's files. If you are seeking isolation, it is only available at the sandbox level; there is one Landing Zone per sandbox.
Name the container and then add the credentials by going into Adobe Experience Platform->Sources->Data Landing Zone.
Now go to Adobe Experience Platform UI->Sources->Catalog->Cloud Storage->Data Landing Zone and click View Credentials:
If you click on View Credentials, you should see this screen. Click to copy the SAS URI:
Copy the SAS URI into the Storage Explorer Account setup:
Click Next to complete the setup:
The screen will look like the following. Either drag and drop the JSON file or click Upload:
Navigate to Adobe Experience Platform UI->Sources->Catalog->Cloud Storage->Data Landing Zone. You will see either an Add Data or a Setup button on the card itself. Click it to access the Data Landing Zone.
Voila! You should now see the JSON file you uploaded. You will also be able to preview the first 8 to 10 records from the top of the file. These records will be used later to validate our ingestion pipeline.
Create an XDM Schema
Create an XDM Schema by going to Adobe Experience Platform UI->Schemas->Create XDM Experience Event
On the Schema screen, click on the pane for Field groups->Add
Search for "Adobe Analytics" as a term for Field Groups:
Add the Adobe Analytics ExperienceEvent Template field group. This is a comprehensive field group, but we will only be using a portion of its fields.
Save the schema as Luma Web Data.
Ingest Data from Data Landing Zone
Click on the XDM Compliant dropdown and change it to Yes:
Go to the next screen and fill out the details exactly as shown in the screen below. Name the dataset luma_web_data, choose the Luma Web Data schema, and enable Partial Ingestion.
Configure the Scheduling frequency to Minute, with an interval of every 15 minutes.
Click Next and Finish. Your dataflow should execute and you should see the dataset luma_web_data in Adobe Experience Platform UI->Datasets. Click on the dataset luma_web_data. You should see about 733K records ingested.
Note: By marking the data as XDM compliant in the dataflow step, we avoided having to go through a mapping process. I was able to choose that option because the Adobe Analytics schema I chose is a superset of the Luma schema, so there is no point in doing a manual mapping. If you are bringing in Adobe Analytics data in practice, you may not be this lucky: you will need to account for eVars and do the mapping. That is beyond the scope of this guide.
Query the Data
The first query that you can type is a simple exploration of the dataset.
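A minimal sketch, assuming the dataset name luma_web_data created in the previous step:

```sql
-- Simple exploratory query (sketch); luma_web_data is the dataset created above
SELECT * FROM luma_web_data;
```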
To get 50,000 results, you will need to configure DBVisualizer accordingly.
If you need to query a complex object, say the web object, use the to_json construct.
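A minimal sketch of this pattern, assuming the luma_web_data dataset and the web object from the Adobe Analytics ExperienceEvent field group:

```sql
-- Serialize the nested web object into a JSON string so it is readable in the result grid
SELECT to_json(web) AS web_json
FROM luma_web_data
LIMIT 10;
```

to_json renders the nested structure as a single JSON string per row, which is often easier to inspect than a flattened set of columns.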