PREP 400: DBVisualizer SQL Editor Setup for Data Distiller
I use DBVisualizer for my example creation and prototyping. This SQL Editot has a free version that will meet most of your needs except the ability to download JSON data. But if you can work with flat data (after selectively choosing and denormalizing that data) which is what you will do most of the time, this limitation is not an issue.
Note: I will be using the AEP and Adobe Experience Platform interchangeably in this tutorial.
Warning: Make sure you have the requisite permissions from your IT team on whether you can install DBVisualizer or not. If you are working in a regulated industry, you definitely need to find out what is allowed and what is not.
Why I Like DBVisualizer
The Data Distiller Query Editor is a basic SQL editor that is perfect for fast data exploration and also for query operationalization. However, it has limitations as far as query development is concerned that are addressed by DBVisualizer. With DBVisualizer, you can:
Query multiple databases from within a single UI.
Reuse the SQL code developed in one environment in another. If you were executing queries on a table in a warehouse and you migrated that table (its creation) over to AEP, you can just reuse the same SQL as long as it is Postgres compliant which is mostly the case.
You can preview up to 100 rows for the query. In DbVisualizer, you can set an upper limit on the queries. The maximum number of rows that a SELECT query can return in DBVisualizer is 50,000 rows as long as the query finishes within 10 minutes since it started execution.
If you have a set of deeply nested SQL subqueries, you can highlight the subquery and execute it.
You can execute a sequence of SQL commands separated by the semi-colon;
Easy access to SQL scripts and tables within the editor.
In the Free version, you can download the results locally as a CSV.
The editor is perfect for prototyping and development of complex SQL queries. The Data Distiller editor is evolving fast and expect these features to be available in the near future.
Why I Do Not Like DBVisualizer
There are some features that make DBVisualizer not ideal for Data Distiller. The Data Distiller Query Editor excels at these features:
You have to be comfortable with setting up database connections. But once setup, it gives you unparalleled power and control on your query development.
Each query execution forces a re-connect.
The tables you get to see in DBVisualizer is a snapshot of the tables at the start of the session. If you create new tables, then you need to refresh the connection with the metadata refresh i.e. disconnect and connect back again.
Some metadata commands will cause subsequent SELECT queries to not retrieve results. You have to disconnect and connect back.
Scheduling of queries is not possible. You can use the REST API to schedule these queries but it is better done in the Data Distiller UI.
Every scheduled job requires the creation of a Data Distiller Template that needs to be created inside the Data Distiller UI.
Monitoring and alerting setup for scheduled queries are best done within the Data Distiller UI.
Data Distiller Editor also has a Dashboards component where you can build Business Intelligence (BI) style dashboards with visualizations powered by star schemas in the Data Distiller Accelerated Store.
Tip: Prototyping, development, and validation is best done in a DBVisualizer-like tool. But operationalization of the SQL queries developed in DBVisualizer is best done in Data Distiller UI.
Download & Install DBVisualizer
Go to this link and download/install the appropriate version based on your OS
If you are on a Mac and do not know whether you have a Intel processor or otherwise, check the upper right corner
Setup Connection to Adobe Experience Platform
All of the AEP datasets that you will need to work with are called tables in the world of SQL. All of these tables reside under a database which acts like a namespace or a scope or a separator for this collection of tables. We need to log into this database that is on the web and so we will need a public IP address. As we send these SQL queries as requests over, the server needs to listen to them and so we need a port number. Last, but not least, we need a username and a password. As this communication happens between the client and the server, AEP needs to make sure that it is secure and expects that DBVisualzier will need to set the SSL mode to be set up. If you do not enable this, AEP will refuse your connection even if you got everything else right.
Click on Database->Create Database Connection
On the left-hand side pane, you can also see a list of Connections. Alternatively, you can right-click on Connections and create a database connection there as well.
You may be prompted with using the Wizard. Do not use it. We could have used it but it does not add value since we will be copy-pasting values from Data Distiller.
You will see a screen that looks like this. Name this connection as you will use it with a SQL script to send your queries into AEP.
The SQL dialect that Data Distiller speaks is PostgreSQL. There are two important pieces of information that we need to provide DBVisualizer so that it can interact and talk the same language.
Tip: Suppose you do not have access to AEP but still want to learn SQL concepts I showcase for free, you can just download a local version of PostgreSQL. For smaller-sized datasets as is the case with the examples I am using, it should mostly work just fine. Combine that with free JupyterLab for data science and you have all that you need:
PostgreSQL: https://www.postgresql.org/download/
JupyterLab: https://jupyter.org/install
Under the section of Database, set the Database Type to Postgres. You can also leave it as Autodetect.
Under the Database section, set the driver to PostgreSQL JDBC driver. This is a driver that is provided by Data Distiller. Each time, we execute a SQL query in DBVisualizer, these queries are packaged from DBVisualizer which is a Java application into API calls. We need a middle layer in between to translate these API calls to PostgreSQL database calls. Think of it as a translator that translates all that comes in from DBVisualizer. Since PostgreSQL is universally popular as a SQL language, this should give you a hint as to why finding a tool that can talk to Data Distiller is easy.
Note: You can guess that DBVisualizer is a Java-based app because it says outright that it needs a JDBC driver to connect to the database.
Tip: If you have to choose between dialects to learn, note that PostgreSQL is a very popular dialect of SQL and Python for data science.
Your setup screen so far should look like this:
Let us now go into Adobe Experience Platform and access the credentials page. You need to navigate to Data Management->Queries->Credentials
Tip: Make sure you note the IMSOrg and the Sandbox. The credentials are generated per IMSOrg and the sandbox. Rename your connection accordingly: YourString_{IMSOrgName}_{SandboxName}. For example, in the above picture, I could rename the connection as Saurabh_DeveloperEnablement_depadmin001
Tip: Click the overlapping squares to copy the entire string rather than highlighting and doing a copy of the string via a keyboard shortcut.8. Click and copy the information as shown by the arrows:
Click and copy the information from the Data Distiller Credentials UI to the Database screen.
Warning: The password expires every 24 hours. This is done for security reasons so that these passwords expire fast enough. Data Distiller supports non-expiring passwords for BI dashboard use cases where such an expiry would have hurt the user experience.
Now click on Properties Tab and navigate into Driver properties- this is for the JDBC driver configuration settings:
Set the property value and type require
Screen Shot 2023-08-19 at 1.11.20 PM
Go back to the Connections tab, click on Connect and you should see a successful connection that looks like this:
Tip: It is also possible to connect to a specific database in Data Distiller by using the following syntax for dbname=<sandbox_name>:<database_name>.<schema_name>.all. This is very helpful when you want to restrict access to the tables to those within a database.
Create a SQL Script in DBVisualizer
Click on the Scripts tab and right-click on it to Create File. When you do so, it will come across as unnamed.sql under Bookmarks
Name the SQL script and hit Enter. Double-click on the script name and it will show the following screen:
Choose the Connection from the dropdown. This feature is cool because it means that I can switch between development and production sandboxes within AEP.
Executing Test SQL Queries
You can start typing the following code:
The result you will see will look like the folllowing:
Tip: You can write multiple SQL queries on multiple lines as long as you separate out using the semi-colon.
Tip: You can highlight any SQL query or even a subquery and execute it. This is extremely useful to debug deeply nested queries.
Some of the columns contain very useful information:
datasetID is the unique ID used by the platform to keep track of the datasets.
dataset column contains interesting information that can help you differentiate between tables in the Data Lake vis-a-vis those in the Query Acceleration Layer:
qsaccel.XXX: The qsaccel namespace (it is a database) indicates all the tables that are contained in the warehousing engine called the Query Accelerated Store. This is a separate storage layer for storing tables that need to power BI Dashboards. This namespace restricts modifying any of the tables as these are system-generated tables for Real-Time CDP reporting dashboards. If you have Adobe Real-Time CDP, you will see these tables pop up. XXX is the table name.
cjm_qsaccel.XXX: The cjm_qsaccel namespace (it is a database) indicates all the tables that are contained in the warehousing engine called the Query Accelerated Store. This namespace restricts modifying any of the tables as these are system-generated tables for Adobe Journey Optimizer reporting dashboards. If you have Adobe Journey Optimizer, you will see these tables pop up. XXX is the table name
XXX.YYY.ZZZ: If you see a name that looks, say, like testexample.lookups.country_lookup, then this table was created as a custom table in the Query Accelerated Store via Data Distiller. textexample (XXX) is the custom database that you created, lookups (YYY) is the schema created underneath the database, and country_lookup (ZZZ) is the table name.
XXX i.e. Names without a dot notation: These are tables in the Data Lake.
Tip: You do not need to specify the dot notation when executing queries against these tables. Data Distiller treats all the tables uniformly across the storage layers and this means that all table names are unique regardless of where it is stored. Just knowing the namespace helps you track what tables are in the Data Lake and what is in the Query Acceleration Layer.
Copy any of the table names from this list and just execute a test query. Highlight the statement and press the play button.
The results will look like this:
Helpful Configurations and Features
You can export the results as a CSV by clicking the export icon. The steps to download the results are self-explanatory. Note that JSON export is not supported in the free version of DBVisualizer.
You can access the SQL execution history by clicking the Display the SQL History icon
You can set the number of rows that you want to get back in your results or exports by setting the Max Rows parameter in the UI shown below:
Tip: By default, DBVsiualizer returns 1000 results. Data Distiller like most query engines places a limit of 50,000 rows.
If you need more ideas on how to be more productive with DBVisualizer, refer to the following documentation:
Appendix
Debugging 0801 Errors When Logging Remotely from a Different Geographic Location
I have encountered errors in the past when I was trying to connect from Europe to the Data Distiller IMS Org based out of US.
This is because of this limitation in DBVisualizer:
The solution to this was to use the IP address and the port 5432.
Last updated