EXPLORE 500: Incremental Data Extraction with Data Distiller Cursors
Learn to Navigate Data Efficiently with Incremental Extraction Using Data Distiller Cursors
Load the sample dataset using the following tutorial:
The dataset is very simple and has 5,000 records with three columns:

id: Unique identifier for each entry.
name: Name of the individual or entity.
value: Numeric value associated with each entry.
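Once the data is loaded, a quick sanity check confirms the record count and columns. The table name sample_data follows the one used later in this tutorial:

```sql
-- Confirm the load produced 5,000 records.
SELECT COUNT(*) AS record_count FROM sample_data;

-- Inspect a few rows to verify the three columns.
SELECT id, name, value FROM sample_data LIMIT 5;
```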
We’ll be using the Data Distiller Query Pro Mode Editor, a web-based editor that lets you view query results with a limited set of row-count options: 50, 100, 150, 300, or 500. While this may seem limiting, remember that the editor operates within your browser, which has memory constraints for displaying results.
If you use a dedicated client installed locally on your machine, such as DBVisualizer, you can handle much larger result sets of 100,000 rows or more. However, even with local clients, you’ll eventually hit application memory limits.
This brings us to an interesting challenge: how can we efficiently paginate through the result set when the client editor imposes a limit on how much data can be displayed at once?
The answer is Data Distiller Cursors.
A cursor in Data Distiller is a database object used to retrieve, manipulate, and traverse the rows returned by a query one row at a time. Cursors are particularly useful when each row must be handled individually, with the traversal controlled programmatically.
Sequential Processing: Cursors allow you to sequentially access rows in a result set, which is helpful when you need to handle each row individually.
Row-by-Row Operations: Unlike standard SQL, which typically processes whole result sets, a cursor can fetch a limited number of rows at a time, letting you work with rows individually in a controlled manner.
Memory Efficiency: When working with very large datasets, fetching rows in smaller chunks (instead of loading all at once) can help manage memory usage and improve performance.
Batch Processing: When you need to process rows in smaller batches, especially with large datasets.
Row-by-Row Operations: For complex operations that require checking or modifying each row one at a time.
Data Migration or Transformation: Cursors can be helpful when copying data from one table to another while applying transformations.
Procedural Logic: Used in stored procedures or scripts where specific row-based logic or conditions need to be applied iteratively.
DECLARE: Defines the cursor data_cursor with a specific SQL query.
FETCH: Retrieves a specified number of rows (e.g., 5) from the cursor.
CLOSE: Releases the cursor when no longer needed.
This process is especially valuable when working with large datasets, as it helps control memory usage by processing smaller chunks of data at a time.
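Put together, the three commands form a simple lifecycle. The sketch below assumes PostgreSQL-style cursor syntax; your_table is a placeholder for any table you want to page through:

```sql
-- Define a cursor over the query whose results you want to traverse.
DECLARE data_cursor CURSOR FOR
SELECT * FROM your_table;  -- your_table is a placeholder

-- Pull a small batch (here, 5 rows); repeat to walk the result set.
FETCH 5 FROM data_cursor;

-- Release the cursor when it is no longer needed.
CLOSE data_cursor;
```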
Before you start, open the Data Distiller Query Pro Mode Editor by navigating to AEP UI -> Queries -> Create Query.
Make sure you set the Show Results option to 500, as we will use this limit to paginate through the results.
We will declare a cursor to select all rows from the sample_data table. This cursor will allow us to retrieve a limited number of rows at a time.
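A minimal declaration, assuming PostgreSQL-style cursor syntax; the cursor and table names follow the ones used in this tutorial, and the ORDER BY is an addition here so that successive fetches return rows in a predictable sequence:

```sql
-- Define a cursor over the full sample table.
DECLARE data_cursor CURSOR FOR
SELECT id, name, value
FROM sample_data
ORDER BY id;  -- added for deterministic pagination
```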
Once you've declared the cursor, open it to prepare it for row fetching.
You can now fetch rows in batches of 500 rows. This is particularly useful if you're working with large datasets and want to process data in smaller chunks.
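Assuming PostgreSQL-style FETCH syntax, each call retrieves the next batch from the cursor’s current position:

```sql
-- Retrieve the first batch of 500 rows.
FETCH 500 FROM data_cursor;
```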
You should see the first 500 rows; observe the values in the id column.
Now let us fetch the next 500 rows and observe how the id column advances.
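Running the same statement again advances the cursor to the next batch:

```sql
-- The cursor remembers its position, so this returns rows 501-1000.
FETCH 500 FROM data_cursor;
```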
After you’ve fetched all the rows you need, close the cursor to free up resources.
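Closing uses the same PostgreSQL-style syntax:

```sql
-- Release the resources held by the cursor.
CLOSE data_cursor;
```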
The entire code, for your reference and for templatizing to other use cases:
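A consolidated sketch, assuming PostgreSQL-style cursor syntax (adjust the table name and batch size to your dataset):

```sql
-- Declare a cursor over the sample table; ORDER BY keeps fetches deterministic.
DECLARE data_cursor CURSOR FOR
SELECT id, name, value
FROM sample_data
ORDER BY id;

-- Fetch the result set in 500-row batches; re-run until no rows remain.
FETCH 500 FROM data_cursor;
FETCH 500 FROM data_cursor;

-- Release the cursor once you have retrieved all the rows you need.
CLOSE data_cursor;
```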
The example in the tutorial below allows you to extract a chunk of rows at a time:
If you wanted to persist the results from that tutorial incrementally:
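One possible approach is a keyset-style pattern, sketched below under two assumptions: rows are appended in id order, and extracted_data is a hypothetical name for the target table:

```sql
-- First chunk: materialize the initial 500 rows into a new table.
-- (extracted_data is a hypothetical target table name.)
CREATE TABLE extracted_data AS
SELECT id, name, value
FROM sample_data
ORDER BY id
LIMIT 500;

-- Subsequent chunks: append the next 500 rows after the highest id
-- already persisted; repeat until no new rows are inserted.
INSERT INTO extracted_data
SELECT id, name, value
FROM sample_data
WHERE id > (SELECT MAX(id) FROM extracted_data)
ORDER BY id
LIMIT 500;
```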