EXPLORE 500: Incremental Data Extraction with Data Distiller Cursors

Learn to Navigate Data Efficiently with Incremental Extraction Using Data Distiller Cursors


Prerequisites

Load the sample dataset (sample_data.csv) using the following tutorial: PREP 500: Ingesting CSV Data into Adobe Experience Platform.

Case Study Overview

The dataset is deliberately simple: 5,000 records with three fields.

  • id: Unique identifier for each entry.

  • name: Name of the individual or entity.

  • value: Numeric value associated with each entry.

We’ll be using the Data Distiller Query Pro Mode Editor, a web-based editor that limits how many result rows you can view at once: 50, 100, 150, 300, or 500. While this may seem restrictive, remember that the editor runs inside your browser, which has memory constraints for displaying results.

If you use a dedicated client installed locally on your machine, such as DBVisualizer, you can handle much larger result sets, often 100,000 rows or more. Even local clients, however, eventually hit application memory limits.

This brings us to an interesting challenge: how can we efficiently paginate through the result set when the client editor imposes a limit on how much data can be displayed at once?

The answer is Data Distiller Cursors.

What is a Data Distiller Cursor?

A cursor in Data Distiller is a database object used to retrieve, manipulate, and traverse the set of rows returned by a query, one row or one batch at a time. Cursors are particularly useful when you need to process rows individually, allowing row-by-row processing that you can control programmatically.

  1. Sequential Processing: Cursors allow you to sequentially access rows in a result set, which is helpful when you need to handle each row individually.

  2. Row-by-Row Operations: Unlike standard SQL, which typically processes whole result sets, a cursor can fetch a limited number of rows at a time, letting you work with rows individually in a controlled manner.

  3. Memory Efficiency: When working with very large datasets, fetching rows in smaller chunks (instead of loading all at once) can help manage memory usage and improve performance.

Common Uses of Cursors

  • Batch Processing: When you need to process rows in smaller batches, especially with large datasets.

  • Row-by-Row Operations: For complex operations that require checking or modifying each row one at a time.

  • Data Migration or Transformation: Cursors can be helpful when copying data from one table to another while applying transformations.

  • Procedural Logic: Used in stored procedures or scripts where specific row-based logic or conditions need to be applied iteratively.

How a Data Distiller Cursor Works

  • DECLARE: Defines the cursor data_cursor with a specific SQL query.

  • FETCH: Retrieves a specified number of rows (e.g., 500) from the cursor.

  • CLOSE: Releases the cursor when no longer needed.

This process is especially valuable when working with large datasets, as it helps control memory usage by processing smaller chunks of data at a time.

Declare a Data Distiller Cursor

Before you start, open the Data Distiller Query Pro Mode Editor by navigating to

AEP UI -> Queries -> Create Query

Set the Show Results option to 500, since we will use this limit to paginate through the results.

We will declare a cursor that selects all rows from the sample_dataset table. This cursor will allow us to retrieve a limited number of rows at a time.

-- Declare the cursor
DECLARE data_cursor CURSOR FOR
SELECT id, name, value FROM sample_dataset;

Once you've declared the cursor, it is open and ready for fetching rows.

Fetch Rows Using the Cursor

You can now fetch rows in batches of 500. This is particularly useful when you're working with large datasets and want to process data in smaller chunks.

-- Fetch the first 500 rows
FETCH 500 FROM data_cursor;

You should see the first 500 rows; note the values in the id column.

Now fetch the next 500 rows and observe how the id column advances:

-- To fetch the next 500 rows, repeat this command
FETCH 500 FROM data_cursor;

Close the Cursor

After you’ve fetched all the rows you need, close the cursor to free up resources.

-- Close the cursor when done
CLOSE data_cursor;

Here is the entire sequence, for your reference and for templatizing to other datasets:

-- Declare the cursor
DECLARE data_cursor CURSOR FOR
SELECT id, name, value FROM sample_dataset;

-- Fetch the first 500 rows
FETCH 500 FROM data_cursor;

-- To fetch the next 500 rows, repeat this command
FETCH 500 FROM data_cursor;

-- Close the cursor when done
CLOSE data_cursor;
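
If you want to script this pagination instead of running each FETCH by hand, the same DECLARE/FETCH/CLOSE sequence can be driven from Python over Data Distiller's PostgreSQL interface. This is a minimal sketch, not a definitive implementation: it reuses the placeholder credentials from the Appendix below and assumes your session keeps the cursor alive across statements.

import psycopg2

# Same placeholder connection string as in the Appendix
conn = psycopg2.connect('''sslmode=require host=ZZZZ port=80 dbname=prod:all user=YYYYY@AdobeOrg password=XXXXX''')
cursor = conn.cursor()

# Declare the cursor once, then page through it 500 rows at a time
cursor.execute("DECLARE data_cursor CURSOR FOR SELECT id, name, value FROM sample_dataset;")

while True:
    cursor.execute("FETCH 500 FROM data_cursor;")
    rows = cursor.fetchall()
    if not rows:
        break  # the cursor is exhausted
    # ...process this 500-row page here...

# Release the cursor and close the connection
cursor.execute("CLOSE data_cursor;")
cursor.close()
conn.close()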

Appendix: Cursors in Python

The example in the STATSML 100: Python & JupyterLab Setup for Data Distiller tutorial shows how to connect to Data Distiller from Python and extract a chunk of rows at a time.

If you want to persist the results from that tutorial incrementally, you can write each fetched chunk to a CSV file as you go:

import psycopg2
import csv

# Establish a connection to the database
conn = psycopg2.connect('''sslmode=require host=ZZZZ port=80 dbname=prod:all user=YYYYY@AdobeOrg password=XXXXX''')

# Create a cursor object for executing SQL commands
cursor = conn.cursor()

# Example query
query = "SELECT * FROM movie_data;"
cursor.execute(query)

# File to save the data
output_file = "movie_data.csv"

# Open the file in write mode
with open(output_file, mode='w', newline='', encoding='utf-8') as file:
    csv_writer = None  # Initialize the CSV writer variable
    chunk_size = 50

    while True:
        # Fetch the results in chunks
        chunk = cursor.fetchmany(chunk_size)

        # Break the while loop if there are no rows to be fetched
        if not chunk:
            break

        # Write the header row only once, when processing the first chunk
        if csv_writer is None:
            column_names = [desc[0] for desc in cursor.description]
            csv_writer = csv.writer(file)
            csv_writer.writerow(column_names)  # Write the header row

        # Write each row of the chunk to the CSV file
        csv_writer.writerows(chunk)

# Close the cursor and connection
cursor.close()
conn.close()

print(f"Data has been successfully written to {output_file}.")