Adobe Data Distiller Guide

[DRAFT]FUNC 100: Date and Time Functions

Last updated 7 months ago

hour function

The hour function is used when you want to extract the hour component from a timestamp or datetime column. It's particularly useful for time-based analysis, such as:

  1. Aggregating Data by Hour: When you need to analyze events or actions (like clicks, sales, or logins) based on the hour of the day. For example, identifying peak activity hours in a campaign.

  2. Time-of-Day Patterns: When looking for trends or patterns in data based on the time of day. For instance, understanding what hours are most effective for sending marketing emails.

  3. Comparing Hourly Performance: When comparing the performance of different hours within a day across multiple campaigns, as shown in the example below.

SELECT campaign_id, hour(click_timestamp) AS hour_of_day, COUNT(*) AS total_clicks
FROM campaign_clicks
GROUP BY campaign_id, hour_of_day;

date_trunc function

The date_trunc function is used when you want to aggregate data by a specific time interval, such as day, week, month, or year. In the example below, date_trunc('month', transaction_date) rounds the transaction_date down to the first day of the month, allowing you to analyze data at the monthly level. Here are some use cases for the date_trunc function:

  1. Aggregating by Time Intervals: When you need to summarize data over consistent time periods, such as months, quarters, or years. This is useful for time series analysis, trend detection, or reporting.

  2. Monthly or Periodic Reporting: When generating monthly reports to summarize key metrics (e.g., total revenue, number of transactions) for each month.

  3. Smoothing Time-Series Data: When you want to eliminate daily fluctuations by summarizing data into larger time buckets, such as weeks or months, to better understand long-term trends.

  4. Comparing Performance Across Periods: When comparing metrics across different time intervals, like comparing revenue month-over-month.

The syntax for the date_trunc function is as follows:

date_trunc('unit', date)
  • unit: This specifies the level of truncation and can be values like 'year', 'quarter', 'month', 'week', 'day', 'hour', 'minute', or 'second'.

  • date: The date or timestamp expression that you want to truncate.

SELECT date_trunc('month', transaction_date) AS month, SUM(revenue) AS total_revenue
FROM transactions
GROUP BY date_trunc('month', transaction_date)
ORDER BY month;
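
The same pattern extends to the other units listed above; for example, a sketch of weekly truncation (assuming the same transactions table as in the monthly example):

SELECT date_trunc('week', transaction_date) AS week, SUM(revenue) AS total_revenue
FROM transactions
GROUP BY date_trunc('week', transaction_date)
ORDER BY week;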

year function

The year function extracts the year from the signup_date field in the example below, allowing you to group and analyze data on an annual basis. Here are some situations where using the year function is beneficial:

  1. Yearly Aggregation: Useful for grouping data by year to summarize activities or events that occurred within each year. In the example below, it counts the number of customer signups per year.

  2. Cohort Analysis: Helps in tracking groups of customers who signed up in the same year, providing insights into customer behavior, growth trends, or retention over time.

  3. Year-over-Year Comparisons: Facilitates comparisons across different years, such as assessing revenue growth, user acquisition, or other key metrics.

  4. Trend Analysis: Useful for identifying patterns or trends over multiple years, such as determining which years had peak or low signup activity.

SELECT year(signup_date) AS signup_year, COUNT(customer_id) AS cohort_size
FROM customers
GROUP BY year(signup_date)
ORDER BY signup_year;

dayofweek function

The dayofweek function is useful for:

  1. Grouping Data by Day of the Week: It allows you to analyze trends or patterns based on the day, such as identifying which days have higher sales or more website traffic.

  2. Classifying Days as Weekend or Weekday: As shown in the example, you can use dayofweek to categorize days into "Weekend" or "Weekday" for analysis.

  3. Scheduling and Planning: When analyzing tasks or events based on the day of the week, this function helps in scheduling resources more efficiently.

SELECT CASE
         -- dayofweek returns 1 = Sunday through 7 = Saturday, so 1 and 7 are weekend days
         WHEN dayofweek(transaction_date) IN (1, 7) THEN 'Weekend'
         ELSE 'Weekday'
       END AS day_type,
       SUM(revenue) AS total_revenue
FROM transactions
GROUP BY day_type;

datediff function

The datediff function is used to calculate the difference between two dates, typically returning the result as the number of days between them. In the context of the provided query, datediff is being used to determine the number of days between consecutive purchase dates for each customer.

SELECT customer_id, AVG(days_between_purchases) AS avg_days_between_purchases
FROM (
  SELECT customer_id,
         datediff(purchase_date, lag(purchase_date) OVER (PARTITION BY customer_id ORDER BY purchase_date)) AS days_between_purchases
  FROM purchases
)
GROUP BY customer_id;

Here's a breakdown of the query above and the use of datediff:

  1. Calculating Differences Between Consecutive Dates: The datediff function computes the difference in days between a purchase_date and the previous purchase_date for the same customer, as determined by the lag function.

  2. Using lag Function: The lag(purchase_date) function retrieves the previous purchase date for each customer_id, allowing you to compare it with the current purchase_date.

  3. Grouping by Customer: The PARTITION BY customer_id clause ensures that the calculations are performed separately for each customer, allowing you to analyze individual purchasing patterns.

  4. Averaging the Day Differences: The avg function calculates the average number of days between purchases for each customer, providing insight into their purchase frequency.

current_date function

The current_date() function returns the current date (today's date) in SQL. In the query below, it filters records where the signup_date matches today's date.

SELECT customer_id, signup_date
FROM customer_activity_data
WHERE signup_date = current_date();

Here’s a breakdown of the usage:

  1. Filtering Data for Today's Date: The query retrieves all customers who signed up on the current date by comparing the signup_date to current_date(). This helps identify new signups that occurred today.

  2. Use Cases for current_date():

    • Daily Reports: Generating reports that focus on today's activities, such as new signups, sales, or customer interactions.

    • Real-Time Monitoring: Tracking metrics that need to be updated continuously, like daily active users or same-day transactions.

    • Scheduled Queries: Running automated tasks or queries that process data based on the current date.
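
For daily reports or scheduled queries, the same idea extends to rolling windows; for example, a sketch of a "signups in the last 7 days" filter (assuming the same customer_activity_data table):

SELECT customer_id, signup_date
FROM customer_activity_data
WHERE signup_date >= date_sub(current_date(), 7);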

current_timestamp function

The current_timestamp() function returns the current date and time (timestamp) at the moment the query is executed. In the INSERT statement below, it adds a record to the campaign_interactions table with the exact time the insertion occurs.

INSERT INTO campaign_interactions (customer_id, campaign_id, interaction_time)
VALUES (1234, 5678, current_timestamp());

Here’s a breakdown of its use:

  1. Capturing the Exact Interaction Time: By using current_timestamp(), you record the precise moment when the interaction took place. This is useful for time-sensitive data tracking, such as logging user actions or events.

  2. Use Cases for current_timestamp():

    • Event Logging: Recording the exact time of events, such as user interactions, system events, or changes in status.

    • Audit Trails: Keeping a detailed log of activities for compliance, debugging, or tracking user behavior over time.

    • Real-Time Analytics: Analyzing data based on the exact time of occurrence, which is helpful for real-time dashboards or time-series analysis.
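
For real-time monitoring, current_timestamp() can also anchor a rolling filter; for example, a sketch that pulls interactions from the last hour (assuming campaign_interactions stores interaction_time as a timestamp):

SELECT customer_id, campaign_id, interaction_time
FROM campaign_interactions
WHERE interaction_time >= current_timestamp() - INTERVAL 1 HOUR;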

current_timezone function

The current_timezone() function returns the time zone of the current session. Here are some use cases:

  • Tracking Data Entry Timezone: This could be used to log the timezone in which the data entry occurred, particularly useful in multi-regional systems where data might be inserted from various geographical locations.

  • Localization of Campaign Analytics: When analyzing campaign interactions, knowing the timezone helps localize data for regional reports. It would enable the conversion of timestamps to the local time of the interaction, giving a more accurate representation of when customers interacted with campaigns.

  • Timezone-Based Personalization: If the system's timezone reflects the user's local time, you could use this data for personalized marketing. For example, sending notifications at specific times based on each user's local timezone.

  • Debugging and Audit Trails: In systems where data ingestion and interaction logs come from various regions, capturing the current timezone during data entry could help troubleshoot issues, understand latency, or provide insights into data processing across time zones.

  • Data Synchronization Across Regions: In distributed systems, knowing the current timezone for data entries could aid in synchronizing data across servers or applications located in different time zones.

SELECT customer_id, current_timezone() AS customer_timezone
FROM campaign_interactions;

date function

The date function casts a timestamp (or a date string) to a date, dropping the time-of-day component. Below, it converts each click_timestamp into a calendar date.

SELECT customer_id, date(click_timestamp) AS click_date
FROM customer_activity_data;

date_add function

The date_add function adds a number of days to a date. Below, it projects a predicted next interaction seven days after the last one.

SELECT customer_id, last_interaction_date, date_add(last_interaction_date, 7) AS predicted_next_interaction
FROM customer_activity_data;

date_diff function

The date_diff function returns the number of days from the second date to the first. Below, it measures how many days have passed since each customer's last purchase.

SELECT customer_id, date_diff(current_date(), last_purchase_date) AS inactivity_days
FROM customer_activity_data;

date_format function

The date_format function renders a date or timestamp as a string using a pattern; the 'MMMM yyyy' pattern below produces values like "December 2024".

SELECT customer_id, date_format(transaction_date, 'MMMM yyyy') AS transaction_month
FROM customer_activity_data;

date_from_unix_date function

The date_from_unix_date function converts a count of days since 1970-01-01 into a date. Note that the argument is a day count, not a Unix timestamp in seconds, despite the column name in the example.

SELECT customer_id, date_from_unix_date(unix_timestamp) AS readable_date
FROM customer_activity_data;

hour function

The hour function extracts the hour component (0-23) from a timestamp. The example below counts clicks per customer per hour of the day.

SELECT customer_id, hour(click_timestamp) AS hour_of_day, COUNT(*) AS total_clicks
FROM customer_activity_data
GROUP BY customer_id, hour_of_day;

last_day function

The last_day function returns the last day of the month that contains the given date.

SELECT customer_id, last_day(subscription_start_date) AS subscription_end_date
FROM customer_activity_data;

make_date function

The make_date function builds a date from year, month, and day integer values.

SELECT make_date(2024, 12, 25) AS campaign_start_date;

month function

The month function extracts the month number (1-12) from a date.

SELECT month(transaction_date) AS transaction_month, SUM(revenue) AS total_revenue
FROM customer_activity_data
GROUP BY transaction_month;

months_between function

The months_between function returns the (possibly fractional) number of months between two dates. Below, it measures each customer's tenure from signup to their most recent purchase.

SELECT customer_id, months_between(last_purchase_date, signup_date) AS months_since_signup
FROM customer_activity_data;

next_day function

The next_day function returns the first date after the given date that falls on the specified day of the week.

SELECT customer_id, next_day(last_interaction_date, 'Monday') AS follow_up_date
FROM customer_activity_data;

minute function

The minute function extracts the minute component (0-59) from a timestamp.

SELECT customer_id, minute(click_timestamp) AS minute_of_interaction, COUNT(*) AS total_clicks
FROM customer_activity_data
GROUP BY customer_id, minute_of_interaction;

second function

The second function extracts the seconds component (0-59) from a timestamp.

SELECT customer_id, second(click_timestamp) AS second_of_interaction
FROM customer_activity_data;

timediff function

The timediff function returns the elapsed time between two timestamps; below, the time between a customer's first and last interaction.

SELECT customer_id, timediff(last_interaction_date, first_interaction_date) AS time_spent
FROM customer_activity_data;

timestamp function

The timestamp function casts a string to a timestamp value.

SELECT timestamp('2024-12-31 23:59:59') AS campaign_end_timestamp;

timestamp_micros function

The timestamp_micros function converts a count of microseconds since the Unix epoch (1970-01-01 00:00:00 UTC) into a timestamp.

SELECT timestamp_micros(1696843573000000) AS event_timestamp;

timestamp_millis function

The timestamp_millis function converts a count of milliseconds since the Unix epoch into a timestamp.

SELECT timestamp_millis(1696843573000) AS event_timestamp;

timestamp_seconds function

The timestamp_seconds function converts a count of seconds since the Unix epoch into a timestamp.

SELECT timestamp_seconds(1696843573) AS event_timestamp;

timestampadd function

The timestampadd function adds a specified number of the given unit (e.g., MINUTE, HOUR, DAY) to a timestamp.

SELECT customer_id, timestampadd(MINUTE, 30, click_timestamp) AS predicted_purchase_time
FROM customer_activity_data;

timestampdiff function

The timestampdiff function returns the difference between two timestamps in the specified unit.

SELECT customer_id, timestampdiff(HOUR, first_interaction_date, last_interaction_date) AS hours_between_interactions
FROM customer_activity_data;

date_part function

The date_part function extracts a named field (such as 'day', 'month', or 'year') from a date or timestamp.

SELECT customer_id, date_part('day', transaction_date) AS purchase_day
FROM customer_activity_data;

to_date function

The to_date function parses a string into a date using the supplied pattern.

SELECT to_date('2024-12-31', 'yyyy-MM-dd') AS campaign_launch_date;

to_timestamp function

The to_timestamp function parses a string into a timestamp using the supplied pattern.

SELECT to_timestamp('2024-12-31 23:59:59', 'yyyy-MM-dd HH:mm:ss') AS campaign_end_timestamp;

to_unix_timestamp function

The to_unix_timestamp function converts a date/time string into a count of seconds since the Unix epoch.

SELECT to_unix_timestamp('2024-12-31 23:59:59', 'yyyy-MM-dd HH:mm:ss') AS unix_timestamp;

to_utc_timestamp function

The to_utc_timestamp function interprets a timestamp as local time in the given time zone and converts it to UTC.

SELECT to_utc_timestamp(click_timestamp, 'America/Los_Angeles') AS utc_click_time
FROM customer_activity_data;

year function

The year function extracts the year from a date; below, it totals revenue per year.

SELECT year(transaction_date) AS transaction_year, SUM(revenue) AS total_revenue
FROM customer_activity_data
GROUP BY transaction_year;

date_sub function

The date_sub function subtracts a number of days from a date; below, it computes a reminder date one week before each event.

SELECT customer_id, date_sub(event_date, 7) AS reminder_date
FROM customer_activity_data;

date_trunc function

The date_trunc function rounds a date or timestamp down to the start of the given unit; here, the first day of the month.

SELECT date_trunc('month', transaction_date) AS transaction_month, SUM(revenue) AS total_revenue
FROM customer_activity_data
GROUP BY transaction_month;

dateadd function

The dateadd function adds an interval of the specified unit to a date or timestamp; below, it computes the next billing date one month after the subscription start.

SELECT customer_id, dateadd(MONTH, 1, subscription_start_date) AS next_billing_date
FROM customer_activity_data;

datediff function

The datediff function returns the number of days between two dates; below, the days of inactivity since each customer's last interaction.

SELECT customer_id, datediff(current_date(), last_interaction_date) AS inactivity_days
FROM customer_activity_data;

day function

The day function extracts the day of the month (1-31) from a date.

SELECT day(transaction_date) AS transaction_day, COUNT(*) AS total_transactions
FROM customer_activity_data
GROUP BY transaction_day;

dayofmonth function

The dayofmonth function is a synonym for day, returning the day of the month (1-31).

SELECT dayofmonth(transaction_date) AS transaction_day_of_month, COUNT(*) AS total_transactions
FROM customer_activity_data
GROUP BY transaction_day_of_month;

dayofweek function

The dayofweek function returns the day of the week as a number (1 = Sunday through 7 = Saturday).

SELECT dayofweek(click_timestamp) AS engagement_day, COUNT(*) AS total_engagements
FROM customer_activity_data
GROUP BY engagement_day;

dayofyear function

The dayofyear function returns the day of the year (1-366) from a date.

SELECT dayofyear(transaction_date) AS transaction_day_of_year, COUNT(*) AS total_transactions
FROM customer_activity_data
GROUP BY transaction_day_of_year;