
Get Started with Batch Data Transforms in Data Cloud

Learning Objectives

After completing this unit, you’ll be able to:

  • Describe batch data transforms and when to use them.
  • Identify the different kinds of data transformation nodes.
  • Describe how to create a batch data transform.

Introducing Batch Data Transforms

In Data Cloud, data comes in through a data stream and resides in a data lake object (DLO). A DLO is the storage container for data ingested into Data Cloud. A data transform lets you access data in one or more DLOs and transform it to create your own set of data.

When to Use a Batch Data Transform

In contrast to a streaming data transform, which runs continually and is based on a single SQL statement, a batch data transform runs on a scheduled basis and offers more functionality through a rich visual editor. In this editor, you can combine data from multiple DLOs, use functions to create calculated fields, and output data to multiple DLOs.

The Batch Data Transform canvas with nodes that act on customer, ticket, and merchandise data.

Use a batch data transform when you need to do complex data transformations or when you need the data updated on a scheduled basis. In a batch data transform, you can join, aggregate, and append data. You can also use formulas and filters.
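To make these operations concrete, here's a minimal sketch in plain Python of two of them: appending rows from multiple sources and adding a calculated field with a formula. The data, field names, and tax rate are invented for illustration; in Data Cloud you configure these operations visually rather than in code.

```python
# Hypothetical example: "append" combines rows from two sources, and a
# formula adds a calculated field. Plain Python stands in for the
# visual nodes; all names and values here are made up.

online_sales = [{"order_id": "A1", "amount": 25.0}]
store_sales = [{"order_id": "B7", "amount": 40.0}]

# Append: combine rows from multiple sets of data.
all_sales = online_sales + store_sales

# Formula: create a calculated field on each row (assumed 8% tax rate).
for row in all_sales:
    row["amount_with_tax"] = round(row["amount"] * 1.08, 2)
```

The same idea extends to joins, aggregates, and filters, which the node types below cover.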

How Does a Batch Data Transform Work?

Using the visual editor, you drag and drop nodes to create the data you need. Each node represents a step in the process: the source data, the target data, or an operation you perform on that data.

When you create a batch data transform, you can use the different node types to extract the exact data you need. Here are the different node types you can choose and what they do.

  • Aggregate: Rolls up data to a higher granularity using these functions: Average, Count, Maximum, Minimum, Stddevp, Stddev, Sum, Unique, Varp, and Var.
  • Append: Combines rows from multiple sets of data.
  • Filter: Removes rows that you don’t need in your target data.
  • Input: Contains source data in a DLO.
  • Join: Joins two input nodes via a lookup or join. Each input node must have a key field. For example, the customer data input node and the ticket sales input node each have a customer ID field.
  • Output: Contains the transformed data in a DLO.
  • Transform: Manipulates data with functions. With this node, you can calculate values, modify string values, format dates, edit data attributes, drop columns, and more.
  • Update: Swaps column values with data from another data source when key pairs match.
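As an analogy for how the join, aggregate, and filter nodes act on rows, here's a minimal Python sketch over lists of dicts. The customer and ticket data are invented; in Data Cloud you configure these nodes visually, not in code.

```python
# Hypothetical illustration: a few node types expressed as plain-Python
# operations on lists of dicts. All data here is made up.

customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
tickets = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 40.0},
]

# Join node: match rows from two inputs on a key field (customer_id).
joined = [
    {**c, **t}
    for c in customers
    for t in tickets
    if c["customer_id"] == t["customer_id"]
]

# Aggregate node: roll up ticket amounts per customer with Sum.
totals = {}
for row in joined:
    totals[row["customer_id"]] = totals.get(row["customer_id"], 0) + row["amount"]

# Filter node: keep only the rows you need in the target data.
big_spenders = {cid: amt for cid, amt in totals.items() if amt >= 100}

print(totals)        # {1: 200.0, 2: 40.0}
print(big_spenders)  # {1: 200.0}
```

Each node's output becomes the next node's input, which is exactly how the canvas chains them.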

Create a Batch Data Transform

Now that you know what a batch data transform is, let’s see how it works in the real world. Let’s say you work for a sporting events company that sells tickets to games. The company also sells merchandise for each game. You want to create a list of VIP customers based on customer ticket and merchandise purchases.

Before you build the transform, create a DLO to contain the transformed data. The target DLO in this transform is called VIP Customers and has a category of Profile because the data is a list of customers. Although the DLO itself is named VIP Customers, in the transform we gave the output node the more descriptive label Update VIP Customers DLO.

The Batch Data Transform canvas with numbered nodes that manipulate customer, ticket, and merchandise data and correspond to the steps to create the transform.

  1. Now you’re ready to build the data transformation. When you select Batch Data Transform, it opens a blank canvas. Start by adding your first data source: Customers DLO (Profile).
  2. Now that you have your customer data, add two join nodes: one to the Merchandise Purchase DLO and one to the Ticket Purchase DLO. These DLOs both contain engagement data and are related by Customer ID. You end up with a denormalized set of data that has customers and their related ticket purchase and merchandise purchase data.
  3. Add a transform node to identify VIP customers. This node performs several operations: it calculates each customer’s lifetime value by adding the ticket sales amount and the merchandise amount, drops unneeded columns, calculates the average customer lifetime value, and flags whether each customer is a VIP.
  4. Add a filter node to extract the VIP customers.
  5. Add a transform node to drop columns you don’t need in the final dataset.
  6. Add an output node to hold the transformed data. The output node is the target DLO you created at the beginning of this process.
  7. Save and run the transform.

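The steps above can be sketched end to end in plain Python. The field names, sample purchase totals, and VIP rule (lifetime value above the average) are invented for illustration; the real transform is built from visual nodes, not code.

```python
# Hypothetical sketch of the VIP customer transform pipeline.
# All data and field names are made up for illustration.

customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Alan"},
]
ticket_purchases = {1: 300.0, 2: 50.0, 3: 500.0}  # sum per customer
merch_purchases = {1: 150.0, 2: 20.0, 3: 10.0}

# Steps 1-2: input node for customers, joined on Customer ID to the
# ticket purchase and merchandise purchase data.
rows = [
    {**c,
     "ticket_amount": ticket_purchases.get(c["customer_id"], 0.0),
     "merch_amount": merch_purchases.get(c["customer_id"], 0.0)}
    for c in customers
]

# Step 3: transform node - lifetime value, average, and VIP flag.
for r in rows:
    r["lifetime_value"] = r["ticket_amount"] + r["merch_amount"]
avg_clv = sum(r["lifetime_value"] for r in rows) / len(rows)
for r in rows:
    r["is_vip"] = r["lifetime_value"] > avg_clv

# Step 4: filter node - keep only VIP customers.
vips = [r for r in rows if r["is_vip"]]

# Steps 5-6: transform node drops unneeded columns; the output node
# holds the result (the target DLO).
vip_customers_dlo = [
    {"customer_id": r["customer_id"], "name": r["name"],
     "lifetime_value": r["lifetime_value"]}
    for r in vips
]

print(vip_customers_dlo)
```

Running the transform on a schedule corresponds to re-running this pipeline as the source DLOs pick up new purchase data.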
Preview Results

After the transform successfully completes, go to Data Explorer to open the VIP Customers DLO and inspect the data.

The Data Explorer page with the VIP Customers DLO selected and the data in that DLO.

Resources
