Skip to main content

Hi everyone,

 

I have a data set that lists items sold in different retailers/regions--the data includes rows for each item selling in the retailer, but not for items not sold in the retailer. To calculate the sales opportunity of introducing new items to a retailer, I want to create rows for items not currently sold for the purpose of making calculated fields.

The runaround solution I came up with was to isolate lists of all items and all retailers, joining them to get rows for every item/retailer combination, and joining this with the original data set to get rows for items not sold in the specific retailer. From here, I can create calculated fields and reference other fields not included in the sample below.

This solution seems to work with the sample I made with 2 items and 2 retailers -- is this the best/most optimal method when doing the same for hundreds of items and dozens of retailers?

I want to make sure the solutions I'm using are consistent and scalable, and as I am relatively new to Prep and data management, I would appreciate any tips or alternative solutions! Thanks!

 

Starting Data:

Best/cleanest way to create new rows based on partial data sets? 

Solution:

 

imageOutput:

image

1 respuesta
  1. 19 dic 2024, 05:19

    You are creating a Cartesian join, so 'scalable' is a relative term. You will always have as many records as number of items multiplied by number of retailers.

    Operations with integers are usually better than operations with strings, so instead of joining on Chain != Item, join on 1=1. Prep does allow you to create a join clause, so you need a calculation in each table with the value 1.

    Instead of filtering to the desired fields, you can use aggregate steps. For item, just group by item. For Chain, group by Chain and aggregate MIN Stores.

    The final fields may pick up the rename value (appended 1). If so, remove the duplicate fields and rename the final fields.

     

    James Emery

    Tableau Forums Ambassador

    Please click 'Select as Best' on the one reply that answers your question.

0/9000