Skip to main content

Hi Everyone, 

 

I have a compare table with two datasets (A and B) by account, and both datasets can have multiple revenue types with different amounts. 

 

I want to display only one revenue type per account with the following rules: 

 

  1.  Prefer the revenue type from Dataset A
  2.  If Dataset A has no value, use the revenue type from Dataset B
  3.  When a dataset has multiple revenue types, choose the one with the highest amount

I tried using the first() function, but it sometimes returns the revenue type with a smaller amount instead of the highest. 

 

How can I ensure the table always shows the single revenue type per account based on the highest amount? 

 

Thank you for any advice/help you can give!

1 answer
  1. Apr 9, 6:03 PM

    -- ── DATASET A ──────────────────────────────────────────

    q_a = load "DatasetA";

    q_b = load "DatasetB";

    -- Step 1A: Get max amount per key from Dataset A

    a_grouped = group q_a by DimensionKey;

    a_max = foreach a_grouped generate

    DimensionKey as DimensionKey,

    max(q_a.Amount) as MaxAmount_A;

    -- Step 2A: Join back to filter only rows where Amount = MaxAmount

    a_joined = cogroup q_a by DimensionKey, a_max by DimensionKey;

    a_at_max = foreach a_joined generate

    q_a.DimensionKey as DimensionKey,

    q_a.RevenueType as RevenueType,

    q_a.Amount as Amount,

    a_max.MaxAmount_A as MaxAmount_A;

    a_at_max = filter a_at_max by Amount == MaxAmount_A;

    -- Step 3A: Now first() is safe — all remaining rows have the max amount

    a_grp = group a_at_max by DimensionKey;

    a_final = foreach a_grp generate

    DimensionKey,

    first(a_at_max.RevenueType) as RevenueType_A;

    -- ── DATASET B ──────────────────────────────────────────

    -- Same pattern for Dataset B

    b_grouped = group q_b by DimensionKey;

    b_max = foreach b_grouped generate

    DimensionKey as DimensionKey,

    max(q_b.Amount) as MaxAmount_B;

    b_joined = cogroup q_b by DimensionKey, b_max by DimensionKey;

    b_at_max = foreach b_joined generate

    q_b.DimensionKey as DimensionKey,

    q_b.RevenueType as RevenueType,

    q_b.Amount as Amount,

    b_max.MaxAmount_B as MaxAmount_B;

    b_at_max = filter b_at_max by Amount == MaxAmount_B;

    b_grp = group b_at_max by DimensionKey;

    b_final = foreach b_grp generate

    DimensionKey,

    first(b_at_max.RevenueType) as RevenueType_B;

    -- ── COMBINE: Prefer A, fallback to B ───────────────────

    combined = cogroup a_final by DimensionKey full,

    b_final by DimensionKey full;

    result = foreach combined generate

    coalesce(a_final.DimensionKey, b_final.DimensionKey) as DimensionKey,

    coalesce(a_final.RevenueType_A, b_final.RevenueType_B) as FinalRevenueType;

0/9000