Hi Everyone,
I have a compare table with two datasets (A and B) by account, and both datasets can have multiple revenue types with different amounts.
I want to display only one revenue type per account with the following rules:
- Prefer the revenue type from Dataset A.
- If Dataset A has no value, use the revenue type from Dataset B.
- When a dataset has multiple revenue types, choose the one with the highest amount.
I tried using the first() function, but it sometimes returns the revenue type with a smaller amount instead of the highest.
How can I ensure the table always shows the single revenue type per account based on the highest amount?
Thank you for any advice/help you can give!
-- ── DATASET A ──────────────────────────────────────────
q_a = load "DatasetA";
q_b = load "DatasetB";
-- Step 1A: Get max amount per key from Dataset A
a_grouped = group q_a by DimensionKey;
a_max = foreach a_grouped generate
DimensionKey as DimensionKey,
max(q_a.Amount) as MaxAmount_A;
-- Step 2A: Join back to filter only rows where Amount = MaxAmount
a_joined = cogroup q_a by DimensionKey, a_max by DimensionKey;
a_at_max = foreach a_joined generate
q_a.DimensionKey as DimensionKey,
q_a.RevenueType as RevenueType,
q_a.Amount as Amount,
a_max.MaxAmount_A as MaxAmount_A;
a_at_max = filter a_at_max by Amount == MaxAmount_A;
-- Step 3A: Now first() is safe — all remaining rows have the max amount
a_grp = group a_at_max by DimensionKey;
a_final = foreach a_grp generate
DimensionKey,
first(a_at_max.RevenueType) as RevenueType_A;
-- ── DATASET B ──────────────────────────────────────────
-- Same pattern for Dataset B
b_grouped = group q_b by DimensionKey;
b_max = foreach b_grouped generate
DimensionKey as DimensionKey,
max(q_b.Amount) as MaxAmount_B;
b_joined = cogroup q_b by DimensionKey, b_max by DimensionKey;
b_at_max = foreach b_joined generate
q_b.DimensionKey as DimensionKey,
q_b.RevenueType as RevenueType,
q_b.Amount as Amount,
b_max.MaxAmount_B as MaxAmount_B;
b_at_max = filter b_at_max by Amount == MaxAmount_B;
b_grp = group b_at_max by DimensionKey;
b_final = foreach b_grp generate
DimensionKey,
first(b_at_max.RevenueType) as RevenueType_B;
-- ── COMBINE: Prefer A, fallback to B ───────────────────
combined = cogroup a_final by DimensionKey full,
b_final by DimensionKey full;
result = foreach combined generate
coalesce(a_final.DimensionKey, b_final.DimensionKey) as DimensionKey,
coalesce(a_final.RevenueType_A, b_final.RevenueType_B) as FinalRevenueType;