Best practice

Performance optimisation

The most performance-intensive operations are Automap Values and Aggregate.

Automap Values

Use Automap Values only once per stage. If you need multiple Automap Values operations, put each one in its own stage.
Reduce the number of potential matches by using context columns or by segmenting the volume of data going in (e.g. GWP >$Xm).
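
As a rough illustration of why context columns help, the sketch below counts how many source/target pairs a matcher would have to consider with and without a shared context column. It is not how Automap Values is implemented; the function name candidate_pairs, the column name "country" and the sample rows are all hypothetical.

```python
from collections import Counter

def candidate_pairs(source, target, context=None):
    """Count the source/target pairs a matcher would need to compare.

    Hypothetical helper for illustration only. With no context column,
    every source row is compared with every target row. With one, only
    rows sharing the same context value are compared.
    """
    if context is None:
        return len(source) * len(target)
    source_counts = Counter(row[context] for row in source)
    target_counts = Counter(row[context] for row in target)
    return sum(n * target_counts[value] for value, n in source_counts.items())

# Hypothetical sample data: company names with a country context column.
source = [
    {"name": "Acme Ltd", "country": "UK"},
    {"name": "Globex", "country": "UK"},
    {"name": "Initech", "country": "US"},
    {"name": "Umbrella", "country": "US"},
]
target = [
    {"name": "ACME Limited", "country": "UK"},
    {"name": "Globex plc", "country": "UK"},
    {"name": "Initech Inc", "country": "US"},
    {"name": "Umbrella Corp", "country": "US"},
]

pairs_without_context = candidate_pairs(source, target)
pairs_with_context = candidate_pairs(source, target, context="country")
```

In this toy case the context column halves the number of comparisons. The saving grows as the data is split across more context values.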

Aggregate

Use Aggregate only for aggregation. Don’t use it simply to rename or subset columns.

One-to-many joins

A one-to-many join occurs when an item in one dataset matches multiple rows in the other dataset, creating an output row for every matching pair.

Consider whether the full set of exploded values is required. Could the same result be achieved by joining to a simple lookup table? See the Company Name Matching solution deep dive video, 23:50 onwards.
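
To make the row explosion concrete, here is a minimal sketch that counts the rows a join on a single key would produce, first against a table with repeated keys and then against a deduplicated lookup table. The helper join_row_count and the sample keys are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter

def join_row_count(left_keys, right_keys):
    """Count the rows an inner join on one key column would produce.

    Hypothetical helper: each left row yields one output row per
    matching right row.
    """
    right_counts = Counter(right_keys)
    return sum(right_counts[key] for key in left_keys)

# Hypothetical data: policy rows joined to a table of company name variants.
policies = ["ACME", "ACME", "GLOBEX"]
name_variants = ["ACME", "ACME", "ACME", "GLOBEX"]  # ACME appears three times
lookup = sorted(set(name_variants))                 # one row per key

exploded_rows = join_row_count(policies, name_variants)
lookup_rows = join_row_count(policies, lookup)
```

Here three input rows become seven after the one-to-many join, but stay at three when joined to the lookup table.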

Partition

Partition creates a table for each value in the partition column, so use it with care. It is limited to 5000 tables.
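
Before partitioning, it can help to estimate how many tables would result. The sketch below counts the distinct values in a candidate partition column and compares the count with the 5000-table limit stated above. The function name partition_table_count, the column names and the sample rows are hypothetical.

```python
MAX_TABLES = 5000  # limit stated in this article

def partition_table_count(rows, column):
    """Number of tables a partition on `column` would create.

    Hypothetical helper: one table per distinct value in the column.
    """
    return len({row[column] for row in rows})

# Hypothetical sample data.
rows = [
    {"region": "EMEA", "gwp": 120},
    {"region": "APAC", "gwp": 80},
    {"region": "EMEA", "gwp": 45},
]

tables = partition_table_count(rows, "region")
within_limit = tables <= MAX_TABLES
```

A column with few distinct values, such as a region, is a safer partition column than one that is close to unique per row, such as a policy ID.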

Generating outputs