Best practice

Performance optimisation

The most performance-intensive operations are Automap Values and Aggregate.

Automap Values

  • Use Automap Values only once per stage. If you need multiple Automap Values operations, put each one in its own stage.
  • Reduce the number of potential matches by using context columns or by segmenting the volume of data going in (e.g. GWP >$Xm).
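
To show why a context column reduces the work, here is a minimal Python sketch that counts how many candidate comparisons remain once matches are restricted to rows sharing a context value. It does not describe how Automap Values works internally; the function `candidate_pairs`, the `country` column and the sample rows are all hypothetical.

```python
from collections import defaultdict

def candidate_pairs(source, targets, context=None):
    """Count source/target comparisons, optionally limited to rows
    that share the same value in a context column (hypothetical helper)."""
    if context is None:
        # Without a context column, every source row is compared to every target row.
        return len(source) * len(targets)
    targets_per_value = defaultdict(int)
    for row in targets:
        targets_per_value[row[context]] += 1
    # With a context column, each source row is only compared to targets
    # that have the same context value.
    return sum(targets_per_value[row[context]] for row in source)

# Illustrative data only.
source = [
    {"name": "Acme Ltd", "country": "UK"},
    {"name": "Acme Inc", "country": "US"},
    {"name": "Globex", "country": "US"},
]
targets = [
    {"name": "ACME LIMITED", "country": "UK"},
    {"name": "ACME INCORPORATED", "country": "US"},
    {"name": "GLOBEX CORP", "country": "US"},
    {"name": "INITECH", "country": "UK"},
]

unrestricted = candidate_pairs(source, targets)
restricted = candidate_pairs(source, targets, context="country")
print(unrestricted, restricted)
```

In this toy data the context column halves the number of comparisons; on real volumes the saving depends on how evenly the context values split the data.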

Aggregate

Only use Aggregate for aggregation. Don’t use it simply to rename or subset columns.

One-to-many joins

A one-to-many join occurs when a row in one dataset matches multiple rows in the other dataset, creating an output row for every matching pair.

Consider whether the full set of exploded values is required. Could the same result be achieved by joining to a simple lookup table? See the Company Name Matching solution deep dive video, 23:50 onwards.
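
The difference can be sketched in plain Python, assuming a small in-memory join rather than the product's own Join operation. The `inner_join` helper, the table names and the columns are hypothetical and only illustrate how row counts grow.

```python
def inner_join(left, right, key):
    """Naive inner join on a single key column (illustrative helper)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # One output row per matching (left, right) pair.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Illustrative data only.
policies = [
    {"policy": "P1", "broker": "B1"},
    {"policy": "P2", "broker": "B1"},
    {"policy": "P3", "broker": "B2"},
]
broker_contacts = [
    {"broker": "B1", "region": "EMEA", "contact": "Ana"},
    {"broker": "B1", "region": "EMEA", "contact": "Ben"},
    {"broker": "B1", "region": "EMEA", "contact": "Cy"},
    {"broker": "B2", "region": "APAC", "contact": "Di"},
]

# One-to-many: every policy is repeated for each contact of its broker.
exploded = inner_join(policies, broker_contacts, "broker")

# If only the region is needed, reduce the right-hand side to one row
# per broker first, then join to that lookup table.
lookup = list(
    {r["broker"]: {"broker": r["broker"], "region": r["region"]}
     for r in broker_contacts}.values()
)
compact = inner_join(policies, lookup, "broker")

print(len(exploded), len(compact))
```

The lookup approach only applies when the columns you need have a single value per key, as `region` does per broker in this example.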

Partition

Partition creates a table for each value in the partition column, so use it with care. It is limited to 5000 tables.
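
Before partitioning, it can help to check how many distinct values the partition column holds. The sketch below assumes the data is available as a list of Python dictionaries; the helper names and the sample columns are hypothetical, and only the 5000-table limit comes from the guidance above.

```python
MAX_TABLES = 5000  # limit stated in the guidance above

def partition_table_count(rows, column):
    """Number of tables a partition on `column` would create:
    one per distinct value (hypothetical helper)."""
    return len({row[column] for row in rows})

def safe_to_partition(rows, column, limit=MAX_TABLES):
    """True if partitioning on `column` stays within the table limit."""
    return partition_table_count(rows, column) <= limit

# Illustrative data only: 6000 rows spread over four years.
rows = [{"policy_id": i, "year": 2020 + i % 4} for i in range(6000)]

print(partition_table_count(rows, "year"), safe_to_partition(rows, "year"))
print(partition_table_count(rows, "policy_id"), safe_to_partition(rows, "policy_id"))
```

A low-cardinality column such as `year` produces a handful of tables, whereas a near-unique column such as `policy_id` would exceed the limit.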

Generating outputs

Disable outputs for every stage whose output you do not need, so they do not appear in the outputs view. This speeds up the pipeline run, since those outputs do not have to be created.