Stages and operations

In this section

About stages and operations

Stage types

Working with stages

Adding and configuring stages and operations

Stage inputs

Stage outputs

Editing, renaming and re-ordering

Disabling operations and stages

Configuration errors

About stages and operations

Quantemplate breaks the data transformation process down into stages and operations, allowing you to sequence and structure your data transformation process in a way that is easy to browse and navigate.

Stages are structural components with a defined number of inputs and outputs. There are three kinds of stage: Transform, Union and Join.

Operations are individual transformation steps, grouped together within a Transform stage.

For a full list of operations available in Transform stages, see the Operations Index.

Stages can be visually expanded to show inputs, outputs and configuration options, and can be collapsed to provide an overview of the whole process. When clicked, operations slide across to reveal their configuration panel.

Stage types

Join stage

A Join stage combines two different datasets which have one or more shared columns of data. For example, a premium dataset could be joined to a claims dataset using the policy number column as the join point. A Join stage typically has one or two outputs. Learn more about the Join stage.
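
As a rough illustration of the idea outside the product, the Python sketch below joins a premium dataset to a claims dataset on a shared policy number column. The column names, the sample rows and the `inner_join` helper are all hypothetical, and the sketch shows only one possible matching option (keeping rows found in both datasets). It is not how Quantemplate implements the Join stage.

```python
def inner_join(left, right, key):
    """Return rows that share a value in `key`, merging their columns."""
    # Index the right-hand rows by join key so each lookup is cheap.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical sample data.
premiums = [
    {"policy_number": "P001", "premium": 1200},
    {"policy_number": "P002", "premium": 800},
]
claims = [
    {"policy_number": "P001", "claim_amount": 300},
    {"policy_number": "P003", "claim_amount": 950},
]

matched = inner_join(premiums, claims, "policy_number")
```

Only policy P001 appears in both datasets, so `matched` holds a single row carrying both the premium and the claim amount.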

Union stage

A Union stage is used to combine multiple datasets with identical sets of column headers to produce a single output. Because the headers are the same in each source file the rows can be stacked on top of each other to produce a single unified output dataset. Learn more about the Union stage.
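
The stacking behaviour can be pictured with a short Python sketch. The dataset structure, the `union` helper and the sample rows are hypothetical, and the strict check that headers match in the same order is an assumption of this sketch rather than a statement about the product.

```python
def union(datasets):
    """Stack the rows of datasets that share identical column headers."""
    headers = datasets[0]["headers"]
    rows = []
    for dataset in datasets:
        # Assumption: headers must match exactly, including their order.
        if dataset["headers"] != headers:
            raise ValueError("All inputs must have identical column headers")
        rows.extend(dataset["rows"])
    return {"headers": headers, "rows": rows}


# Hypothetical monthly files with the same headers.
january = {"headers": ["policy", "premium"], "rows": [["P001", 1200]]}
february = {"headers": ["policy", "premium"], "rows": [["P002", 800]]}

combined = union([january, february])
```

Here `combined` keeps the shared headers once and contains the January row followed by the February row.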

Transform stage

Operations to transform data are accessed and sequenced in Transform stages. Transform stages accept multiple input datasets. The number of output datasets usually equals the number of input datasets, unless the Aggregate or Partition operations are used, as these can change the number of output datasets created.

Working with stages

Sequencing and connecting stages

Data runs through the pipeline in sequential order, from top to bottom in the pipeline view. For the data to run between stages, the outputs of one stage are selected as the inputs to a lower stage.
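
The top-to-bottom rule can be modelled in a few lines of Python. This is a conceptual sketch only: the stage dictionaries, the file names and the `check_sequence` helper are hypothetical and do not reflect how pipelines are stored in Quantemplate.

```python
def check_sequence(stages, uploads):
    """Return a list of problems; an empty list means the order is valid."""
    available = set(uploads)  # datasets that exist before any stage runs
    problems = []
    for position, stage in enumerate(stages, start=1):
        for name in stage["inputs"]:
            if name not in available:
                problems.append(
                    f"Stage {position} ({stage['name']}) needs '{name}', "
                    "which no earlier stage produces"
                )
        # Outputs only become available to stages further down.
        available.update(stage["outputs"])
    return problems


# Hypothetical two-stage pipeline.
uploads = ["Bordereaux.csv"]
clean = {
    "name": "Clean",
    "inputs": ["Bordereaux.csv"],
    "outputs": ["Clean: Bordereaux.csv"],
}
calculate = {
    "name": "Calculations",
    "inputs": ["Clean: Bordereaux.csv"],
    "outputs": ["Calculations: Clean: Bordereaux.csv"],
}

in_order = check_sequence([clean, calculate], uploads)
out_of_order = check_sequence([calculate, clean], uploads)
```

With the stages in order there are no problems. With the order swapped, the Calculations stage asks for an output that has not been produced yet, so one problem is reported.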

Output file naming

The names of output files from a Transform stage are generated by prepending the stage name to the name of the source file. Stages that the data has passed through are therefore recorded in the file name.

A Join stage will name the output files according to the input file names and the matching options selected. See the article on Join stages for more.

A Union stage will name its single output file with the name of the stage.

Example

Input name           Bordereaux-20220-03-18.csv

Transform stage 1    Map and Tag Data
Output name          Map and Tag Data: Bordereaux-20220-03-18.csv

Transform stage 2    Calculations
Output name          Calculations: Map and Tag Data: Bordereaux-20220-03-18.csv

Union stage 3        Combined inputs
Output name          Combined inputs
Tip
Placing a Union stage early in a pipeline ensures that if an upstream file name changes, the inputs to downstream stages are retained and do not need to be re-selected.
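
The naming rules for Transform and Union stages can be reproduced with a small Python sketch. The helper names are hypothetical, and the `": "` separator is inferred from the example in this section rather than from a documented specification. Join stage naming is omitted because it depends on the matching options selected.

```python
def transform_output_name(stage_name, input_name):
    """A Transform stage prefixes the input name with the stage name."""
    return f"{stage_name}: {input_name}"


def union_output_name(stage_name):
    """A Union stage names its single output after the stage itself."""
    return stage_name


source = "Bordereaux-20220-03-18.csv"
after_stage_1 = transform_output_name("Map and Tag Data", source)
after_stage_2 = transform_output_name("Calculations", after_stage_1)
after_stage_3 = union_output_name("Combined inputs")
```

Because the Union output name does not include any upstream file name, it stays the same when an upstream file is renamed.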

Previewing changes with ‘Trace’

Trace is a feature that lets you rapidly preview the downstream effects of changes to pipeline configuration. As you build and edit a pipeline, Trace simulates and validates your changes without the need to execute the pipeline. This provides a real-time view of the data structure at any point in the pipeline and helps to flag errors before the pipeline is run.

As you make changes, it may take a few moments for Trace to run through the pipeline. Trace progress is indicated by a spinner on each stage being processed.
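
One way to picture this kind of preview is to pass only the column headers through each step and never touch the row data. The Python sketch below is purely conceptual: the `rename` and `remove` step types, the column names and the `trace_headers` helper are hypothetical, and nothing here describes how Trace works internally.

```python
def trace_headers(headers, operations):
    """Propagate column headers through operations without reading row data."""
    headers = list(headers)
    errors = []
    for op in operations:
        if op["column"] not in headers:
            errors.append(f"{op['type']}: column '{op['column']}' not found")
            continue
        if op["type"] == "rename":
            headers[headers.index(op["column"])] = op["to"]
        elif op["type"] == "remove":
            headers.remove(op["column"])
    return headers, errors


operations = [
    {"type": "rename", "column": "Pol No", "to": "Policy Number"},
    {"type": "remove", "column": "Notes"},
    {"type": "remove", "column": "Pol No"},  # fails: renamed by the first step
]
headers, errors = trace_headers(["Pol No", "Premium", "Notes"], operations)
```

The resulting headers show the structure a downstream stage would receive, and the error list flags the step that refers to a column which no longer exists.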

Adding and configuring stages and operations

To configure a Transform stage:

  1. Click ‘Add stage’ and select ‘Transform stage’.
  2. Click ‘Add inputs’ to select inputs for the stage.
  3. Click ‘Add operation’ and select or search for an operation.
  4. Click on the operation to view its configuration panel.
  5. Run the pipeline to view the results.
  6. Add more operations as desired.

To configure Join stages, add inputs and click on the join configuration panel. Learn more about the Join stage.

To configure Union stages, simply add inputs; no further configuration is required. Learn more about the Union stage.

Tip
Group operations which share the same inputs into a single stage. If you need a large number of operations it may help to break down the sequences into separate Transform stages, grouping related processes together.

Stage inputs

Adding stage inputs

Use the input selector to define inputs to a stage.

To add stage inputs:

  1. If the stage is collapsed, expand it to reveal the operations.
  2. Click the green ‘Add inputs’ button. In the popup, navigate and search the available input files.
  3. Select the desired input files by clicking their checkboxes, or select all the results of a search by clicking ‘Select all results’.
  4. Click the green ‘Apply selection’ to apply the changes. Click ‘Cancel’ or close the popup to discard the changes.
Tip
Selections are retained when performing an additional search. This means you can search for ‘Premiums’ and select all results, then search for ‘Claims’ and select all results to input all datasets with either ‘Premiums’ or ‘Claims’ in their filename.
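
The effect of retained selections is the same as taking the union of the two sets of search results. A minimal Python sketch follows, with hypothetical file names and a hypothetical `search` helper. Matching is case-insensitive here, which is an assumption of the sketch.

```python
def search(files, term):
    """Return the file names containing `term` (case-insensitive here)."""
    return {name for name in files if term.lower() in name.lower()}


available = [
    "Premiums-2021.csv",
    "Premiums-2022.csv",
    "Claims-2022.csv",
    "Exposure-2022.csv",
]

selection = set()
selection |= search(available, "Premiums")  # 'Select all results'
selection |= search(available, "Claims")    # earlier picks are kept
```

After both searches, `selection` contains the two Premiums files and the Claims file, but not the Exposure file.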

Linked Stages

The Link Stages feature enables the outputs of one stage to be linked to the inputs of another stage. If the first stage’s input files change, the new files will be passed through to the linked stage.

Using Linked Stages

A common use for Linked Stages is to link the outputs of Stage 1 to the inputs of Stage 2, which is often a Union. When new files are added to Stage 1, they will flow through automatically to Stage 2.
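
The difference between a linked stage and a stage with a fixed input selection can be sketched in Python. The `TransformStage` class, its attributes and the file names are hypothetical and serve only to illustrate the behaviour described above.

```python
class TransformStage:
    """Hypothetical model of a Transform stage with one output per input."""

    def __init__(self, name, inputs=None, linked_from=None):
        self.name = name
        self.inputs = list(inputs or [])  # fixed selection of input names
        self.linked_from = linked_from    # upstream stage, if linked

    def resolve_inputs(self):
        # A linked stage always reads the upstream stage's current outputs.
        if self.linked_from is not None:
            return self.linked_from.outputs()
        return list(self.inputs)

    def outputs(self):
        return [f"{self.name}: {name}" for name in self.resolve_inputs()]


stage_1 = TransformStage("Clean", inputs=["January.csv"])
linked = TransformStage("Calculations", linked_from=stage_1)
fixed = TransformStage("Calculations", inputs=stage_1.outputs())

# A new file is added to Stage 1 after the downstream stages were set up.
stage_1.inputs.append("February.csv")

linked_inputs = linked.resolve_inputs()
fixed_inputs = fixed.resolve_inputs()
```

The linked stage picks up the February file automatically, while the stage with a fixed selection still sees only the January output until its inputs are reselected.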

Tip
If using Linked Stages, it's still recommended to Union incoming files early in the pipeline. The output ID from a Union Stage is immutable, so stages referencing it are not affected by changes to the Union's input files. Using a Union early makes the pipeline faster to Trace and more resilient to configuration changes.

To enable Linked Stages:

  1. Open the Input Selector popup and navigate to the Stage Outputs tab.
  2. Hover over the stage you wish to link from. The Linked Stages button appears on the right.
  3. Click the Linked Stages button, then click the toggle in the popup to enable Linked Stages.
  4. Finally, click ‘Apply Selection’ to apply the setting.

Removing stage inputs

To remove selected inputs from a stage:

  1. Click the green ‘Add inputs’ button.
  2. Locate the inputs to be removed and deselect their checkboxes.
  3. Click ‘Apply selection’ to apply the changes.

The inputs will no longer be connected to the stage but will remain available for other stages.

If the Linked Stages setting is enabled, first disable it to deselect the files – or remove the files from the upstream stage.

To permanently remove uploads from a pipeline, visit the inputs tab.

Input sources

A stage can accept inputs from four sources:

When an input is selected within a category, a green dot is shown next to the category icon.

Uploading data to a pipeline

Bordereaux or submission data which requires cleansing should be uploaded to a pipeline, rather than the Data Repo.

The best way to upload data to a pipeline is via the Input Selector.

  1. Open a stage.
  2. Click ‘Add inputs’.
  3. Go to the Uploads tab and add files via drag-and-drop or the ‘choose files’ button.
  4. Select the uploaded files using the checkbox and click ‘Apply Selection’.

Uploaded files are made available to all stages in the pipeline.

Stage outputs

Stage outputs are the datasets resulting from the processes applied to the data in a stage. They can be connected to the inputs of subsequent stages for further processing, exported to your Data Repo, or downloaded directly.

Making stage outputs exportable

By default, all stage outputs appear in your pipeline outputs and are available for export. Making an output non-exportable removes it from the Output Datasets list, though it remains available for use in subsequent pipeline stages. Disabling outputs that are not required for export has two benefits:

1. Cleaner output list
Your output list only displays your final datasets, so it’s easier to configure their export destinations.

2. Accelerated pipeline run-times
Limiting the number of datasets available for export will speed up pipeline run time.

To disable a stage output, click on it. The disabled output indicator displays next to the output name. Next time the pipeline runs it will be removed from the pipeline outputs list. To re-enable the output, click it again.
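
The distinction between exportable outputs and outputs that remain usable inside the pipeline can be sketched as follows. The `exportable` flag, the helper functions and the output names are hypothetical and are used only for illustration.

```python
stage_outputs = [
    {"name": "Map and Tag Data: Bordereaux.csv", "exportable": False},
    {"name": "Calculations: Map and Tag Data: Bordereaux.csv", "exportable": False},
    {"name": "Combined inputs", "exportable": True},
]


def export_list(outputs):
    """Outputs shown in the pipeline outputs list, available for export."""
    return [output["name"] for output in outputs if output["exportable"]]


def selectable_as_inputs(outputs):
    """Every output can still feed later stages, exportable or not."""
    return [output["name"] for output in outputs]


to_export = export_list(stage_outputs)
usable_downstream = selectable_as_inputs(stage_outputs)
```

Only the final dataset appears in `to_export`, while all three outputs remain in `usable_downstream`.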

Editing, renaming and re-ordering

Renaming stages

To rename a stage, double-click on the stage name, or click the edit icon to the right of the stage name.

Editing and renaming operations

To edit an operation, click on the operation to go to its edit panel. The stage the operation sits within is listed top left – click it to go back to the stages view.

To rename an operation, click on the operation to go to its edit panel. Click on the name to edit it.

Reordering stages and operations

To reorder stages, click on the stage number and edit it to the desired position in the sequence. Note that if you reorder a stage that uses outputs from a previous stage, its inputs will be removed to maintain the sequential nature of the stages, and will need to be reselected. Stages which rely on the outputs of the reordered stage will also have their inputs removed.

To reorder operations, hover over the operation you wish to move then drag and drop it to its new position.

Disabling operations and stages

Transform and Join stages can be temporarily disabled without losing their configuration, allowing a before-and-after comparison of the effect of the stage on your output data.

To disable an operation or Join stage, click on the diamond-shaped Disable Operation button next to the operation name. The operation or Join stage is greyed out, but the parameters can still be edited.

To disable all operations in a Transform stage, click on the Disable All Operations button next to the Operations heading. Stages with all the operations disabled will display greyed-out in the pipeline editor.

To re-enable operations, click the Disable Operation or Disable All Operations button again.
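
The before-and-after comparison that disabling makes possible can be illustrated with a short Python sketch. The operation names, the `enabled` flag and the sample rows are hypothetical. The point is only that a disabled operation is skipped while its configuration is kept.

```python
def run_stage(rows, operations):
    """Apply each enabled operation in order; disabled ones are skipped."""
    for operation in operations:
        if operation["enabled"]:
            rows = operation["apply"](rows)
    return rows


def remove_blank_rows(rows):
    return [row for row in rows if any(value != "" for value in row.values())]


def uppercase_currency(rows):
    return [{**row, "currency": row["currency"].upper()} for row in rows]


rows = [
    {"policy": "P001", "currency": "gbp"},
    {"policy": "", "currency": ""},
]
operations = [
    {"name": "Remove blank rows", "apply": remove_blank_rows, "enabled": True},
    {"name": "Uppercase currency", "apply": uppercase_currency, "enabled": True},
]

after = run_stage(rows, operations)

# Disable one operation; its configuration stays in the list.
operations[0]["enabled"] = False
without_removal = run_stage(rows, operations)
```

Comparing `after` with `without_removal` shows exactly what the disabled operation contributed: the blank row is present only when the operation is switched off.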

Configuration errors

If a stage has been configured incorrectly, for example because it contains a script operation with unsupported values, an error message is shown in the stage and running the pipeline is disabled.