Help Centre

Managing pipelines
Reusing pipelines

Quantemplate pipelines can be reused to perform the same set of transformations on data sources which are regularly updated. Once you’ve configured a Quantemplate pipeline to transform a particular set of data sources, it’s easy to feed in identically formatted new data as you receive it.

For example, each month you receive a dataset from three different sources. You have configured a pipeline to transform the data to the desired output format. When new data comes in next month, it can be fed through the pipeline and exported to the data repo.

Running new data

To run new data through a pipeline:

  1. Go to the uploads tab and remove the currently used raw data uploads.
  2. Open the stage you wish to bring the new data into, e.g. Stage 1. Click ‘Add inputs’ and drag and drop the new files into the popup. Click ‘Apply’ to select the new files as inputs to the stage.
  3. If desired, disable export of the Stage 1 outputs.
  4. Open the next stage (e.g. Stage 2) and select the outputs from the previous stage as inputs. If this stage is a Union, the files will be renamed and flow through all connected downstream stages.
  5. Run the pipeline to generate your output data, validation reports and mapping reports.
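The steps above can be sketched in miniature. This is an illustrative model only, not the Quantemplate product or any API it exposes: the stage functions and data shapes are invented to show how swapping Stage 1's inputs lets new data flow through unchanged downstream stages.

```python
# Hypothetical sketch, NOT the Quantemplate API: a pipeline whose first
# stage's inputs are replaced each month, while the stages themselves
# stay untouched between runs.

def run_pipeline(stages, inputs):
    """Feed `inputs` into the first stage, then chain each stage's
    output into the next stage, mirroring steps 1-5 above."""
    data = inputs
    for stage in stages:
        data = stage(data)
    return data

# Stage 1: pull the rows out of each uploaded source file (toy data).
stage1 = lambda files: [row for f in files for row in f["rows"]]
# Stage 2: a union-like step combining everything into one dataset.
stage2 = lambda rows: {"combined": rows}

january = [{"rows": [1, 2]}, {"rows": [3]}]
february = [{"rows": [4, 5]}, {"rows": [6]}]

# Re-running with new data requires no change to the stages themselves.
out_jan = run_pipeline([stage1, stage2], january)
out_feb = run_pipeline([stage1, stage2], february)
```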

Best Practice

If you’re creating a pipeline that will be frequently re-used, consider using this structure:

  1. Start with a Transform stage to Remove Rows, Detect Headers and Map Columns to a common schema. You can also perform input validation at this point.
  2. Add a Union stage to combine the source data. This creates a single output named after the stage, meaning the rest of the pipeline can flow in the new data automatically.
  3. Add subsequent stages to cleanse values, enrich the data, and so on.
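A minimal sketch of this structure, with invented function and column names (it does not reflect Quantemplate's internals): per-source transforms map each file to a common schema, a union combines them into one stably named dataset, and later stages only ever see that single input.

```python
# Hypothetical sketch of the best-practice structure above.
# All names and the toy schema are illustrative assumptions.

def map_columns(rows, mapping):
    """Step 1: rename one source's columns to the common schema."""
    return [{mapping[k]: v for k, v in row.items()} for row in rows]

def union(*sources):
    """Step 2: combine all sources into a single dataset, so downstream
    stages depend on one name rather than on each source file."""
    return [row for source in sources for row in source]

def cleanse(rows):
    """Step 3 (example downstream stage): normalise premium values."""
    return [{**row, "premium": round(float(row["premium"]), 2)}
            for row in rows]

# Two providers with different headers, mapped to the same schema.
source_a = map_columns([{"PolicyRef": "A1", "Prem": "100.4"}],
                       {"PolicyRef": "policy_id", "Prem": "premium"})
source_b = map_columns([{"policy": "B7", "gross_premium": "50"}],
                       {"policy": "policy_id", "gross_premium": "premium"})

combined = cleanse(union(source_a, source_b))
```

Because every source is forced onto the common schema before the union, a new month's files slot in without touching the cleansing or enrichment stages.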

Dealing with source data format changes

If something changes in one of your source data formats, your pipeline may produce unexpected results. Quantemplate allows you to edit transformations to accommodate the new data format.

For example, one of your data providers has moved to a new system, generating slightly different column header names. To fix this, go into the Map Column Headers operation and update the mappings to match the new headers.
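Conceptually, the fix is just extending a header mapping. The sketch below is a hypothetical illustration (the header names and helper are invented, not Quantemplate's): both the old and the renamed header map to the same schema column, so old and new files produce identical output.

```python
# Hypothetical sketch: accommodating a provider's renamed column header
# by adding a second entry to the header mapping. Names are illustrative.

HEADER_MAP = {
    "PolicyRef": "policy_id",         # header from the provider's old system
    "Policy Reference": "policy_id",  # header after their system change
}

def map_headers(row, mapping):
    """Apply the mapping; unknown headers pass through unchanged so a
    future format change is visible in the output rather than dropped."""
    return {mapping.get(k, k): v for k, v in row.items()}

old_row = {"PolicyRef": "A1"}
new_row = {"Policy Reference": "A1"}
```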