Introducing pipelines

Example workflow

Quantemplate Pipelines provide a flexible toolset for cleansing and harmonising your data.

A typical workflow in Quantemplate follows this structure:

1. Create a pipeline

In the Pipelines view, create a new pipeline.

2. Upload raw data

Open a stage and use the input selector to upload raw data files, or bring in reference data.

3. Define the data area

Use Remove rows to identify the data area, then define the column headers via a Detect headers operation.
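
For readers who think in code, the effect of these two operations can be pictured with a short Python sketch. It is an illustration only and not how Quantemplate implements them: the sample rows, the `define_data_area` helper and its `header_row_index` parameter are all hypothetical names invented for this example.

```python
# Illustrative sketch only: mimics "Remove rows" followed by "Detect headers"
# on a small in-memory table. All names and sample data are hypothetical.

raw_rows = [
    ["Broker report", "", ""],             # title row above the data area
    ["Exported from source system", "", ""],  # metadata row above the data area
    ["Policy No", "Insured", "Premium"],   # the row holding the column headers
    ["P-001", "Acme Ltd", "1200"],
    ["P-002", "Birch plc", "850"],
]


def define_data_area(rows, header_row_index):
    """Drop the rows above the header row, then use that row as headers."""
    kept = rows[header_row_index:]      # "Remove rows": discard leading rows
    headers, data = kept[0], kept[1:]   # "Detect headers": first kept row
    return [dict(zip(headers, row)) for row in data]


records = define_data_area(raw_rows, header_row_index=2)
for record in records:
    print(record)
```

Each remaining row becomes a record keyed by the detected headers, which is the shape the later mapping and union steps work with.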

4. Map column headers to a standard

Use Map Column Headers to convert the headers of the source file to match a standard schema.

Tip

In the data repo, store your master schema as a file containing only headers, then bring this into Map Column Headers as a source. Define the master schema from this file and map the other source files to it. Since the file contains no rows of data, when the output files from the first stage are unioned, the unioned output will not be affected.
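
The idea of mapping source headers to a master schema held in a headers-only file can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not Quantemplate's own mechanism: the schema column names, the `header_map` dictionary and the `map_headers` helper are hypothetical.

```python
# Illustrative sketch only: rename source headers to a master schema.
# The schema, mapping, and sample rows are hypothetical.
import csv
import io

# A headers-only file acting as the master schema (no data rows).
schema_file = io.StringIO("policy_id,insured_name,gross_premium\n")
master_headers = next(csv.reader(schema_file))

# Mapping from one source file's headers to the master schema.
header_map = {
    "Policy No": "policy_id",
    "Insured": "insured_name",
    "Premium": "gross_premium",
}

source_rows = [
    {"Policy No": "P-001", "Insured": "Acme Ltd", "Premium": "1200"},
    {"Policy No": "P-002", "Insured": "Birch plc", "Premium": "850"},
]


def map_headers(rows, mapping, schema):
    """Rename each row's keys via mapping and order them as in schema."""
    mapped = []
    for row in rows:
        renamed = {mapping[key]: value for key, value in row.items()}
        # Schema columns absent from the source are left blank.
        mapped.append({header: renamed.get(header, "") for header in schema})
    return mapped


mapped_rows = map_headers(source_rows, header_map, master_headers)
print(mapped_rows[0])
```

Keeping the schema in its own headers-only file means every source file is mapped against the same list of column names.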

5. Union the files

Once your input files have a consistent set of headers, bring them together into a single file with a Union stage.
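
As a rough picture of what a union does, the Python sketch below stacks the rows of several tables that share the same headers. It also shows why a headers-only schema file leaves the result unchanged: it has no rows to contribute. The `union` helper and the sample tables are hypothetical and are not part of Quantemplate.

```python
# Illustrative sketch only: union tables that share the same headers.
# The headers-only schema table contributes no rows to the result.

schema_table = []   # headers-only master schema file: zero data rows
file_a = [{"policy_id": "P-001", "gross_premium": "1200"}]
file_b = [
    {"policy_id": "P-101", "gross_premium": "430"},
    {"policy_id": "P-102", "gross_premium": "975"},
]


def union(*tables):
    """Stack the rows of every table, in order, into one list."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


unioned = union(schema_table, file_a, file_b)
print(len(unioned))  # 3: one row from file_a plus two from file_b
```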

6. Add further stages and operations

Add Transform stages containing operations that cleanse the data and apply calculations. Use Join stages to enrich the data with other data sources. Create output formats for specific downstream systems, such as models or databases.

7. Run the pipeline to generate outputs

Running the pipeline makes the results available within the pipeline outputs, but does not export them to your data repo.

Tip

Accelerate pipeline run times by disabling stage outputs which are not required for export. See Exporting stage outputs for details.

8. Review outputs and quality reports

Review the output data, validation report and mapping reports from the outputs tab. From there, download them directly, or export them to your data repo to share, send to Analyse for querying and visualisation, or export via API.

9. Operational use

Once the pipeline is constructed, new data received from the same source can be uploaded and run straight through. See reusing pipelines for more information. A pipeline owner can also lock operations and join stages to prevent unwanted edits to the pipeline configuration.
