Pipeline inputs

Data flow through Quantemplate

Quantemplate imports raw data via Feeds, cleanses and harmonises it in Pipelines, then outputs it to the Data Repo for downstream processes such as querying in Analyse, sending to external systems via API, or distributing to partners via feeds.

Adding data to a pipeline

Bordereaux or submission data which requires cleansing should be uploaded to a feed, or directly to a pipeline via the Uploads tab or input selector.

To add data via the input selector:

Open a stage.
Click the ‘Add inputs’ button.
Select which category of data to add: Feeds, Uploads, Reference Data or Partner Data. Data can be uploaded directly to the Uploads tab via drag-and-drop or the ‘choose files’ button.
Once files have been selected, click ‘Apply selection’.

Supported file formats

Quantemplate supports XLS, XLSX, CSV and GZipped CSV files.

GZipping a CSV file can significantly reduce its file size and upload time. 7-Zip is a useful tool for applying GZip compression. Mac users can also use a Terminal command.
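
Compression can also be scripted. The following is a minimal sketch using only the Python standard library; it is not a Quantemplate tool, and the function name `gzip_csv` is an illustrative assumption.

```python
import gzip
import shutil
from pathlib import Path


def gzip_csv(csv_path: str) -> Path:
    """Compress a CSV to <name>.csv.gz next to the original and return the new path."""
    source = Path(csv_path)
    target = source.with_name(source.name + ".gz")
    # Copy in chunks so that large files do not need to fit in memory.
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

For example, calling `gzip_csv("Acme-claims-April-2018.csv")` would write `Acme-claims-April-2018.csv.gz` to the same folder, leaving the original file untouched.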

We are aware of an issue where CSV files which contain a Byte Order Mark (BOM) and have GZip compression applied may fail to upload. In this case, use an uncompressed CSV format or open the file in a text editor and export as UTF-8 with no BOM.
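
As an alternative to a text editor, the BOM can be removed with a short script before compressing. This is a sketch under the assumption that the file is UTF-8 encoded; the function name `strip_utf8_bom` is hypothetical, and the whole file is read into memory, so it suits moderately sized files.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(csv_path: str, out_path: str) -> bool:
    """Copy csv_path to out_path without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    data = Path(csv_path).read_bytes()
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        # The UTF-8 BOM is the three bytes EF BB BF at the start of the file.
        data = data[len(codecs.BOM_UTF8):]
    Path(out_path).write_bytes(data)
    return had_bom
```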

Input tabs

The main panel of the pipeline workspace contains four input tabs:

Uploads
Upload and manage raw datasets.
Feeds
View submission data from feeds currently used in the pipeline.
Reference data
View reference datasets currently used in the pipeline.
Partner datasets
View partner organisations’ pipeline outputs currently used in the pipeline.

Feeds tab

The feeds tab shows the feeds that have been used in the pipeline. The view shows the submissions sorted by feed name, then by submission requested date. Click on a submission to reveal it in the feeds view.

If a resubmission has been requested, this is indicated by the resubmission icon.

Feed data is added to the pipeline via the input selector and can be set to auto-update each time a submission receives new data.

Uploads tab

In the Pipeline view, the uploads tab allows raw data files to be uploaded directly to a pipeline. Once uploaded they need to be added to the correct pipeline stages via the input selector.

The uploads tab shows:

Filename plus the names of any worksheets in an Excel file
Stage in which an uploaded dataset has been used as an input
File size
Uploaded date

The uploads tab can display up to 500 items. To display more items, apply filters or archive some files.

Sort the uploads list

By default, files are sorted by Uploaded date, with most recent uploads at the top. The list can also be sorted by Name and Size. Click on a column header to sort by that column. Click again to reverse the sort direction.

Excel files with multiple worksheets

In the uploads view, when an Excel file contains multiple worksheets, they are presented as a subset of the file. The data in each worksheet can be previewed by clicking on it. Clicking on the main file name will show a preview of the first worksheet.

Elsewhere in Quantemplate, worksheets in Excel files are represented by joining the filename to the worksheet name.

Example

Filename: Acme-claims-April-2018.xlsx
Worksheet name: Sheet1
Filename in Quantemplate: Acme-claims-April-2018.xlsx: Sheet1
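
The naming convention in this example can be expressed as a one-line helper, which may be useful when matching local files against names shown in Quantemplate. The function name `display_name` is an assumption for illustration; the separator (a colon followed by a space) is taken from the example above.

```python
def display_name(filename: str, worksheet: str) -> str:
    """Join an Excel filename and a worksheet name as shown in the example above."""
    return f"{filename}: {worksheet}"


print(display_name("Acme-claims-April-2018.xlsx", "Sheet1"))
```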

Filter buttons for Used and Unused uploads allow the files uploaded to a pipeline to be easily managed.

Used shows only the datasets which have been used in the pipeline. If an Excel file contains multiple worksheets, only the used worksheet is shown, along with the Excel filename. This view gives you a compact summary of the uploads used in your pipeline.
Unused shows all uploaded files which have not been used in the pipeline. If an Excel file contains a worksheet which has been used, it will not appear. This view helps you quickly archive unused files, without accidentally removing a worksheet used in the pipeline.
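
The two filter rules can be sketched as follows. This is only an illustration of the behaviour described above, not Quantemplate's implementation; the `Upload` class and both function names are hypothetical, and a CSV file is modelled as a file with a single worksheet.

```python
from dataclasses import dataclass, field


@dataclass
class Upload:
    filename: str
    # Maps worksheet name to True if that worksheet is used as a stage input.
    worksheets: dict = field(default_factory=dict)


def used_view(uploads):
    """Used: list only the worksheets that are used, alongside their filename."""
    return [
        (u.filename, sheet)
        for u in uploads
        for sheet, used in u.worksheets.items()
        if used
    ]


def unused_view(uploads):
    """Unused: list only files in which no worksheet is used."""
    return [u.filename for u in uploads if not any(u.worksheets.values())]
```

Under this model, a workbook with one used and one unused worksheet appears in the Used view (showing only the used worksheet) and is absent from the Unused view, which is why archiving everything in the Unused view cannot remove a worksheet that the pipeline depends on.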

Click a filter button once to toggle its state from showing to hiding, or vice-versa. Double-click a filter button to make it the only filter type showing.

Text filters allow you to search filenames to quickly find a file or worksheet.

Add or remove uploads

Adding uploads

To upload files to a pipeline, click the + button on the top right to open the browser file chooser, or drag-and-drop files directly onto the Uploads tab.

If a Used filter is applied at the time of upload, the upload progress will be shown, but the file will be filtered out once uploaded. This will also be the case if a text filter is applied which does not match the name of the uploaded file.

Adding files to the Uploads tab is useful if the uploads need to be previewed in Quantemplate before adding to the pipeline. Otherwise, the most efficient way to add uploads is via the stage input selector, where they can be quickly selected as inputs to the stage.

Archiving uploads

Archiving an uploaded file will remove it from the Uploads tab and any stages which use the file. The file will not be permanently deleted, since all uploads are preserved as part of the pipeline’s history. Restoring a pipeline to a previous revision number will also restore all uploads that were present at that revision point.

To remove an uploaded file, click on the cross which appears on the right when hovering over an item. A confirmation button will appear. Click on this to archive the file. This cannot be undone.

Multi-select and batch archive

To archive multiple uploads at once, first select the files:

To select individual files, use the selection checkboxes which appear on the right of a row when hovering over it. Once one checkbox is selected, checkboxes will appear on the other rows.
To select all files, click on the checkbox on the top right of the table. If a filter is active, only the filtered items will be selected. If a filter is cleared, the selection remains in place.
To select all files within a range, select one item, then hold down shift whilst selecting another item. All items between them will also be selected.

When one or more items are selected, the bulk edit bar appears. Click the Archive button in the bulk edit bar to archive the selected files.

Archiving Excel files with multiple worksheets

If an Excel file has multiple worksheets, individual worksheets cannot be archived. This preserves the integrity of the uploaded data. The whole file can be archived by clicking the Archive button next to the filename.

Previewing uploads

Click on an item on the uploads tab to see a preview of the first 1,000 rows.

Once in the dataset preview, navigate between uploads via the file navigator dropdown, above the data grid. The currently previewed input is highlighted blue.

The list can be filtered to show only used or unused uploads, as described above.

To return to the Uploads view, click the ‘Uploads’ navigation button in the top left.

Filter a dataset preview

Apply filters to the data preview to understand your data better. Click the filter button on the top right, or press the F key, to open the filter bar.

Read more about filter bars in Quantemplate.

Download the original file

Data that is imported to Quantemplate is pre-processed, removing all visual formatting and annotations. Files with multiple Excel worksheets are split out into separate files. The original file, preserving formatting, annotations, tabs, etc. is retained.

To download the original file: navigate to the dataset preview for the upload, click on the download button on the top right and select ‘Download Original file’.

Reference data tab

About reference data

The Data Repo is the storage area for clean datasets and reference data.

Datasets with a single row of headers can be uploaded directly to the Data Repo to create a reference dataset for use in pipelines – for example, a target header schema to map to. The upload process will ignore any blank rows above or below the data, or blank columns either side of the data. The first line of data will be interpreted as column headers. Read more about uploading data to the data repo.
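
To check how a file is likely to be read before uploading it, the trimming rules described above can be approximated locally. This is a sketch of the described behaviour only, not Quantemplate's actual upload code; the function name `trim_blank_edges` and the treatment of whitespace-only cells as blank are assumptions.

```python
def trim_blank_edges(rows):
    """Drop blank rows above and below the data and blank columns either side.

    rows is a list of lists of cell values. Returns (headers, data_rows),
    where headers is the first remaining row.
    """
    def is_blank(cell):
        return cell is None or str(cell).strip() == ""

    # Indices of rows that contain at least one non-blank cell.
    filled = [i for i, row in enumerate(rows) if not all(is_blank(c) for c in row)]
    if not filled:
        return [], []
    body = rows[filled[0]:filled[-1] + 1]

    # Pad short rows so that every row has the same number of columns.
    width = max(len(row) for row in body)
    body = [list(row) + [""] * (width - len(row)) for row in body]

    # Indices of columns that contain at least one non-blank cell.
    cols = [j for j in range(width) if not all(is_blank(row[j]) for row in body)]
    body = [row[cols[0]:cols[-1] + 1] for row in body]
    return body[0], body[1:]
```

Blank rows or columns that sit inside the data are deliberately left in place, since the text above only describes trimming around the edges.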

Cleansed outputs from pipelines can also be exported to the Data Repo for onward sharing via API, sharing within Quantemplate, or reporting on in Analyse.

About the reference data tab

The reference data tab shows reference datasets currently used in the pipeline. Datasets can be added to or removed from the pipeline via the stage input selector.

Alongside the dataset name, the tab shows the stage the dataset is used in, the row count, the date the data was updated, and whether the dataset is used in an Automap Values operation. The view can be sorted by name, rows and updated date.

Data updated vs. file updated

The reference data tab in the pipeline inputs displays the last date the data within a file was updated. This does not include changes to the file metadata such as archive status and file name.

Autorun uses the data updated date, rather than the file updated date, to trigger pipeline runs.

The Dataset Information popup shows information about how a dataset has been created and updated, and where else it is used. Read more about it here.

Click the Automap Values icon to show the names of the stages and operations the dataset is used in.

Click anywhere on the row to view the reference dataset in a new tab.

Using multiple browser tabs

If your organisation does not use SSO login, you may need to use a browser incognito tab to prevent automatic logout when using multiple tabs.

Dataset permissions and archive warnings

Other users need permission

If a pipeline is shared with other users and they do not have access to an input (a reference dataset or feed), a warning will be displayed in the inputs tab.

Click 'Manage dataset access' to share the input with users who do not have access to it.

If you're not the owner of the input, the owner will receive a sharing request.

Missing dataset: archived file

The dataset has been archived, so it will not be accessible to the pipeline. The dataset should be replaced, or the dataset owner should restore the file from the archive. If you are the dataset owner, you can click the ‘Restore file’ button which appears beneath the error message.

Notify an owner about a missing dataset

If a dataset has been archived, or you do not have permission to view a reference dataset used in the pipeline, you can send an email notification to the owner by clicking the ‘Notify owner’ button which appears beneath the error message.

Removing archived reference datasets from a pipeline

If a reference dataset is used in a pipeline but has been archived, or you do not have permission to access it, it will appear in the reference data tab, but not in the stage inputs. To remove an archived dataset, first restore it so that it appears in the stage inputs. Next, remove it from the stage inputs. It can then be archived again in the Data tab.

Partner data tab

Quantemplate allows organisations to make pipeline outputs available to partner organisations, where they can be used as inputs to a pipeline.

To add a partner organisation’s pipeline output to the current pipeline, use the Partners tab in the stage input selector.

The partner data tab shows all the partner organisation’s pipeline outputs which are being used in the current pipeline, alongside the stage they are used in and the last updated date. The view can be sorted by partner name.

Click on the row to preview a partner dataset.

Learn more about partner sharing →
