Help Centre

Date Cleanse

In this section

About Date Cleanse

Add a Date Cleanse operation

To add a Date Cleanse operation to your pipeline, open the ‘Add Operation’ popup and select ‘Date Cleanse’.

Reformat dates for use in Quantemplate

The Date Cleanse operation converts dates to Quantemplate’s standard format (Basic ISO:
yyyymmdd
) so they can be used by the date functions in Calculate (
DATECALC
,
DURATION
,
ENDOFMONTH
, etc).

Remove invalid dates

The Date Cleanse operation finds values in the selected columns which do not look like dates or are not valid calendar days (e.g. 30/02/2017). It removes these values from your data by replacing them with a blank, and optionally captures the invalid dates in a Validation Report.

Automatic conversion of dates in Excel files

Columns formatted as dates in Excel files (.xlsx, .xlsm, .xls) are automatically converted to Basic ISO format on uploading to Quantemplate. The use of a Date Cleanse operation is still recommended to check for erroneous values within the date columns. The Input date format should be set to Basic ISO
yyyymmdd
, regardless of the format they are displayed as in Excel.

All other date formats, including dates from CSV file imports, need to be converted to Basic ISO.

Valid date ranges

The Date Cleanse operation will count dates between the years 1000 and 2999 as valid. If you need to check that dates fall within a more specific range, create a rule in the Validation operation.

Extract dates from text

If a value contains text as well as a date, the Date Cleanse operation will search the text for the first date encountered. The text will be removed and the date converted to Basic ISO format. For example, the value
Today’s date is 29/02/2020
will be converted to
20200229
.

Date-time formats

For Excel format files, dates are automatically converted to Basic ISO format. If the dates originally contained a time-of-day component (also known as a timestamp) this will be removed.
For CSV files, the original formatting will be retained, but the values will need converting to Basic ISO format for use in Quantemplate. In doing so Date Cleanse will remove any time-of-day components. If your use case requires the time component to be retained and extracted to separate column, put in an Extract operation before Date Cleanse and use the following regex:
(([0-1]\d|2[0-3]):[0-5]\d(:[0-5]\d)?)

Other date handling capabilities in Quantemplate

Quantemplate's date capabilities also include date functions in Calculate for performing date arithmetic, and a Date Output operation for formatting dates for downstream systems. Learn more about working with dates in Quantemplate.

Date Cleanse workspace

Input columns

Define groups of input columns which share common settings for expected format, output column and Validation Report.
Learn more ↓

Input date format

Select the date format the input files are expected to be in. This can be a specific format
(e.g.
dd-mm-yyyy
) or, if multiple common formats are expected, an Auto option.
Learn more ↓

Output columns

Set the output data to overwrite the source column, or write to a new column.
Learn more ↓

Validation Report

Optionally, create a Validation Report showing invalid dates which have been removed from your data and replaced with a blank.
Learn more ↓

Operation name

Click to edit the operation name and description.

Search

Search column names. Matching results in the Input Columns and Output Columns are highlighted.

Define groups of input columns

To start to use Date Cleanse, first add groups of input columns which share common settings for the expected date format, output column, and Validation Report.

Example

Input columns
Input date format
Output column settings
Validation Report
Cancellation date
Claim date
Closed date
Incurred date
Notification date
Auto-US
Overwrite original
On
Reporting period
Year-quarter:
yyyy-Qq
New column: Reporting date
On
Tip
To quickly add a large number of columns, either:
  • Use the ‘Add all columns’ option then delete the columns which do not contain dates.
  • Type a word into the search field to reveal the ‘Add column names containing’ option. For instance, add all the column names containing the word ‘date’.
Note that if new date columns are subsequently added upstream of the Date Cleanse operation, the operation will need to be manually updated to include the new columns.

Set the input date format

Select the format the incoming dates are expected to be in. Values which do not match the selected format are removed and replaced with a blank value. The invalid values can optionally be captured in a Validation Report.
To select a date format, click in the ‘Input date format’ column and select from the range of Auto, or Precise options.

Auto options

If an input column contains multiple date formats, or the date format is likely to change in future, use one of the Auto options
‘Auto’ tries to match each value in the column to one of the following day-month-year formats:
  • yyyyMMdd
  • d-M-yyyy
  • d-MM-yyyy
  • dd-M-yyyy
  • dd-MM-yyyy
  • dd/MM/yyyy
  • dd.MM.yyyy
  • dd MM yyyy
  • dd-MM-yyyy HH:mm:ss
  • dd-MM-yyyy HH:mm:ss.fff
  • yyyy-MM-dd
  • dd-MMM-yyyy
  • Excel date serial
  • ddMMyyyy
‘Auto US’ tries the formats above, but with month-day-year formats.
It is not possible to automatically detect whether a column contains solely US or rest-of world date formats. If a group of input datasets contains some files with dates in US formats and some files in rest-of-world formats, two Stages will be required: one for US formats, one for rest-of-world formats.
Because the Auto options try a range of formats, pipeline runtimes will be longer than using a precise format.

Precise formats

If an input column is in a single known format, use a Precise format to speed up your pipeline run time.
Precise formats are commonly used date formats. For each Precise format, any standard separators
/.- 
are accepted.
Name
Components
Example
Full
dd-mm-yyyy
29-06-2012
Full US
mm-dd-yyyy
29-06-2012
ISO
yyyy-mm-dd
2012-06-29
Basic ISO
yyyymmdd
20120629
Short
dd-mm-yy
29-06-12
Short US
mm-dd-yy
06-29-12
Year-quarter
yyyy-Qq
2012-Q2
Year-month
yyyy-mm
2012-06
Excel serial
5-digit number
40909

Excel serial format

In the ‘1900 date system’, Excel stores dates as a serial number equivalent to days from 1 January 1900. If the input date format is set to ‘Auto’, ‘Auto-US’ or ‘Excel serial’, Quantemplate will interpret 5-digit numbers (plus any number of decimal places) as dates. Time-of-day information will be removed.
Input value
Output value
43890
20200229
43890.666667
20200229
00001
19000101

Custom formats

To define a specific date format that is not included in the list of preset formats, click on the ‘Custom date format’ option. Select the desired date components from the suggestion list.
Custom formats in Date Cleanse:
  • Must include a year
  • Must include a month if they include a day
  • Cannot use repeated components of the same type, e.g.
    yyyy-yy
Custom formats can be copied and pasted between input groups within Date Cleanse operations. Click the ‘Copy’ button in the custom format field, navigate to the custom format field in another input group and use cmd/ctrl+v to paste.

Partial dates

A partial date contains only two components: a year, combined with either a month or a quarter. Partial dates need to be converted to a specific day in order to work with date functions and the Date Output operation. By default, they will be converted to the first day of the current period. This can be changed by clicking on the settings cog that appears if a partial date format has been selected.
Input value
Partial date option
Output value
2012-Q2
First day of current period
20120401
2012-Q2
First day of next period
20120701
2012-Q2
Last day of current period
20120630
2012-Q2
Last day of previous period
20120331

Two-digit year threshold

Years which contain only two digits need to be converted into a four-digit year. For example ‘53’ should be ‘1953’. To do this, an assumption needs to be made about whether the year is in the 1900s or the 2000s. By default, years greater than 29 will be interpreted as 1900s, whilst years less than or equal to 29 will be interpreted as 2000s. The threshold value can be changed by clicking on the settings cog that appears if a two-digit year format has been selected.
Input value
Years greater than ‘n’ are 1900s
Output value
31/12/29
29
20291231
01/01/30
29
19300101
31/12/29
30
19301231

Set the output column

By default, the cleansed dates will overwrite the data in the original column. To write the data to a new column, click in the output column selector and select ‘New column’, then enter a column name. New columns may not use the same name as existing columns.
When configuring the output column for a group of input columns, a suffix can be defined. New columns will take the original column name plus the suffix. The default suffix is ‘_Cleansed’.
Input columns
Overwrite / New Column
Output columns
Cancellation date
Overwrite
Cancellation date
Reporting period
New column: Reporting date
Reporting period
Reporting date
Claim date
Closed date
Incurred date
Notification date
New column: Original_name_Cleansed
Claim date
Closed date
Incurred date
Notification date
Claim date_Cleansed
Closed date_Cleansed
Incurred date_Cleansed
Notification date_Cleansed
New columns are created at the right edge of the output dataset.

Validation Reports and dates

Date Cleanse removes values which do not match the expected input format and replaces them with a blank. Switch on the Validation Report for a group of columns to generate a report and capture the invalid values. Columns with invalid values will be listed with a warning, columns with no invalid values will be listed as passed validation. Drill down on warnings to see the rows containing the invalid values.
In the report, click on the ‘i’ icon to see the expected input date format.

Business rule validations

To check whether dates are in a certain range, or perform other business rule validations, use the Validation operation on a cleansed column of dates.