The Date Cleanse operation finds values in the selected columns which do not look like dates or are not valid calendar days (e.g. 30/02/2017).
It removes these values from your data by replacing them with a blank, and optionally captures the invalid dates in a Validation Report.
Automatic conversion of dates in Excel files
Columns formatted as dates in Excel files (.xlsx, .xlsm, .xls) are automatically converted to Basic ISO format on uploading to Quantemplate.
The use of a Date Cleanse operation is still recommended to check for erroneous values within the date columns.
The Input date format should be set to Basic ISO
, regardless of the format they are displayed as in Excel.
All other date formats, including dates from CSV file imports, need to be converted to Basic ISO.
Valid date ranges
The Date Cleanse operation will count dates between the years 1000 and 2999 as valid.
If you need to check that dates fall within a more specific range, create a rule in the Validation operation.
Extract dates from text
If a value contains text as well as a date, the Date Cleanse operation will search the text for the first date encountered. The text will be removed and the date converted to Basic ISO format.
For example, the value
Today’s date is 29/02/2020
will be converted to
For Excel format files, dates are automatically converted to Basic ISO format. If the dates originally contained a time-of-day component (also known as a timestamp) this will be removed.
For CSV files, the original formatting will be retained, but the values will need converting to Basic ISO format for use in Quantemplate. In doing so Date Cleanse will remove any time-of-day components.
If your use case requires the time component to be retained and extracted to separate column, put in an Extract operation before Date Cleanse and use the following regex:
Other date handling capabilities in Quantemplate
Quantemplate's date capabilities also include date functions in Calculate for performing date arithmetic,
and a Date Output operation for formatting dates for downstream systems.
Learn more about working with dates in Quantemplate.
Date Cleanse workspace
Define groups of input columns which share common settings for expected format, output column and Validation Report. Learn more ↓
Input date format
Select the date format the input files are expected to be in. This can be a specific format
or, if multiple common formats are expected, an Auto option. Learn more ↓
Set the output data to overwrite the source column, or write to a new column. Learn more ↓
Optionally, create a Validation Report showing invalid dates which have been removed from your data and replaced with a blank. Learn more ↓
Click to edit the operation name and description.
Search column names. Matching results in the Input Columns and Output Columns are highlighted.
Define groups of input columns
To start to use Date Cleanse, first add groups of input columns which share common settings for the expected date format, output column, and Validation Report.
Input date format
Output column settings
New column: Reporting date
To quickly add a large number of columns, either:
Use the ‘Add all columns’ option then delete the columns which do not contain dates.
Type a word into the search field to reveal the ‘Add column names containing’ option. For instance, add all the column names containing the word ‘date’.
Note that if new date columns are subsequently added upstream of the Date Cleanse operation, the operation will need to be manually updated to include the new columns.
Set the input date format
Select the format the incoming dates are expected to be in. Values which do not match the selected format are removed and replaced with a blank value. The invalid values can optionally be captured in a Validation Report.
To select a date format, click in the ‘Input date format’ column and select from the range of Auto, or Precise options.
If an input column contains multiple date formats, or the date format is likely to change in future, use one of the Auto options
‘Auto’ tries to match each value in the column to one of the following day-month-year formats:
dd MM yyyy
Excel date serial
‘Auto US’ tries the formats above, but with month-day-year formats.
It is not possible to automatically detect whether a column contains solely US or rest-of world date formats.
If a group of input datasets contains some files with dates in US formats and some files in rest-of-world formats,
two Stages will be required: one for US formats, one for rest-of-world formats.
Because the Auto options try a range of formats, pipeline runtimes will be longer than using a precise format.
If an input column is in a single known format, use a Precise format to speed up your pipeline run time.
Precise formats are commonly used date formats. For each Precise format, any standard separators
Excel serial format
In the ‘1900 date system’, Excel stores dates as a serial number equivalent to days from 1 January 1900.
If the input date format is set to ‘Auto’, ‘Auto-US’ or ‘Excel serial’, Quantemplate will interpret 5-digit numbers (plus any number of decimal places) as dates.
Time-of-day information will be removed.
To define a specific date format that is not included in the list of preset formats,
click on the ‘Custom date format’ option. Select the desired date components from the suggestion list.
Custom formats in Date Cleanse:
Must include a year
Must include a month if they include a day
Cannot use repeated components of the same type, e.g.
Custom formats can be copied and pasted between input groups within Date Cleanse operations.
Click the ‘Copy’ button in the custom format field, navigate to the custom format field in another input group and use cmd/ctrl+v to paste.
A partial date contains only two components: a year, combined with either a month or a quarter.
Partial dates need to be converted to a specific day in order to work with date functions and the Date Output operation.
By default, they will be converted to the first day of the current period.
This can be changed by clicking on the settings cog that appears if a partial date format has been selected.
Partial date option
First day of current period
First day of next period
Last day of current period
Last day of previous period
Two-digit year threshold
Years which contain only two digits need to be converted into a four-digit year.
For example ‘53’ should be ‘1953’. To do this, an assumption needs to be made about whether the year is in the 1900s or the 2000s.
By default, years greater than 29 will be interpreted as 1900s, whilst years less than or equal to 29 will be interpreted as 2000s.
The threshold value can be changed by clicking on the settings cog that appears if a two-digit year format has been selected.
Years greater than ‘n’ are 1900s
Set the output column
By default, the cleansed dates will overwrite the data in the original column.
To write the data to a new column, click in the output column selector and select ‘New column’,
then enter a column name. New columns may not use the same name as existing columns.
When configuring the output column for a group of input columns,
a suffix can be defined. New columns will take the original column name plus the suffix.
The default suffix is ‘_Cleansed’.
Overwrite / New Column
New column: Reporting date
New column: Original_name_Cleansed
New columns are created at the right edge of the output dataset.
Validation Reports and dates
Date Cleanse removes values which do not match the expected input format and replaces them with a blank.
Switch on the Validation Report for a group of columns to generate a report and capture the invalid values.
Columns with invalid values will be listed with a warning,
columns with no invalid values will be listed as passed validation.
Drill down on warnings to see the rows containing the invalid values.
In the report, click on the ‘i’ icon to see the expected input date format.
Business rule validations
To check whether dates are in a certain range, or perform other business rule validations, use the Validation operation on a cleansed column of dates.