Quantemplate allows validation rules and actions to be
configured via a Validate operation within a pipeline Transform stage. On
running the pipeline the results are summarised in a Validation Report.
Validation rules are logical queries configured using
Quantemplate’s Equation Editor. This allows flexible queries to be devised, such
as checking that the values in a column do not exceed the values in another column,
or that the length of a text string is within specified parameters.
If the result of the rule is ‘true’ (e.g. a text
string is within a specified length) the validation will pass. If the result
is ‘false’ the rule can be set to return a failure or a warning in the validation
report.
To add a Validate operation:
To return to the pipeline view, click the stage name on the top left
To configure a validation rule:
Rules can be duplicated, deleted or re-ordered.
If the result of a validation rule is ‘true’ the validation will pass. If the result is ‘false’ the rule can trigger a failure, a warning, or take no action.
Intended for validation results with erroneous values which adversely affect data quality.
Intended for validation results which may be cause for concern possibly requiring further investigation.
To disable a validation rule without deleting it, set ‘No action’. This can be useful for incrementally debugging a dataset.
To reorder validation rules, hover over a drag handle at the right end of the validation rules row. Drag and drop the rule to its new position in the table. The new position will be reflected in the validation report.
Validation rules can be copied and pasted all at once to another validation operation. This is useful for duplicating a set of validation rules across multiple pipelines.
To copy rules: open a Validate operation and with no rules selected click ctrl+c (Win) or cmd+c (Mac).
To paste rules: open a Validate operation and click ctrl+v (Win) or cmd+v (Mac). This will add the copied rules below any existing rules.
To duplicate a Validate operation: copy the rules from one operation, create a new Validate operation and paste them in.
Note that the Validate operation will need the same input column names for the rules to work.
It is sometimes useful to rename a Validate operation. You may wish to do this to provide a more precise description of the nature of the validations in the operation, or to distinguish between multiple validation operations in a single stage.
To rename the validation operation:
The Equation Editor in the Calculate and Validate operations supports a range of Excel-like functions for general purpose data cleansing and set of functions specifically for cleansing dates. To define columns, values and operators in functions, use the syntax described in the Operators and syntax section.
Returns the absolute value of a number.
ABS(<column>)
VALUE |
ABS('VALUE') |
1 |
1 |
-1 |
1 |
Joins multiple values together to create a single text value.
Note that if named columns are not present, the run will fail. If this is not desired behviour, consider using the Combine Columns operation instead.
CONCATENATE(<column name or value> … )
ID |
CONCATENATE("X", 'ID') |
123 |
X123 |
1234 |
X1234 |
Finds a text value and returns position of its first character.
FIND(<search within>, <search for>)
Note:
FIND(UPPER(<search within>), UPPER(<search for>))
Find the position of the character ‘d’ within the ID column.
ID |
FIND('ID’, “d”) |
abcd |
4 |
defg |
1 |
efgh |
0 |
Extract the first word of a text value, for example extracting a first name from a full name. Note that in this example, if the name does not contain a space, a blank will be returned.
NAME |
LEFT('NAME', FIND('NAME', " ") - 1) |
Ronald Davies |
Ronald |
Jane Alice Smith |
Jane |
F. Scott Fitzgerald |
F. |
Madonna |
Check whether a column contains a given text value, for example check whether a post code contains SE4.
Postcode |
IF(FIND('Postcode', "SE4") > 0, "Yes", "No") |
SE4 5BX |
Yes |
SE13 4WZ |
No |
SE47RS |
Yes |
Remove the first word, for example remove a building number from a street address. This expression looks for the position of the first space and returns everything after that space, returning ‘Unknown’ if no space is found.
Street |
IF(FIND('Street', " ") = 0, "Unknown", MID('Street', FIND('Street', " ") + 1, LENGTH('Street')))
|
23 Acacia Avenue |
Acacia Avenue |
361–364 Park Road |
Park Road |
London |
Unknown |
Checks whether a condition is met and returns one value if true, and another value if false.
IF(<logical query>, <value if true>, <value if false>)
Examples of logical queries:
'Line_of_Business' = "Property"
'Region' = "Americas"
'Claim' > 1000000
If the Line of Business is Property, then multiply premium by 1.2. If it is not Property leave the Premium value unchanged.
IF('Line_of_Business' = "Property", 'Premium' * 1.2, 'Premium')
And statement
IF('Column 1' > 'Column 2' and 'Column 3' < 'Column 4',
value_if_true, value_if_false)
Or statement
IF('Column 1' > 'Column 2' or 'Column 3' < 'Column 4',
value_if_true, value_if_false)
Combined statement
IF(('Column 1' > 'Column 2' or 'Column 1' < 'Column 5')
and ('Column 3' < 'Column 4'), value_if_true, value_if_false)
Returns a value if a calculation generates an error.
IFERROR(<calculation>, <value>)
If the calculation ‘claim_amount / claim_count’ generates an error, return the value 0.
IFERROR('claim_amount' / 'claim_count', 0)
Returns TRUE if value is a number, and FALSE if not (for use with IF statements).
ISNUMBER(<column>)
value |
IF(ISNUMBER('value'), "Number", "Not a number") |
1 |
Number |
"1" |
Number |
1.2 |
Number |
"" |
Not a number |
"text" |
Not a number |
Returns TRUE if value matches a regular expression, and FALSE if not (for use with IF statements)
ISLIKE(<column>, "regular expression")
ID |
IF(ISLIKE('ID', "4$"), "Ends with 4", "Doesn't end with 4") |
123 |
Doesn’t end with 4 |
1234 |
Ends with 4 |
Returns the first characters from a value, given the number of characters
LEFT(<column>, <no. of characters>)
Address |
LEFT('SIC Code', 4) |
0800 Forestry |
0800 |
2080 Beverages |
2080 |
Returns the length of each column value.
LENGTH(<column>)
ID |
LENGTH('ID') |
123 |
3 |
1234 |
4 |
Converts text to lowercase.
LOWER(<column name or text value> … )
Returns the maximum value across multiple columns or constant values.
Each parameter can be a numeric value or a column name. If the parameter is a column name then the function will get the value in that column. All the values are compared and the largest value is returned.
MAX(<column name or numeric value>, … )
VALUE |
MAX('VALUE',0) |
1 |
1 |
-1 |
0 |
Returns characters from a value, given starting position (where first character is 1) and number of characters.
MID(<column>, <start position>, <no. of characters>)
ID |
MID('ID', 4, 2) |
12345678 |
45 |
abcdefghijkl |
de |
Returns the minimum value across multiple columns or constant values.
Each parameter can be a numeric value or a column name. If the parameter is a column name then the function will get the value in that column. All the values are compared and the smallest value is returned.
MIN(<column name or numeric value>, … )
VALUE |
MIN('VALUE',0) |
1 |
0 |
-1 |
-1 |
Converts the first letter of each word to uppercase and the other letters to lowercase.
PROPER(<column name or text value> … )
VALUE |
PROPER('VALUE') |
SAN FRANCISCO |
San Francisco |
sao paulo |
Sao Paulo |
SAN DIEGO, CA |
San Diego, Ca |
Returns the last characters from a value, given the number of characters
RIGHT(<column>, <no. of characters>)
Address |
RIGHT('Address', 8) |
London, EC1Y 1AA |
EC1Y 1AA |
Penzance, TR19 7AA |
TR19 7AA |
Rounds a number to a given number of digits.
ROUND(<column name or numeric value>, <no. of digits>)
Numbers 1–4 are rounded down and 5–9 are rounded up. Numbers are rounded to a specified level of precision. They can be rounded to the right or left of the decimal point.
Formula |
Description |
Result |
ROUND(3.14, 1) |
Rounds 3.14 to one decimal place |
3.1 |
ROUND(12.34, -1) |
Rounds 12.34 to one decimal place to the left of the decimal point |
10 |
ROUND(876.5, -3) |
Rounds 876.5 to the nearest 1000 |
1000 |
Finds a text value and replaces it with another text value.
SUBSTITUTE(<search within>, <search for>, <replace with>)
Note:
Remove dashes from an ID value.
ID |
SUBSTITUTE('ID', "-", "") |
QUAN-1234-785 |
QUAN1234785 |
QUAN7436528 |
QUAN7436528 |
Returns characters from a value, given starting position (where first character is 0) and number of characters.
SUBSTRING(<column>, <start position>, <no. of characters>)
ID |
SUBSTRING('ID', 4, 2) |
12345678 |
56 |
abcdefghijkl |
ef |
Removes extra spaces from text: leading and trailing whitespace and repeated spaces between words.
TRIM(<column name or text value> … )
Converts text to uppercase.
UPPER(<column name or text value> … )
Date functions operate on cleansed date values which are in Basic ISO format
See Working with dates to learn more about how Quantemplate processes dates.
Returns a new basic-ISO formatted date by adding or removing a term (constant value or column reference) from a basic-ISO date.
DATECALC(<iso date>, <term (numeric value)>, <unit>)
Premium |
DATECALC('Premium', -1, "year") |
DATECALC('Premium', 6, "months") |
20120629 |
20110629 |
20121229 |
20200229 |
20190228 |
20200829 |
"" |
"" |
"" |
2012-06-29 |
(pipeline error) |
(pipeline error) |
Returns the number of whole elapsed units between two basic-ISO dates.
DURATION(<start>, <end>, <unit>)
Start |
End |
DURATION('Start', 'End', "days") |
20120629 |
20120630 |
1 |
20120630 |
20120629 |
-1 |
"" |
"" |
"" |
2012-06-29 |
2012-06-30 |
(pipeline error) |
Moves a date to the last day in a quarter.
ENDOFQUARTER(<iso date>)
Moves a date to the last day in a month.
ENDOFMONTH(<iso date>)
Moves a date to the last day in a year.
ENDOFYEAR(<iso date>)
Moves a date to the first day in a quarter.
STARTOFQUARTER(<iso date>)
Moves a date to the first day in a month.
STARTOFMONTH(<iso date>)
Moves a date to the first day in a year.
STARTOFYEAR(<iso date>)
Certain combinations of letters and numbers will be interpreted as numbers in arithmetic operations.
The most useful of these is the exponent indicator ‘e’, so a string ‘5e5’ would be interpreted as the number ‘500,000’ when used in an arithmetic expression.
Characters f, F, d and D indicate the type of floating-point number. F or f indicates the number is a float; otherwise its type is double and it can optionally be suffixed with a letter D or d.