It’s often necessary to fill unpopulated columns with default values. There are a few ways to do this:
In a Calculate operation, enter the column names as new fields and the desired default value as each field’s formula.
An IF statement can be used to determine whether the column is already populated, and to return a default value only if not.
This approach has the advantage that the whole array can be copy-pasted into another Calculate operation in another pipeline if desired. Note that a single Calculate operation supports a maximum of 51 calculations.
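As a minimal sketch of the IF-based logic described above (the column names and default values are invented for illustration, and plain Python dicts stand in for pipeline rows):

```python
# Hypothetical defaults for this example; in the pipeline each of these
# would be one calculated field in the Calculate operation.
DEFAULTS = {"Currency": "USD", "Country": "US"}

def apply_defaults(row: dict) -> dict:
    """Apply each default only where the column is missing or blank --
    the equivalent of IF(column is blank, default, column)."""
    out = dict(row)
    for col, default in DEFAULTS.items():
        if not out.get(col):  # covers missing key, None, and empty string
            out[col] = default
    return out

rows = [
    {"Policy number": "P-001", "Currency": "GBP"},  # existing value kept
    {"Policy number": "P-002", "Currency": ""},     # blank gets the default
]
filled = [apply_defaults(r) for r in rows]
```

Because the logic lives entirely in the per-field formulas, the same set of calculations can be reused unchanged in another pipeline, as noted above.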
Use the Populate function.
If the column does not exist in the source data, Populate will add it and fill it with a value. It can either populate all new blank columns at once with the same value, or add different values to specific columns within a source input.
Note that if a new source file is brought in, populated values for that file will be lost, so this needs to be done at a point in the pipeline with consistent inputs, e.g. when mapping to an output schema.
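The behaviour described above can be sketched as follows — a simplified stand-in for Populate, assuming rows represented as dicts and an invented column name:

```python
def populate(rows: list, column: str, value: str) -> list:
    """Add `column` filled with `value`, but only if the column is
    absent from the source data; existing columns are left untouched."""
    if any(column in r for r in rows):
        return rows  # column already exists in the source; no change
    return [{**r, column: value} for r in rows]

data = [{"Policy number": "P-001"}, {"Policy number": "P-002"}]
data = populate(data, "MGA name", "Acme-MGA")  # 'MGA name' is hypothetical
```

This also illustrates the caveat above: the fill is keyed to the columns present in the current input, so a new source file with different columns would not pick up the same values unless the operation sits at a point with consistent inputs.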
The Append If Missing operation adds and populates a single column. It’s useful where one column is sometimes missing from a group of otherwise identical input files. Using Append If Missing prior to Map Column Headers means only one schema needs to be mapped in.
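A toy illustration of the idea, assuming header rows represented as Python lists (the column names are hypothetical):

```python
def append_if_missing(header: list, column: str) -> list:
    """Ensure the named column exists in a file's header, appending it
    if absent, so near-identical files end up sharing one schema."""
    return header if column in header else header + [column]

file_a = ["Policy number", "Premium", "Broker"]
file_b = ["Policy number", "Premium"]  # 'Broker' column absent this month
```

After this step both files carry the same headers, so a single mapping to the target schema covers the whole group.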
One way to manage default values at scale is to keep a central reference table holding the default values for every data-producing party, so ‘ABC-MGA’ would have one row and ‘Acme-MGA’ another. The dataset can be centrally maintained, or connected via API. It requires a couple of additional stages prior to the file import stage, as follows:
Stage 1
Input: ‘REF: Defaults table’ (this should have the same headers as the interim schema)
Remove Rows where the ‘MGA name’ column does not equal "Acme-MGA"
Stage 2
Inputs: Monthly BDX spreadsheets, ‘REF: Master Headers’
Remove rows, detect headers, and map to the interim schema as normal
Stage 3
Inputs: Stage 1 and Stage 2 outputs
You will now have a single file, consistent with the interim schema. Columns with default values have the default value in the first row only.
If only one input file will be processed at a time, this Union stage can be moved into Stage 2 (see above).
Stage 4
Inputs: Stage 3 outputs
Fill Down the columns which have default values, to fill out the rest of each column.
Use a Remove Rows operation to strip out the single row taken from the Default Values file (e.g. remove rows where “Policy number” is empty).
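Stages 3 and 4 can be sketched end-to-end on toy rows — a simplified approximation with invented column names, where plain dicts stand in for pipeline rows:

```python
def fill_down(rows: list, columns: list) -> list:
    """Carry the last non-blank value in each named column down into
    any blank cells below it (the Fill Down behaviour)."""
    last = {}
    for r in rows:
        for c in columns:
            if r.get(c):
                last[c] = r[c]
            elif c in last:
                r[c] = last[c]
    return rows

# Single row from the defaults table; its key column is deliberately blank.
defaults_row = {"Policy number": "", "Currency": "USD", "Country": "US"}
data_rows = [
    {"Policy number": "P-001", "Currency": "", "Country": ""},
    {"Policy number": "P-002", "Currency": "", "Country": ""},
]

unioned = [defaults_row] + data_rows                 # Stage 3: union
fill_down(unioned, ["Currency", "Country"])          # Stage 4: fill down
final = [r for r in unioned if r["Policy number"]]   # remove the defaults row
```

The defaults row must sort first for the fill-down to work, which is why the union places the reference input before the data files.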
This approach works best when the default-value columns are not present in the uploaded files. If some of the columns are present, their values will be retained, but any blanks in those columns will be filled down from the values above, which may not be intended. In that situation, bring the default-values reference dataset in with a join instead, and populate the values from the joined-in columns using an IF statement in Calculate.
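A minimal sketch of this join-and-coalesce variant, assuming an invented defaults table keyed by MGA name:

```python
# Hypothetical reference table: one row of defaults per data-producing party.
DEFAULTS_TABLE = {
    "Acme-MGA": {"Currency": "USD", "Country": "US"},
}

def join_and_coalesce(row: dict, mga: str, columns: list) -> dict:
    """Left-join the defaults row for this MGA, then for each column keep
    the uploaded value when present, otherwise take the joined-in default
    (the IF-in-Calculate logic)."""
    joined = DEFAULTS_TABLE.get(mga, {})
    return {
        **row,
        **{c: row.get(c) or joined.get(c, "") for c in columns},
    }

row = {"Policy number": "P-001", "Currency": "GBP", "Country": ""}
result = join_and_coalesce(row, "Acme-MGA", ["Currency", "Country"])
```

Unlike the fill-down approach, blanks in a partially populated column pick up the per-MGA default rather than whatever value happened to sit in the row above.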