Remove Rows
The remove rows operation allows the removal of rows which match certain criteria. The operation can be used to remove rows which are not part of your core data, typically surplus headers and footers, totals, or rows containing specific values.
Usage
To add criteria either click the + button on the top right or use the quick add link on the main panel.
Once added, criteria can be removed by hovering the cursor over the item and clicking the cross button.
There are four criteria by which you can remove rows:
- Duplicate rows
- Empty first column
- Less than [n] columns
- Column filter
Duplicate rows
Removes any rows that have identical values to any preceding row.
In this example rows 3 and 6 duplicated rows above, so were removed:
1 |
A |
A |
A |
A |
|
2 |
B |
B |
B |
B |
|
3 |
A |
A |
A |
A |
Removed |
4 |
C |
C |
C |
C |
|
5 |
D |
D |
D |
D |
|
6 |
B |
B |
B |
B |
Removed |
This option may cause long execution times for inputs with large numbers of rows so should be used sparingly.
The ‘Remove duplicate rows’ function can only be added to the operation once.
Empty first column
Removes any row where the first column is empty.
In this example the first columns of the first two rows were empty, so were removed:
1 |
|
A |
A |
A |
Removed |
2 |
|
B |
B |
B |
Removed |
3 |
C |
C |
C |
C |
|
4 |
D |
D |
D |
D |
|
5 |
E |
E |
E |
E |
|
6 |
F |
F |
F |
F |
|
The ‘Empty first column’ function can only be added to the operation once.
Less than [n] columns
Counts the number of columns in the row that are not empty, i.e. they have a defined value. If the count is less than the number specified then the row is removed.
In this example, the criteria was configured to remove rows with less than 3 columns of data. Rows 1, 2 and 6 contained less than 3 columns of data, so were removed:
1 |
A |
A |
|
|
Removed |
2 |
B |
B |
|
|
Removed |
3 |
C |
C |
C |
C |
|
4 |
D |
D |
D |
D |
|
5 |
E |
E |
E |
E |
|
6 |
|
|
|
F |
Removed |
The ‘Less than [n] columns’ function can only be added to the operation once.
Column filter
Applies a filter to remove rows depending on the values contained within a column.
In this example the column filter was set to remove rows where the value in the column ‘State’ was not “AZ”. Rows 1 and 5 had different values, so were removed.
|
State |
Buidings value |
Contents value |
Interruption value |
|
1 |
CA |
815000 |
0 |
25000 |
Removed |
2 |
AZ |
50000 |
0 |
150000 |
|
3 |
AZ |
180000 |
0 |
50000 |
|
4 |
AZ |
20000 |
0 |
30000 |
|
5 |
TX |
20000 |
0 |
100000 |
Removed |
Column filters can be added to an operation multiple times to refer to different columns and values.
Combining rules
When you have defined multiple criteria you can choose to remove rows where any criterion applies, or where all criteria apply.
Example
It is common for data extracted from a third-party system to start with some summary data about the content, and to include totals as the last row.
1 |
Property Locations |
|
|
|
2 |
Number of locations |
86 |
|
|
3 |
TOTAL INSURED VALUE |
35468000 |
|
|
4 |
|
Buildings Value |
Contents Value |
Interruption Value |
5 |
|
0 |
0 |
25000 |
6 |
|
0 |
0 |
150000 |
7 |
|
815000 |
0 |
50000 |
8 |
|
50000 |
0 |
30000 |
9 |
|
180000 |
0 |
100000 |
10 |
|
20000 |
0 |
150000 |
11 |
|
20000 |
0 |
50000 |
12 |
|
50000 |
0 |
30000 |
13 |
Total |
401500 |
0 |
585000 |
The summary info typically has far fewer columns of data than the body, so the ‘Less than [n] columns’ function can be used remove these rows.
When the filter is added it defaults to removing rows with fewer than three columns. This means that any row with zero, one or two cells populated will be removed. This value should be set to be more than the summary column count but fewer than the column count in the data.
In this example, if we leave the count at 3 then the summary rows will be removed. This will leave the data plus the total row, which can be removed with a column filter.
1 |
|
Buildings Value |
Contents Value |
Interruption Value |
2 |
|
0 |
0 |
25000 |
3 |
|
0 |
0 |
150000 |
4 |
|
815000 |
0 |
50000 |
5 |
|
50000 |
0 |
30000 |
6 |
|
180000 |
0 |
100000 |
7 |
|
20000 |
0 |
150000 |
8 |
|
20000 |
0 |
50000 |
9 |
|
50000 |
0 |
30000 |
10 |
Total |
401500 |
0 |
585000 |
Tip
If you want to remove rows based on values within specific columns before using a detect headers operation, then the columns will have been automatically named “unnamed”, “unnamed (0)” etc. This allows you to pick the column based on its position in the file.
To remove the total row, first add a column filter. Since the column headers have not been defined yet, to select the first column choose the automatically named column “unnamed”, select the predicate ‘equals’ and enter the value “Total”.
This setup will remove the Summary and Total rows from our example data.
1 |
|
Buildings Value |
Contents Value |
Interruption Value |
2 |
|
0 |
0 |
25000 |
3 |
|
0 |
0 |
150000 |
4 |
|
815000 |
0 |
50000 |
5 |
|
50000 |
0 |
30000 |
6 |
|
180000 |
0 |
100000 |
7 |
|
20000 |
0 |
150000 |
8 |
|
20000 |
0 |
50000 |
9 |
|
50000 |
0 |
30000 |
The data is now ready to have its
column headers defined and to undergo further processing.
The blank first column can be removed via the
Map column headers operation.