PIPELINES
- Introducing pipelines
- Pipelines workspace
- Managing pipelines
- Inputs and outputs
- External sharing
- Join and Union
- Dates
- Automap Values
- Validation
- Operations
- Syntax and functions
- Updates and legacy

DATA

ANALYSE

ADMIN

TUTORIALS
- Pipelines
- Analyse

Remove Rows

The remove rows operation allows the removal of rows which match certain criteria. The operation can be used to remove rows which are not part of your core data, typically surplus headers and footers, totals, or rows containing specific values.

Usage

To add criteria either click the + button on the top right or use the quick add link on the main panel.

Once added, criteria can be removed by hovering the cursor over the item and clicking the cross button.

There are four criteria by which you can remove rows:

Duplicate rows
Empty first column
Less than [n] columns
Column filter

Duplicate rows

Removes any rows that have identical values to any preceding row.

In this example rows 3 and 6 duplicated rows above, so were removed:

1	A	A	A	A
2	B	B	B	B
3	A	A	A	A	Removed
4	C	C	C	C
5	D	D	D	D
6	B	B	B	B	Removed

This option may cause long execution times for inputs with large numbers of rows so should be used sparingly.

The ‘Remove duplicate rows’ function can only be added to the operation once.

Empty first column

Removes any row where the first column is empty.

In this example the first columns of the first two rows were empty, so were removed:

1		A	A	A	Removed
2		B	B	B	Removed
3	C	C	C	C
4	D	D	D	D
5	E	E	E	E
6	F	F	F	F

The ‘Empty first column’ function can only be added to the operation once.

Less than [n] columns

Counts the number of columns in the row that are not empty, i.e. they have a defined value. If the count is less than the number specified then the row is removed.

In this example, the criteria was configured to remove rows with less than 3 columns of data. Rows 1, 2 and 6 contained less than 3 columns of data, so were removed:

1	A	A			Removed
2	B	B			Removed
3	C	C	C	C
4	D	D	D	D
5	E	E	E	E
6				F	Removed

The ‘Less than [n] columns’ function can only be added to the operation once.

Column filter

Applies a filter to remove rows depending on the values contained within a column.

In this example the column filter was set to remove rows where the value in the column ‘State’ was not “AZ”. Rows 1 and 5 had different values, so were removed.

	State	Buidings value	Interruption value
1	CA	815000	25000	Removed
2	AZ	50000	150000
3	AZ	180000	50000
4	AZ	20000	30000
5	TX	20000	100000	Removed

Column filters can be added to an operation multiple times to refer to different columns and values.

Combining rules

When you have defined multiple criteria you can choose to remove rows where any criterion applies, or where all criteria apply.

Example

It is common for data extracted from a third-party system to start with some summary data about the content, and to include totals as the last row.

1	Property Locations
2	Number of locations	86
3	TOTAL INSURED VALUE	35468000
4		Buildings Value	Contents Value	Interruption Value
5		0	0	25000
6		0	0	150000
7		815000	0	50000
8		50000	0	30000
9		180000	0	100000
10		20000	0	150000
11		20000	0	50000
12		50000	0	30000
13	Total	401500	0	585000

The summary info typically has far fewer columns of data than the body, so the ‘Less than [n] columns’ function can be used remove these rows.

When the filter is added it defaults to removing rows with fewer than three columns. This means that any row with zero, one or two cells populated will be removed. This value should be set to be more than the summary column count but fewer than the column count in the data.

In this example, if we leave the count at 3 then the summary rows will be removed. This will leave the data plus the total row, which can be removed with a column filter.

1		Buildings Value	Contents Value	Interruption Value
2		0	0	25000
3		0	0	150000
4		815000	0	50000
5		50000	0	30000
6		180000	0	100000
7		20000	0	150000
8		20000	0	50000
9		50000	0	30000
10	Total	401500	0	585000

Tip

If you want to remove rows based on values within specific columns before using a detect headers operation, then the columns will have been automatically named “unnamed”, “unnamed (0)” etc. This allows you to pick the column based on its position in the file.

To remove the total row, first add a column filter. Since the column headers have not been defined yet, to select the first column choose the automatically named column “unnamed”, select the predicate ‘equals’ and enter the value “Total”.

This setup will remove the Summary and Total rows from our example data.

1	Buildings Value	Contents Value	Interruption Value
2	0	0	25000
3	0	0	150000
4	815000	0	50000
5	50000	0	30000
6	180000	0	100000
7	20000	0	150000
8	20000	0	50000
9	50000	0	30000

The data is now ready to have its column headers defined and to undergo further processing.
The blank first column can be removed via the Map column headers operation.

Help Centre