Tuesday, May 13, 2014

Renumbering rows in SQL

Today I struggled around a seemingly simple problem the would be quite easy to solve in a traditional imperative programming language but is not so immediate to solve in SQL so I want to share my solution.

I have a master table and a master_lines table where lines should be  numbered from 1 to N, where N is the count of lines per master.
For some reason that does not matter in this context, a program inserted lines for a given master leaving large gaps between lines and I had to renumber those lines, keeping them in the same sort order, since I was getting numeric overflow errors in the line_number column which had been unfortunately declared numeric(3, 0). The program was continuing to increase the line_number starting from the highest maximum read from the database and adding nearly random offsets from there. (yeah, an integer column probably would have solved this problem, but I could not change the database).
The user was complaining and I had to apply a quick fix to allow him to proceed entering new lines for that master row.

Since this was a PostgreSQL database I knew I could renumber the lines easily using the row_number window function. 
Window functions are a nice SQL feature that often allows you to solve order dependent problems in SQL. SQL is a set oriented language, it's purpose is to process relations, which are sets, and sets have no predefined order. So it's generally difficult to express order dependent logic in SQL. Window functions can compute values on windows of contiguous rows returned from relational operations applied by a query. They are not relational algebra operators themselves but they allow you to post-process the result of relational algebra operations.

Among the set of window functions, row_number returns an integer starting from 1 to N where N is the cardinality of the result set (or size of the window to be correct). This is exactly the functionality we need. Since the result of a window function depends on the sort order of the result set, we need to specify in which order we want to count the rows. This is achieved adding a "window" clause to the statement, which can also be written inline with the function call. This clause has many features but for our purposes we only need to be able to specify the ordering we need. So we can write 

row_number() over (order by line_number)

to request a sort by line_number and then return the position of the "current row" in the sorted set, counting from 1. To update the rows for a given master I wish I could write:

update master_lines o
set line_number = row_number() over (order by line_number) 
where master_nbr = '14/08925'
;

but this does not work since SQL window functions are not supported in UPDATE statements. Semantically it may have sense but is not allowed. This is one example of lack of orthogonality still present in SQL. In order to solve this I need to perform the operation in two steps, first computing the row number and then using this result to update the master_lines table. I may use an inline view for this but I find it more elegant and readable to use a CTE (Common Table Expression) for this purpose.
A CTE is like an assignment in a functional programming language. If we consider that SQL operands are relations (Chris Date, forgive me for the imprecision, I know I should say that SQL allows duplicates and so on...), a CTE like

with t as (
  query
)

is nothing more than an assigment to the final variable t, of an immutable value as defined by the result of executing the query. It is a relation in the relational model sense (an immutable value). So I can legally reference it inside other parts of the same SQL statement. In our case we can write:

with t as (
  select line_number, 
    row_number() over (order by line_number) as pos
  from master_lines
  where master_nbr = '14/08925'
)
update master_lines o
set line_number = t.pos
from t
where master_nbr = '14/08925'
and o.line_number = t.line_number
;

This solution can be used whenever you want to renumber a set of rows according to some arbitrary sort order.

No comments:

Post a Comment