Reliable Software Design

Avoiding Race conditions

A typical situation in ERP systems is, that you have to use SQL tables as an „interface“ between applications.
Let’s say you want to submit an order. You write an „order header“ containing the customer number etc and orderlines how much to ship of what.

order = {'customer_no': 12345,
         'delivery_date': '2001-01-01',
         'orderlines': [{'item_no': '10105', 'quantity': 80},
                        {'item_no': '14600', 'quantity': 12}]
        }

In SQL you use two tables and add some primary keys:

order_head:

id   | customer_no | delivery_date | processed
-----+-------------+---------------------------
5123 | 12345       | 2001-01-01    | False

orderline:

orderid | item_no | quantity
--------+---------+----------
5123    | 10105   | 80
5123    | 14600   | 12

Since oldschool databases usually don’t have auto incrementing keys, you usually have to write something like this to insert an order:

next_id = query('SELECT MAX(id) FROM order_head')
query('INSERT INTO order_head (id, customer_no, delivery_date, processed)
       VALUES(%d, '12345', '2001-01-01', False)" % next_id)
query('INSERT INTO orderline (orderid | item_no | quantity)
       VALUES(%d, '10105', 80)" % next_id)
query('INSERT INTO orderline (orderid | item_no | quantity)
       VALUES(%d, '14600', 12)" % next_id)

This has some issues: what if some programm starts reading after you have written order_head and the first orderline, but not the second one? The ustomer would not get the goods from the second orderline. Transaction in modern SQL databases can solve this, but if you don’t have transactions you can solfe this problem in otehr ways. Let’s assume the code in the ERP for processing this interface looks like this:

for order in query('SELECT * FROM order_head'):
    for orderlines in query("SELECT * FROM orderline
                             WHERE orderid='%d'" % order.id):
        # do something with order and orderline
    query("UPDATE orderline SET processed=True
           WHERE orderid='%d'" % order.id)

This means that orderlines are never read unless there is also an header with the respective ID. So if we write the orderlines first we are save from them beeing read if tey are not ready yet. So by reordering our code we can make it already much more robust:

next_id = query('SELECT MAX(id) FROM order_head')
query('INSERT INTO orderline (orderid | item_no | quantity)
       VALUES(%d, '10105', 80)" % next_id)
query('INSERT INTO orderline (orderid | item_no | quantity)
       VALUES(%d, '14600', 12)" % next_id)
query('INSERT INTO order_head (id, customer_no, delivery_date, processed)
       VALUES(%d, '12345', '2001-01-01', False)" % next_id)

This already avoids the race condition „reading while writing“. But what if two processes try to write orders at once? Then next_id = query('SELECT MAX(id) FROM order_head') would result in both processes getting the same next_id and this mixing up two orders into a single one.

The easiest solution is to ensure that only one process can ever write orders. This avoids lot’s of trouble. But it is suprisingly hard to enforce that. So by having code which reduces the risk of dual write race conditions we can make our program much more robust.

One approach would be to use SQL functionality like this to insert the head?

query('INSERT INTO order_head (id, customer_no, delivery_date, processed)
       VALUES(SELECT MAX(id) FROM order_head, '12345', '2001-01-01', False)")

This would ensure that no id is used twice but how would we get the ID we generated for our order? SELECT MAX(id) FROM order_head would not help since some other program might have inserted another record in the meantime.

We are here in an situation where we can’t make the code reliable without resorting to some tricks. In this case the processed field is our solution: Teh application reading the database ignores all records where processed == True. So as long as we set processed to True we can use the other fields for whatever we like. In this case we use the date field to temporary store an unique id identifying the record.

So let’s generate a nice 10 character random value:

token = hex((int(time.time() * 10000)
            ^ (os.getpid() << 16)
            ^ thread.get_ident() << 8)
           % 0xFFFFFFFFFF).rstrip('L')[2:]

If this random value is guaranteed to be unique it’s easy to write a robust insertation code.

def insert_order():
    token = hex((int(time.time() * 10000)
        ^ (os.getpid() << 16)
        ^ thread.get_ident() < 1:
    next_id = query('SELECT MAX(id) FROM order_head')
    query('INSERT INTO order_head (id, customer_no, delivery_date, processed)
       VALUES(%d, '12345', %r, False)" % (next_id, token))
    rowcount = query('SELECT COUNT(*) FROM order_head WHERE id = %d' % next_id)
    if rowcount > 1:
       # the race condition has hit - remove our entry and retry
        query('DELETE FROM order_head WHERE delivery_date = %r' % token)
        time.sleep(random.randint()/100.0)
        insert_order()
    else:
        query('INSERT INTO orderline (orderid | item_no | quantity)
               VALUES(%d, '10105', 80)" % next_id)
        query('INSERT INTO orderline (orderid | item_no | quantity)
               VALUES(%d, '14600', 12)" % next_id)
        # fix up the header
        query('UPDATE order_head SET delivery_date='2001-01-01',
               processed=False WHERE delivery_date=%r' % token)

While this code is much more complex than the original, it can guarantee that no race conditions occur as long as token is unique.

3 comments on “Reliable Software Design

  1. janl
    2008-10-16 at 00:07 #

    Nice runthrough of how to avoid race conditions on data stores. Using a UUID is the exact same thing CouchDB does btw.

    What people often don’t see is that auto_increment serves two purposes: unique identification _and_ ordering. If you only need one of these properties, you may want to use something else (timestamps & UUIDs e.g.).

    Cheers
    Jan

    This comment was originally posted on 20080918T08:36:48

  2. Jason Watkins
    2008-11-11 at 04:16 #

    A general approach would be to maintain a locks table using Lamport’s bakery.

  3. mdornseif
    2008-11-15 at 09:07 #

    Unfortunately for that all parts must cooperate – but the legacy ERP system doesn’t.

    For the uninitiated: The bakery is described at http://tinyurl.com/6blw49#.

Kommentar verfassen

Trage deine Daten unten ein oder klicke ein Icon um dich einzuloggen:

WordPress.com-Logo

Du kommentierst mit Deinem WordPress.com-Konto. Abmelden / Ändern )

Twitter-Bild

Du kommentierst mit Deinem Twitter-Konto. Abmelden / Ändern )

Facebook-Foto

Du kommentierst mit Deinem Facebook-Konto. Abmelden / Ändern )

Google+ Foto

Du kommentierst mit Deinem Google+-Konto. Abmelden / Ändern )

Verbinde mit %s