Thursday, January 12, 2012

Thoughts on what to put in the database

It seems that if you ask three database developers what business logic should be put in the database, you will get (at least!) three answers.  Having read Andrew Dunstan's blog post on the subject, as well as Tony Marston's advocacy pieces arguing for putting no business logic in the database, I will give my viewpoint.  Here it is:
All logic pertaining to storing data, manipulating stored data, and retrieving and presenting data in a relational format belongs in the database, provided it can be reduced to atomic calls; all logic for providing the data to the user, accepting data from the user, and providing workflows belongs in the upper layers.
Really, that's it.  If you want to send an email, don't do that in the database (there are a billion reasons why).  If you want to generate HTML documents, the database would not be my first choice of place to put that logic.  There are a couple of other ways to look at this, though.  Here is a previous version of my view and why I have changed my mind, slightly narrowing the field:

There is a difference between business logic inherent in the data and business logic regarding what you do with the data. Inherent logic belongs in the database.  Use-based logic belongs in the application.
This doesn't work quite as well in practice as it does in theory, because it is a bit fuzzy and arguably overinclusive.  Things like converting date formats, for example, could be argued to be inherent logic, but flooding the database with round-trip calls for such conversions would not be an efficient use of resources.  Consequently this view only works when narrowly interpreted, with data formatting treated as use-based logic.  In other words, we end up getting back to something like my current position.
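To make the overinclusive case concrete, here is a hypothetical round trip issued purely for presentation formatting; each such call spends network and planning overhead on work the application could do locally just as easily:

    -- A call whose only purpose is presentation formatting; issuing one of
    -- these per value from the application layer is a wasted round trip.
    SELECT to_char(DATE '2012-01-12', 'FMMonth DD, YYYY');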

The second issue is that this view is slightly underinclusive.  If I want to ensure that an email is always sent when certain criteria are met, that logic has to be tied to the database, and there is a correct way to do this in PostgreSQL (see below).  The actual email would not be sent from the database, but information regarding sending it would be stored and made available on transaction commit.

Remember, the goal here is to have an intelligent database at the center of the application environment, providing data services in a flexible yet consistent way.

For LedgerSMB 1.2, I wrote a sample utility that would listen for updates and, where a part dropped below its reorder point, send out an email to a specified individual.  This is the correct general approach for a couple of reasons, but the utility still had significant room for improvement.  There were two specific problems with the approach taken:

  1. The trigger was supplied as part of the main database setup scripts, which meant that, since the trigger logic was incomplete (see below), the utility would necessarily send out duplicate information in each email.  Such utilities should really be self-contained and supply their own triggers.
  2. The trigger was inadequate, simply raising a NOTIFY when a part became short.  The trigger should have taken the information from the new row and inserted it into a holding table which the email script could then clear (a sketch of such a trigger follows this list).
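Here is a minimal sketch of what that improved trigger might look like.  The names used (parts, onhand, rop, partnumber, parts_reorder_queue, and the parts_reorder channel) are illustrative rather than actual LedgerSMB code:

    -- Holding table for reorder warnings awaiting email.
    CREATE TABLE parts_reorder_queue (
        id          serial PRIMARY KEY,
        parts_id    integer NOT NULL,
        partnumber  text NOT NULL,
        onhand      numeric NOT NULL,
        rop         numeric NOT NULL,
        queued_at   timestamp NOT NULL DEFAULT now()
    );

    CREATE OR REPLACE FUNCTION parts_reorder_trigger() RETURNS trigger AS
    $$
    BEGIN
        -- Queue a summary only when this update pushes onhand below the
        -- reorder point (one way to avoid repeat warnings), then signal
        -- any listening utility.
        IF NEW.onhand < NEW.rop AND OLD.onhand >= OLD.rop THEN
            INSERT INTO parts_reorder_queue (parts_id, partnumber, onhand, rop)
            VALUES (NEW.id, NEW.partnumber, NEW.onhand, NEW.rop);
            NOTIFY parts_reorder;
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER parts_reorder_notify
        AFTER UPDATE ON parts
        FOR EACH ROW
        EXECUTE PROCEDURE parts_reorder_trigger();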
So if we look at how something like this should function transactionally, we get a flow like the following:
  1. Invoice issued, leaving fewer parts on hand than the reorder point.  Take a summary of the warning information and store it in a queue table for email.
  2. Raise a notification.
  3. Invoice transaction commits, making the notification visible to the listening application.
  4. Listening application receives the notification and checks the queue table.
  5. Prepares and sends emails, deleting records from the queue table as they are processed.
  6. Checks again for new records, commits the transaction, and checks for new notifications (the listening side is sketched below).
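A minimal sketch of that listening side, again using the hypothetical queue table and channel from the trigger sketch above; the waiting loop itself lives in the client utility, which issues roughly these statements over its connection:

    -- Register interest in the channel; the notification raised by the
    -- trigger only becomes visible here once the invoice transaction commits.
    LISTEN parts_reorder;

    -- On each notification received, drain the queue in its own transaction.
    BEGIN;
    DELETE FROM parts_reorder_queue
      RETURNING parts_id, partnumber, onhand, rop;
    -- ... the client formats and sends one email per returned row here;
    --     if sending fails it can ROLLBACK so the rows stay queued ...
    COMMIT;
    -- After the commit, the client checks for any notifications that arrived
    -- while it was working and repeats as needed.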
This means you have two asynchronous loops going on, coordinated by transactional controls on the part of the database.  The invoice could also be queued, and another loop could be used to generate a document to be printed automatically and sent out to the customer.  All manner of other things can be automated in such a way that other scripts which import invoices can do so without interrupting any of these loops.  Moreover, these additions can be made without disturbing the original application, simplifying testing and QA.
