Monday, February 28, 2011

Working around MySQL dain bramange

I'm currently in the process of porting a perl app that was written for MySQL over to a PostgreSQL backend. Originally it was targeted for MySQL 3.x (yes, the app is that old), long before any kind of transactions or constraints were available, so data updates use this general idiom (those of you used to full-featured databases, you may want to avert your eyes):

  1. High level function locks tables
  2. Perform queries on database to check for consistency (uniqueness, business rule checks, etc)
  3. Pass parameters to lower level update function
    1. Perform global permissions checking
    2. Perform single update/insert/delete as appropriate
  4. Call lower level functions additional as needed
  5. Unlock tables

Later on, in the 4.1 days, rudimentary transactions were added, though constraints were still nonexistent to useless. Simple transaction support was shoehorned in by adding it to the lower level functions.

  1. High level function locks tables
  2. Perform queries on database to check for consistency (uniqueness, business rule checks, etc)
  3. Pass parameters to lower level update function
    1. Begin transaction
    2. Perform global permissions checking
    3. Perform update/insert/delete as appropriate
    4. Commit or rollback transaction
  4. Unlock tables

(I warned you not to keep reading, didn't I?)

So let's recap where we are now, thanks to sticking with what Selko describes as "not really a relational database, but does sequential processing behind a 'near-SQL dialect' language":

  • No constraints in the database, making direct edits (and let's face it, they always happen sooner or later) are quite dangerous.
  • Because there are no constraints, all of the checking has to be done in the application layer.
  • Because the checking is done in the application, all involved tables still have to be locked - MVCC would actually make concurrency issues worse, not better.
  • Because transactions are handled in the lower level functions, they can't encompass multiple statements. What's the point? (Okay, to be fair they do cover auditing records that the lower level functions silently create, but that's all.)

Ugh. The ideal path going forward would be move all of those constraints down into the database where they belong. This is far from a trivial task, however, and one that I probably don't have the resources for. The checks are scattered throughout the codebase, with odd conditions awkward to express in SQL. One common idiom is a column that references table A if the value is positive, table B if negative, or nothing at all if its zero. If I could wave my magic keyboard, that's the road I'd go down, but it's out for repair, and I need a working app sooner rather than later.

Instead, I'm going to have to spring begin/commits throughout the higher level functions, while still preserving the table locks everywhere. To make things even better, some of the higher level functions may call each other, so I need to simulate nested transactions, which PostgreSQL doesn't quite support (yeah, I know about savepoints.) So now I'm looking at something roughly like this:

  1. If not already in a transaction, begin one.
  2. Lock tables.
  3. Perform checks.
  4. Call low level functions, or other higher level functions as needed.
  5. If we began our own transaction, commit, else pass control back up.

Yeah, it's still pretty fugly, but that's the design tax we're still paying for all of the features left out way back in MySQL 3. Thanks, guys.

No comments: