Tuesday, March 24, 2009

So long, and thanks for all the transactions...

Well, it's been a long time coming, but it's finally happening. The breakup will take quite a while, and it may never be a complete break. But in the end, after hundreds of tables and countless rows, I begin the process of replacing MySQL with PostgreSQL as our primary database.

But why, you ask? Not for any of the reasons you're probably thinking of. It's not because of any of the limitations of any of the foreign key implementations. It's not because of the cases where MySQL's serial data processing nature conflicts with what set theory says you should be able to do, such as modifying a table that's referenced again in a subquery. It's not because PostgreSQL is substantially closer to fully standards compliant SQL, and therefore a lot of the commercial big names such as Oracle. And no, we never had any catastrophic data loss due to any less than ACID aspects or a MySQL crash.

No, overall MySQL has been quite good to us. We're switching for the simple and inescapable reason that, unlike PostgreSQL, MySQL can not store IPv6 addresses in a usable form.

Now before you come up with some bright idea, take my word for it - I've already looked at it, thought it through, and utterly rejected it. Once you strip away the slightly funky formatting conventions, an IPv6 address is at its core simply a 128 bit integer. Too bad that MySQL only supports up to 64 bit integers, meaning that the IPv4 method (store it in a 32 bit integer) doesn't work. You can stick it in a text string, but then you get formatting normalization issues, and you can't efficiently perform any bit masking operations, which are required for things like determining which subnet a given address belongs to. Cramming it into a decimal type is a little bit closer, but sadly the bitwise operators such as '|' and '&' will silently truncate the output to 64 bits, making it - and anything else - utterly useless.

Quite simply, as comfortable as we are with MySQL, IPv6 addresses just don't fit.

And then, like a ray of sunshine, there's PostgreSQL! It actually has a pair of native data types (inet and cidr) that can not only cleanly store either an IPv4 or IPv6 address and subnet mask or prefix length, but also work with a number of functions for some of the most common operations, such as extracting the network address or calculating the broadcast address. What is currently a flat out impossibility in MySQL is not only possible, but trivial in PostgreSQL.

My group is responsible for maintaining the network here. We live, breathe, and die by IP addresses. We register them, track them, and shuffle them around between databases, DNS, DHCP, and ACLs a hundred times a day. Imagine trying to sell a database to the phone company that can't store phone numbers, and you'll have a pretty decent feel for the position we're in with IPv6 addresses. IPv6 adoption may be slow now, but it's going to come sooner or later, and when it does come, it's probably going to hit critical mass and come fast and hard.

Sure, I'm not looking forward to the changeover. I need to worry about removing MySQL specific features from the schema, such as auto increment fields and set/enum data types. I have several gigs of absolutely mission critical data that has to be moved over without getting scrambled. And there are many, many lines of perl and SQL code that have to be tested thoroughly. But in the end, "difficult but possible" beats "not a chance" any day.

Thursday, March 19, 2009

Patents Vs. Innovation

If you follow the tech industry at all these days - and possibly even if you don't - you can't help but hear all about patents. Whether it's Microsoft suing TomTom over ridiculous FAT patents, or RedHat reluctantly assembling a defensive patent arsenal, the one thing that every agrees on is that patent law and practice has a huge impact on both the tech sector, and the economy at large.

One aspect that the guys over at Techdirt have continued to hammer at is the difference between invention, and innovation. Patents place a significant emphasis at protecting the "rights" of the inventor (rights which exist only because of patent law - how's that for circular reasoning for you?), to the point where innovation, the act of making the invention actually useful, is being harmed. I've come up with a simple analogy that I think helps clarify this argument.

Imagine that you're a manager with a couple dozen employees under you. (If you actually are a manager, this should be pretty easy.) Now, odds are that most of your employees are pretty decent. They're good at their jobs, you can rely on them to produce good quality work, but they don't often come up with game changing ideas.

Except for that one guy. You know, that one guy who, if you had to pick your replacement, you'd name in a heartbeat. He's the one who doesn't just come up with a way to make a work process faster, he shuffles things around and makes that entire process disappear, freeing up everyone's time to work more on things that make money.

Now picture he's come up with a new version of some form that everyone has to fill out fifty times a day. The new version is faster, more accurate, and will allow everyone to increase their client billable hours by 20%. Great! Everyone can start using it, the company makes more money, everyone gets raises, and everyone's happy.

Except that in this case, we're modeling our little company after the patent situation we're in. Now, the only person who can use the new form is your one go to guy. Everyone else is stuck with the old form. The company bottom line barely moves, no one gets the big raise, and everyone gets to just watch the one guy doing interesting work while they spend their day filling out the form they're not allowed to use because they didn't come up with it.

So I ask you this: as the manager, which set of rules would you rather try to run a company under?