Wednesday, December 19, 2007

Coding and Complexity

First off, let me just make a quick confession - while my undergraduate degree was stamped with "Computer Science" as my major, I don't really consider myself to primarily be a programmer. Sure, I do actually spend a good number of my days mucking around with writing code (usually Perl, occasionally Ruby), but my job is really IT support, specifically networking. I deal with switches, routers, wireless, VPN, and a handful of Linux servers supporting the network with DNS, DHCP, etc. The code writing that I do is almost exclusively to support everything else, such as working on a host registration system or device monitoring scripts. The software I write is to directly address a need, rather than to be sold to address someone else's need.

That said, when I read Steve Yegge's latest rant, Code's Worst Enemy, it struck a chord with me.

I happen to hold a hard-won minority opinion about code bases. In particular I believe, quite staunchly I might add, that the worst thing that can happen to a code base is size.

Now, as someone who does not consider himself a spectacular coder by any means, I would certainly feel quite daunted by tackling a 500k line codebase by myself. On the other hand, as a professional coder, Stevey ought to be able to casually fling around great swaths of code, using advanced software repositories and indexing tools, right? But no - he feels that, all other things being equal, less is more.

One feature of large code bases that I think he gave short shrift to was the idea of complexity. He talks a little bit about how complexity certainly makes a given code base harder to work on, and also that some of the automatic tools, such as refactoring, that try to deal with it just make the problem worse by bloating the code base even more.

This is something significant in his argument, I think. In this example, we have two code bases: before and after being run through the automatic refactoring tool. The initial state has a given level of functionality, size, and (for lack of a better word) "goodness". The final state has greater size, and therefore less goodness, but identical functionality! This mirrors his stated goal of taking his existing game and rewriting it with identical functionality but less than half the lines of code.

I think the explanation boils down to this: we can only fit so much in our brains at a time. Great programmers can mentally swap in more of the big picture at once, but everyone has their limit. This limit is why we decompose programs down into manageable subroutines, each of which can be understood (at least partially) in isolation from the rest. It is why we hide massive chunks of functionality behind a handful of calls into a library. The smaller chunk size we're working on, the more likely we are to be able to fully understand it and not screw up.
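
To make that chunking concrete, here's a toy Ruby sketch (every name here is invented for illustration): a little log-summarizing task of the sort I write for network monitoring, chopped into methods small enough that each one can be understood on its own without holding the whole program in your head.

```ruby
# Parse one "hostname,status" line into a small record.
def parse_line(line)
  host, status = line.split(",")
  { host: host, up: status.strip == "up" }
end

# Count how many parsed records report the host as up.
def summarize(records)
  records.count { |r| r[:up] }
end

# The top-level method reads like a description of the task;
# the messy details are hidden behind the two calls above.
def report(csv_text)
  records = csv_text.lines.map { |l| parse_line(l) }
  "#{summarize(records)} of #{records.size} hosts up"
end

puts report("router1,up\nswitch2,down\nserver3,up\n")
```

Each method fits in one mental "chunk", which is the whole point: you can fix the parsing without thinking about the counting, and vice versa.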

From here, the trick to making sense of Stevey's size argument is realizing that there are two completely different kinds of complexity at play here. If you're writing code to do, say, an FFT, you've got to know the math behind it and how it works. That's a fair bit of complexity that you've got to hold in your head, and it's going to remain constant regardless of whether you're developing in Java, Ruby, C++, assembly, or BF.

This invariant portion of the complexity is what I call inherent complexity. (Please don't tell me if that term isn't original; I know it probably isn't, but I like to pretend.) It's the piece that you can't get away from, since it's what defines the actual problem you're trying to get that hunk of copper and silicon to solve for you. It's the tax code embodied in Quicken, the rules of mathematics in Mathematica, the graph theory in Garmin and TomTom. Remove the inherent complexity from a problem, and all you've got left is a very complex, boring video game with executables instead of high scores and compiler errors instead of health damage.

If the inherent complexity were all there was to it, then knowledge of the problem domain would be all that's required. You wouldn't need a programmer to write Mathematica, just a mathematician to sit down and tell the computer everything she knows about math. Easy, right?

Sadly (or fortunately, if you make a living as a programmer) this is not the case. The person coding has to know extra details that are outside of the problem domain, like the fact that the number 0.1 cannot be represented with absolute precision in a floating point number. Or that if you accidentally tell a computer to loop forever, it will do so. Or that each of these three different sort routines will produce the same final product, but the memory and time requirements can vary by an order of magnitude or more - and not always in the same order, depending on the data set. Not to mention nitty-gritty language details, like dealing with pointers in C or "bless" in Perl.
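
That first gotcha is easy to demonstrate in Ruby. None of this has anything to do with your problem domain - it's pure machine trivia - but get it wrong and your invoice totals drift by a penny.

```ruby
require "bigdecimal"

# 0.1 has no exact binary floating-point representation,
# so this "obvious" equality is false:
puts 0.1 + 0.2 == 0.3          # => false

# Printing extra digits shows the value actually stored isn't 0.1:
printf("%.20f\n", 0.1)

# Decimal arithmetic (at a cost in speed) sidesteps the problem:
puts BigDecimal("0.1") + BigDecimal("0.2") == BigDecimal("0.3")   # => true
```

Knowing when binary floating point is good enough, and when you need BigDecimal or integer cents, is exactly the kind of knowledge a domain expert sitting down at a keyboard wouldn't have.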

All of these other layers upon layers of crap that get wrapped around the real problem are just extraneous complexity. I mean, let's be honest - learning object-oriented design or unit testing may help you write code faster and with fewer bugs, but it won't help with bullet point one of the design requirements for an ERP (or online order system, or factory automation, or...). It's all work that is, in the end, unquestionably important to creating a finished product, but any time spent working on that extraneous complexity is time not spent on the inherent complexity.

Or, to put it more bluntly, any time you spend appeasing your programming environment is time that you're not spending on solving the actual problem.

Based on this, the best development languages are ones that are fairly thin, succinct, and in general just get the hell out of your way and let you work. Go back a few decades, and compared to the alternatives of the time, this is what C was. The book that was for many years the definitive guide to C was under 300 pages long, and let the programmer almost completely ignore the messy details of things like programming in assembly. Loops and conditionals were suddenly a simple, easy mnemonic syntax.

More recently, I think this "thinness" is a huge portion of the success of Ruby on Rails. Starting from a database schema, you can literally create a functional skeleton application in minutes with just a few commands, with all of the components laid out and neatly organized, and slots already created for niceties such as porting to different databases, unit testing, and version control.

Sure, it's all stuff that any competent programmer can easily handle, but automating it frees up that many more brain cells to do whatever it is the client or employer wants to give you money for.
