Friday, February 29, 2008

Windows World (Slowly) Learning From Unix History

The excellent Coding Horror blog has a short article up about one way of categorizing software: UsWare vs. ThemWare. The idea is simple enough. ThemWare is software that's only used by "Them" - i.e., none of the users are also developers of it. UsWare is software that is used by the developers as well as others.

Jeff comes to the conclusion - which I happen to agree with 100% - that creating software as UsWare will, all other things being equal, lead to vastly higher quality than software created as ThemWare. To help this process along, he encourages his software developer readers to work to gain the user perspective, to eat their own dogfood. This is certainly a good idea.

But as I thought about it a bit, I realized that this is only half the picture. The focus here is to give the programmers more of a user perspective so they meet user needs better. But what if things could go the other way? What if users could get more of a programmer perspective, so they actually could communicate their needs effectively? And maybe, in the case of users who have some programming experience, be allowed to help out and contribute bits of code that demonstrate what they want with far more precision than any prose description. Either way, the end result is to break down barriers, and blur the line between developer and user.

Oh, wait. There's a name for that already - open source.

That phenomenon called open source software hasn't really caught on too strongly in the Windows world, in no small part because Microsoft does everything in its power to keep all of its source code under heavy lock and key. With how much Microsoft depends on license keys to enforce paying for software, there really isn't much of an alternative for them. Even more important, I believe, is the fact that Microsoft began from square zero by selling software to non-programmers. The people using those original DOS systems didn't want computers for their own sake, they just wanted them to run their business.

In the Unix world, however, things began completely differently. While Microsoft was busy trying to sell computers to people who didn't want to know anything more about them than they had to, Unix was a programmer's playground. Researchers used Unix and often had to create their own applications, which they could do because compilers were commonplace on Unix systems. Unix was an environment created by programmers for programmers, and the result is that once you begin to feel a little comfortable as a Unix user, the bar to becoming a Unix programmer is fairly low.

As the Free Software and OSS movements have propelled Linux systems as the successor to the Unix heritage, this trend has only become more pronounced. These days, a typical Linux system will have two or three programmer-friendly editors, an IDE, compilers for C, C++, and possibly Fortran, Lisp (if you count Emacs), Java (Sun, or an open source alternative), and a handful of powerful scripting languages such as Perl, Python, and Ruby.

And that's just the typical stuff! For the Linux user truly interested in becoming a programmer, there are debuggers, Ada, Smalltalk, Rexx, Haskell, and countless other languages and development aids just a Freshmeat search away. With all those tools just waiting to be picked up, each and every open source user is a potential contributor of anything from a bug fix, to a feature enhancement, to documentation, all the way up to becoming a full-fledged maintainer.

Jeff is absolutely right that programmers who learn what it's like to be users will end up producing higher quality software. But as long as you freeze out your users from becoming contributors, you're throwing away valuable resources that you often couldn't buy if you wanted to. And that's why Linux will always have an edge over Windows, no matter how many animations they add to Aero.

Friday, February 15, 2008

Plan For Failure

Vista "enhancements" include removing the ability to do repair installs. Screw Windows up a little too badly, and your only option is to reformat and reinstall.

RIM has an undisclosed problem with servers off in Canada, and suddenly every BlackBerry everywhere goes offline.

Congress starts ramping up surveillance and blanket data retention, but never seems to worry about the fact that those same tools are equally useful for criminals.

What do these three disparate events all have in common? Simple. All of the design was built around what happens when things go right, not wrong. All three cases display a horrific lack of preemptive failure analysis.

Failure analysis is something that is taught in more established professions, such as mechanical or civil engineering. In these professions, where a screwup frequently can mean people die, worrying about what happens when - not if - something breaks is beaten into students until they think about it the way a deep sea diver thinks about his air supply.

When a civil engineer designs a bridge, he can easily end up putting in thousands of pieces. Some pieces, when they fail, are rather unimportant. If the dedication plaque rusts or falls off, a donor may be upset, but the operation of the bridge isn't compromised. On the other hand, if a rivet or weld holding a support in place cracks, then the engineer who signed off on the design is going to be very interested in what will happen. Will the bridge hold for a year? Six months? A day?

Every part has an MTBF. Knowing what will happen when a part does fail is every bit as important as knowing when it is likely to fail. Often, an early analysis can find hidden critical dependencies that can be fixed or mitigated with simple design changes.

Take the Vista removal of repair installs. Strictly speaking, removing this feature didn't add any failure modes. Unlike a new driver or filesystem, it didn't add any new ways for an existing Windows system to break. What it does is ensure that once a failure beyond a certain threshold does happen, the impact will go from being recoverable to being a death sentence for that copy of Windows. Without adding any new failure modes, the number of critical failures just went up.

Now if you ask the people who put these systems together, I highly doubt that they intended for these systems to fail. This seems obvious... But it's also the problem.

Every system out there will have a failure sooner or later. Let's be fair to Microsoft by giving them a plus side. All BlackBerries have their data go through RIM servers, despite having a perfectly good data connection from the cell provider. This adds a wonderful single point of failure. By contrast, Microsoft-based smart phones don't need any such assistance. They're perfectly capable of talking on their own, without an extra translator.

Microsoft could take their entire infrastructure offline, and the phones wouldn't care. By keeping their own servers out of the data path, they've reduced the number of failure modes of Windows Mobile phones out in the wild.

If we programmers and IT guys want to be taken seriously, we absolutely have to start planning for failure. Throwing redundant servers at problems reduces the likelihood of failure, but doesn't reduce it to zero. RAID protects you against a single hard drive failure, but not against several at once.
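To put rough numbers on that, here is a minimal sketch in Python. It assumes each server fails independently with the same probability, which real hardware rarely honors, and the function names and figures are made up purely for illustration. It also shows what happens when every replica leans on one shared component.

```python
def prob_all_fail(p_fail: float, replicas: int) -> float:
    """Chance that every replica is down at once.

    Assumes failures are independent and equally likely (an idealization).
    """
    return p_fail ** replicas


def prob_outage(p_fail: float, replicas: int, p_shared: float = 0.0) -> float:
    """Chance of an outage when all replicas also depend on one shared part.

    The outage happens if the shared part fails, or if it survives
    and every replica fails anyway.
    """
    return p_shared + (1.0 - p_shared) * prob_all_fail(p_fail, replicas)


if __name__ == "__main__":
    # Illustrative numbers only: 1% per-server failure, 0.1% shared-part failure.
    for n in (1, 2, 3):
        alone = prob_outage(0.01, n)
        shared = prob_outage(0.01, n, p_shared=0.001)
        print(f"{n} server(s): no shared part {alone:.6f}, with shared part {shared:.6f}")
```

Under these assumptions, adding servers shrinks the outage probability quickly but never to zero, and once a shared component is in the picture, its failure rate becomes the floor no matter how many replicas you buy.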

We have to start asking ourselves, with each and every component we build or install, what will happen when this system breaks? That's how you notice things like a pair of high end servers both plugged into the same $4.95 ValuePak power strip. That's how you put in exception handlers that, when that exception that can't possibly happen happens, at least ensure the program goes down gracefully instead of exploding with a corrupted database.
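As one hedged illustration of that last point, the Python sketch below wraps a two-step database update in a transaction, so that an "impossible" exception halfway through leaves the data as it was. The table, account names, and the `transfer` helper are hypothetical, and an in-memory SQLite database stands in for whatever a real system would use.

```python
import sqlite3
import sys


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    with conn:  # commits on success, rolls back if anything below raises
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if amount < 0:
            # Stand-in for the failure that "can't possibly happen".
            raise ValueError("negative transfer amount")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


def main() -> int:
    conn = make_db()
    try:
        transfer(conn, "alice", "bob", -5)
    except Exception as exc:
        # Go down gracefully: report the problem, leave the data consistent.
        print(f"fatal: {exc}; balances unchanged: {balances(conn)}", file=sys.stderr)
        return 1
    finally:
        conn.close()
    return 0


if __name__ == "__main__":
    main()
```

The handler does not pretend to recover from the error. It only guarantees that the half-finished update is rolled back before the program reports the failure and stops.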

That's how we can start building systems where a single, simple, stupid failure doesn't turn into a headline-generating, career-limiting fiasco. Then maybe those civil and ME guys will stop snickering whenever one of us calls himself a software "engineer".

Thursday, February 7, 2008

WTF is Google Thinking?

Google. The projects they do, the reactions they provoke, even the cooking in the cafeteria - whatever they do almost always ends up being big. Unfortunately, with their latest "It seemed like a good idea at the time!" they're most likely about to piss off even more IT staff than when Google Desktop started copying files onto Google servers indiscriminately.

The description from the press release sounds innocuous enough:

Google (NASDAQ: GOOG) today announced Google Apps Team Edition as the simplest and fastest way for groups of employees and students to collaborate within an organization using Google Apps.

But then they go on:

Once users verify their business or school email address, they can instantly share documents and calendars securely without burdening IT for support.

Ars Technica had it right when they described this as Google trying to "sneak Team Edition suite past IT help desk". To those IT help desks Google is referring to, this is roughly like working to bring new and exciting drugs to market without burdening the FDA, or opening a new restaurant without burdening those poor health inspectors.

The problem is, Google is offering to host some set of end user data, but those end users quite simply lack the ability to evaluate whether or not Google is a suitable custodian of that data. Random end users shouldn't be expected to make those kinds of evaluations on their own. After all, why should an accountant worry about the technical details of colocation and outsourcing, such as key escrow management, encryption, etc., when you already have an IT department to worry about them?

In any decent-sized company, this is how things are supposed to work. The business side of the house sets the priorities, then passes the goals and requirements off to the IT side of the house, which picks the best solution on suitability and technical merit. Management sets the why and what, IT decides the how.

Google, on the other hand, appears to be trying to take that away. Now, I'll be the first to say that expanding the online Google tool suite is great. And adding in collaboration features is a pretty obvious next step.

But dammit all, Google has a responsibility to make sure this loaded gun is at least pointed in the right direction! If you want to sell liquor, fine - but that doesn't mean you should open up shop across the street from a high school. In the last story I heard where users decided to go off and create a working solution on their own, the end results included an SSL-free commerce web site and credit card numbers tossed around in plain text email to be typed in. Collaboration definitely sounds like a powerful tool in the right hands, but IT still has to have a prominent role in picking which tool to use and how to use it.

Now I'm sure that the good folks at Google never intended to have sensitive data, like business plans or credit card numbers, passed around. The problem is, to an ordinary user, only moderately technically literate, the only difference between storing that top secret business plan on a secured server and storing it on Google Docs is which bookmark they click on.

In a managed corporate IT environment, the IT and business sides of the house have a close working relationship. The IT side understands enough of the business side to create a working system. At Boeing, the IT staff understand that plans for new airplanes are highly sensitive, and so can set up servers and encryption to protect them, and train users in how to use those systems to protect data. With Google, however, you get what they offer, and that's it. If Google Apps doesn't meet your needs, you either end up with a hole that Google Apps can't fill, or even worse, end up leaving data inadequately protected.

So the next time that someone who has no chance of understanding the implications of the fine print in the acceptable use policy goes off and leaks the company crown jewels by clicking the wrong checkbox in a Google app, will Google accept any of the blame? Or even more importantly, any of the responsibility of cleaning up the resulting mess? Tracing the extent of data leaks? Buying credit protection for identity theft victims?

Somehow I suspect that Google won't mind burdening the IT help desk with that half of the job.