Sunday, January 13, 2008

CS Majors Need Not Apply

Usually, I like Coding Horror. I just read a post, though, where he argues that CS majors should be taught more software engineering. He quotes CS students who have never been formally exposed in their entire undergraduate program to things that the professional field lives and dies by, such as deployment management and revision control. But then, he goes one step too far, right off the cliff:

If we aren't teaching fundamental software engineering skills like deployment and source control in college today, we're teaching computer science the wrong way.

Sorry Jeff, but I'm going to have to call you on this one. If we're teaching fundamental software engineering skills like deployment and source control, we're not teaching computer science at all - we're teaching software engineering!

Think about cars, and the people who design them. Typically, they went through a Mechanical Engineering degree. While this means that they did get a solid grounding in some underlying physics, such as heat transfer, stress transfer, and material analysis, the focus is on how to apply those areas to the real world. Complicated, precise physics formulas are replaced with approximations and tables designed to quickly and easily give an answer that may be less accurate, but errs on the side of safety.

On the other end are the guys who do the real hard-core science - physics. Working mostly on chalkboards and computers, these guys only delve into the real world to gather data or test out a hypothesis. There are quite a few good physicists out there who could explain to you in great detail how and why a tire has a particular amount of traction on asphalt, but couldn't actually change a tire if their life depended on it.

The important point here is that even though ME and physics both deal with the properties of physical things, there is still a distinction between the abstract, research-oriented side and the dirty, messy, practical side. This is a distinction which most of the computer "science" majors out there seem to pretend doesn't exist.

Most of true computer science doesn't even have anything to do with computers. Take Big O notation. In computer science, whether an algorithm takes an hour, a day, or a month, as long as its running time scales linearly as the size of the input goes up, it's O(n). Trying to argue to a customer that those three should be considered equal in any way, though, is likely to make for a short career as a programmer.
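To make that concrete, here's a rough sketch in Python. The two functions and the 1000x overhead figure are made up purely for illustration. Both do a fixed amount of work per item, so both are O(n), yet one does a thousand times more work than the other at every input size.

```python
def cheap_scan(items):
    """One unit of work per item: O(n)."""
    steps = 0
    for _ in items:
        steps += 1
    return steps


def costly_scan(items, overhead=1000):
    """A fixed `overhead` units of work per item: still O(n)."""
    steps = 0
    for _ in items:
        steps += overhead  # hypothetical constant cost per item
    return steps


# Doubling the input doubles the work for both functions, which is all
# Big O cares about. The constant factor between them never shows up.
for n in (10, 20, 40):
    print(n, cheap_scan(range(n)), costly_scan(range(n)))
```

Big O throws away exactly the constant your customer is paying for.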

The harsh reality is that most companies advertising for computer science majors don't really want computer science majors. Sure, they want someone with a good knowledge of algorithms, but - as Jeff pointed out - the ability to use version control is at least as important. Grungy skills, such as creating crash dumps that allow you to get good diagnostics info about customer problems without having to ship them custom builds, while utterly boring from a pure CS standpoint, are worth their weight in gold outside of academia.

This isn't to say that software engineers shouldn't have a grounding in CS theory. There's going to be a lot of overlap. The difference is one of focus. Once we accept that there are really two majors trying to fit into one curriculum in most schools, we can stop trying to make one size fit all, and stop trying to turn out physics majors that we expect to be able to design a camshaft.

Sunday, January 6, 2008

Typing Puppet Strings Onto Your Servers

Just like a good Perl programmer, a system administrator should strive for a certain degree of laziness.

Now, this is not the kind of laziness that leads one to think "Eh, I'm not going to bother installing that update." No, this is the form of efficient laziness that says "I could download and install that update, but there's got to be a way to get it done automatically without wasting my time." These are the kind of people who have libraries of shell scripts and packed crontabs.

Now, those libraries of shell scripts are great, but they can be an awful lot of work to write and maintain. Not very lazy at all! So, rather than going that route, I've been working with (and on) a package called Puppet.

Puppet is a client/server package written in Ruby. Essentially, you describe on the server the configuration you want all of your machines to have. The clients get pointed at the server, pull all of their settings down, and make them happen.

It's got a decent library of native types (such as packages, files, users, etc) right out of the box. If you need something that's not covered, it's fairly straightforward to write your own custom code (assuming you know Ruby) that extends the kinds of files and settings Puppet is able to directly manage. Thanks to some good helper libraries, I was able to whip up a custom module that allows me to manage entries in /etc/sysctl.conf in only 59 lines of code!
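To give a feel for what "managing an entry" means, here's a minimal sketch in Python. It is not the Ruby module mentioned above and not Puppet's API. The function name ensure_sysctl is hypothetical, and it works on the file's text rather than touching the real /etc/sysctl.conf. The idea is that you state the value you want, and the result is the same no matter how many times you apply it.

```python
def ensure_sysctl(text, key, value):
    """Return sysctl.conf-style text in which `key` is set to `value` exactly once."""
    wanted = f"{key} = {value}"
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        is_comment = stripped.startswith("#") or stripped.startswith(";")
        if (not is_comment and "=" in stripped
                and stripped.split("=", 1)[0].strip() == key):
            if not found:
                out.append(wanted)  # rewrite the first matching entry
                found = True
            continue  # drop any later duplicates of the same key
        out.append(line)  # leave comments and other keys untouched
    if not found:
        out.append(wanted)  # key was missing, so append it
    return "\n".join(out) + "\n"


before = "# local tuning\nnet.ipv4.ip_forward = 0\n"
after = ensure_sysctl(before, "net.ipv4.ip_forward", "1")
print(after)
```

Running it a second time on its own output changes nothing, which is exactly the property you want from something that gets applied to every machine over and over.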

Some of the cooler features of Puppet:

  • All communication is XML-RPC based, making it easier to write custom programs that communicate with Puppet
  • Collections of facts about client systems (OS, OS version, etc) are reported back to the server and can be stored in a database
  • Defines and Exec allow you to create complex configurations without writing any Ruby code
  • ERb templating system (the same one of Ruby on Rails fame) allows you to generate complex configuration files with per-host settings
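As a rough illustration of that last point, here's the per-host templating idea sketched with Python's string.Template instead of ERb. The host names, addresses, and config format are all invented for the example, and real Puppet templates are written in ERb syntax, not this.

```python
from string import Template

# One template shared by every host (hypothetical config format).
APP_CONF = Template(
    "# Managed centrally; local edits will be overwritten\n"
    "hostname = ${hostname}\n"
    "listen = ${ip}:${port}\n"
)

# Per-host settings, standing in for what the server knows about each client.
HOSTS = {
    "web01": {"ip": "10.0.0.11", "port": 8080},
    "web02": {"ip": "10.0.0.12", "port": 8081},
}


def render(hostname):
    """Fill the shared template with one host's settings."""
    return APP_CONF.substitute(hostname=hostname, **HOSTS[hostname])


for name in sorted(HOSTS):
    print(render(name))
```

One template, many hosts, and nobody hand-edits a config file.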

Ask anyone who manages large numbers of systems - hundreds, or thousands - and they'll tell you that the ability to automatically manage systems from provisioning to decommissioning without manual intervention is absolutely essential. Whether it's built in, like GPO in Windows, or an add-on package like Puppet, trying to manage any more than one or two systems without this kind of help is just making more work for yourself.

And that's not very lazy at all.