Friday, January 2, 2009

Journalspace Gets Creamed

If you can't be a good example, then you'll just have to serve as a horrible warning. — Catherine Aird

By now, I'm sure just about everyone will have heard about the disaster that has befallen the poor SOBs at journalspace.com. The short story is that the server that hosted all of the data for the blog site got hosed, and all of the data was lost. Some of the high points:

  • The data were stored on a RAID1 array - a pair of mirrored drives
  • There were no backups, or any backup system in place at all
  • The drives did not fail, but were both completely overwritten on every block
  • No conclusive root cause was found, but a recently departed sysadmin had already been caught doing "a slash-and-burn" on other systems

So in the end, it looks extremely likely that an incompetent sysadmin set the system up with no meaningful backups, and then graduated to malicious sysadmin by thoroughly wiping the only copy of the system data as he was shown out the door. What a wonderful cornucopia of lessons can be gleaned from this one example! This is the kind of thing that you expect to see as a hypothetical scenario in security textbooks, not on the front page of Slashdot.

So let's take a quick rundown of lessons learned from our hapless friends.

Backups, backups, backups.
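The lack of external backups is what catapulted this from an outage and a headache for the remaining sysadmins into a practically worst-case scenario. A mirror protects against a drive dying, and nothing else: every write, including a destructive one, lands on both drives at the same moment. In short, mirroring is not the same as backing up.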
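
To make the mirror-versus-backup point concrete, here is a toy sketch in Python. It is not a model of any real RAID implementation; the MirroredVolume class and the block contents are made up purely for illustration. All it shows is that a wipe is just another series of writes as far as the mirror is concerned, while a point-in-time copy kept off the array is untouched.

```python
class MirroredVolume:
    """Toy stand-in for RAID1: every write lands on both member drives."""

    def __init__(self, blocks):
        self.drive_a = list(blocks)
        self.drive_b = list(blocks)

    def write(self, index, data):
        # The mirror has no notion of "good" or "bad" writes; it copies them all.
        self.drive_a[index] = data
        self.drive_b[index] = data

    def snapshot(self):
        # A point-in-time copy, meant to be stored somewhere other than the array.
        return list(self.drive_a)


original = ["post-1", "post-2", "post-3"]
volume = MirroredVolume(original)
backup = volume.snapshot()

# Overwrite every block, as happened to both journalspace drives.
for i in range(len(original)):
    volume.write(i, "\x00")

mirror_survived = volume.drive_b == original
backup_survived = backup == original
print("mirror survived:", mirror_survived)
print("backup survived:", backup_survived)
```

The second drive faithfully ends up holding the same zeros as the first; only the snapshot still holds the posts.
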
Trust, but verify.
Just because you implicitly trust your sysadmins (otherwise they can't do their jobs) doesn't mean you shouldn't keep an eye on them. Use sudo to log commands, and monitor configurations with tools like RANCID, Puppet, or Bcfg2.
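
The core idea behind configuration monitoring is simple enough to sketch in a few lines of Python: record a fingerprint of every file while things are known to be good, then compare later and report what moved. This is only a rough illustration of the concept, not how RANCID, Puppet, or Bcfg2 actually work; the fingerprint and drift helpers and the sample file contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(root):
    """Map each file under root (by relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def drift(baseline, current):
    """Return (added, removed, changed) paths between two fingerprints."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        path for path in set(baseline) & set(current)
        if baseline[path] != current[path]
    )
    return added, removed, changed


# Demo against a throwaway directory standing in for a config tree.
with tempfile.TemporaryDirectory() as tmp:
    etc = Path(tmp)
    (etc / "sshd_config").write_text("PermitRootLogin no\n")
    (etc / "crontab").write_text("0 3 * * * /usr/local/bin/backup\n")
    baseline = fingerprint(etc)

    # Someone quietly edits one file, deletes another, and plants a third.
    (etc / "sshd_config").write_text("PermitRootLogin yes\n")
    (etc / "crontab").unlink()
    (etc / "rc.local").write_text("/tmp/.hidden/run &\n")

    added, removed, changed = drift(baseline, fingerprint(etc))

print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

For this to mean anything, the baseline has to live somewhere the person being watched can't rewrite it, which is one more argument for off-box storage.
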
Watch the watchers.
Along the same lines, don't let one person exclusively handle any important project. One bad apple working in isolation will have a much, much easier time planting logic bombs than one who has one or two others working side by side.
Don't give them a chance to pull the trigger.
Going to fire a sysadmin? Any hint of a possibility of a chance it might get ugly? Be prepared to make sure that any and all rights that admin has are completely gone by the time they know they're getting fired. And please note that most sysadmins will take sudden revocation of their rights as a hint they're getting fired, so the chat with HR should probably happen simultaneously with at least two other trusted admins pulling rights and locking accounts.
Clean up after their messes.
Dislike a sysadmin enough to get rid of them? Then that same dislike and mistrust should extend to all of the work they've done for you. As soon as they're out the door, it's time to audit what they did. Make sure the work you didn't know they did is up to standards, and make sure to look for backdoors and time bombs.

It's too late for those poor souls at journalspace, but hopefully they'll at least serve to inspire others to fix something.
