So, about six months ago I embarked on a domestic data infrastructure program to reduce my risk. This includes:
- Relocating my server system to the basement, where the temperature is probably more to its liking (and where the additional noise was not going to bother me). This necessitated running lots of Cat 6 cable through the walls; I put a gigabit switch in the basement and ran cable to most of the rooms where data would be needed. (Lesson learned: no matter how many cable runs you think you need, run more. Pulling two wires is only marginally more expensive than pulling one...)
- Attaching a RAID array to the server system.
- Getting a hosting provider with reasonable storage limits where I can put some of my data so it is accessible from off my own private network.
- Migrating all critical data into SVN repositories.
For the RAID system, I put a 3Ware 9500S-4LP hardware RAID card in my Linux system (about $300). This has four SATA ports. The reason I went with hardware RAID instead of motherboard RAID (sometimes called "fake hardware RAID") is that the hardware solution seemed to offer more in the way of hot migration and upgrades.
Building my own RAID system turned out to be more of a hassle than expected, mostly because no single vendor carried all the parts I needed, so I ended up ordering from multiple vendors. I bought the RAID card and the drives (three 500G drives) for a total of $850, and a four-bay enclosure from Addonics. I opted to spring for "multilane SATA", which allows multiple SATA drives to be connected over a single cable; this required adapters at both the enclosure side and the system side, since both the enclosure and the system just had four regular internal SATA connectors. (Running four cables from system to enclosure seemed like it was asking for trouble.) The trickiest part turned out to be getting the right SATA multilane cable; it turns out there are two different types of SATA multilane connectors (screw type and latch type), and many enclosures and adapters are vague about which kind they need. So I ended up buying the wrong cable first, and then had to buy the right kind from sataparts.com.
Once I got the RAID system physically put together, the rest was pretty easy. My Linux distro already had the right 3Ware driver installed, and the controller had a nice web interface that let me configure the volume set. With RAID 5, the three 500G drives show up as a 1TB SCSI disk, which I partitioned using LVM.
I could have bought a NAS box, but six months ago the choices were pretty weak. (I suspect this has gotten slightly better.) A NAS would have been less hassle to put together, and maybe cheaper, but I'm sure there would have been compromises too. I'm pretty happy with the hardware RAID solution, and I've got a choice of upgrade paths. (I could throw another 500G drive in and have it rebalance the data across four drives, giving me 1.5TB. Or, when the cheap 1TB drives come out, I can pull one 500G drive out, let the array run in "degraded mode", throw two 1TB drives in, create a "degraded" RAID set from them, move the data, then pull the remaining 500G drives and put the third 1TB drive in, giving me 2TB.)
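The capacity figures above follow from the usual RAID 5 rule of thumb: one drive's worth of space goes to parity, so usable space is (number of drives - 1) times the smallest drive. Here is a minimal sketch of that arithmetic in Python; the function name `raid5_usable_gb` is made up for illustration, and it ignores controller and filesystem overhead.

```python
def raid5_usable_gb(drive_sizes_gb):
    """Approximate usable capacity of a RAID 5 set, in GB.

    Rule of thumb only: one member's worth of space holds parity, and every
    member is treated as being the size of the smallest drive in the set.
    """
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (len(drive_sizes_gb) - 1) * min(drive_sizes_gb)


# The three configurations discussed in the post.
current = raid5_usable_gb([500, 500, 500])            # today's array
four_drive = raid5_usable_gb([500, 500, 500, 500])    # add a fourth 500G drive
upgraded = raid5_usable_gb([1000, 1000, 1000])        # swap to three 1TB drives

print(current, four_drive, upgraded)
```

Under those assumptions the three cases come out to 1000, 1500 and 2000 GB, matching the 1TB, 1.5TB and 2TB figures in the text.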
My system is on my home network, which is connected to the internet via a consumer-grade NAT firewall. So getting out is easy, but getting in is hard. I could have gone the dynamic DNS route, but I chose instead to get a hosting provider for files that I wanted access to from outside. I set up a hosting account at www.textdrive.com, which is great. They make it really easy to set up SVN, WebDAV, etc., so I set up two SVN repositories on my hosted system for files I need roving access to (such as presentation slides, in case I get to a conference and my laptop doesn't). I set up two because SVN doesn't have good support for actually removing things from repositories, so they tend to grow over time. So there's a "permanent" and a "transient" repository; the transient repository is for short-lived projects where after some point I won't need the history any more. SVN turns out to be a reasonably nice solution for accessing the same file from multiple systems, since I tend to either be at home and use my desktop system exclusively, or be on the road and use my laptop exclusively.
I decided to get all my data into SVN, after being inspired by this article from Jason Hunter. Even for data that you don't think is ever going to change, like photos (hey, what about Photoshop?), SVN turns out to be a pretty good solution. If you get a new computer, you can just do one checkout and all your data is there. Keeping an up-to-date checkout on your home and laptop systems (in addition to the server) mitigates a number of data loss scenarios. I'm not there yet -- I'm still migrating, but I'm making progress.
The big question mark now is the backup strategy -- backing up a terabyte is pretty hard.
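To put a rough number on "pretty hard", here is a back-of-the-envelope sketch of how long it takes just to move a terabyte over a link. The gigabit figure is the basement switch mentioned earlier; the 1 Mbit/s uplink is purely an assumed value for a consumer connection, not a measurement, and both cases pretend the link runs at its full nominal rate with no protocol or disk overhead.

```python
def transfer_hours(size_bytes, link_bits_per_second):
    """Hours needed to move size_bytes over a link at its full nominal rate."""
    return size_bytes * 8 / link_bits_per_second / 3600


TERABYTE = 10**12  # decimal terabyte, the way drive vendors count it

lan_hours = transfer_hours(TERABYTE, 10**9)     # gigabit LAN, best case
uplink_hours = transfer_hours(TERABYTE, 10**6)  # ASSUMED 1 Mbit/s uplink

print(f"gigabit LAN:     {lan_hours:.1f} hours")
print(f"1 Mbit/s uplink: {uplink_hours / 24:.0f} days")
```

Even in the best case a full copy over the home network is a matter of hours, and with an uplink anywhere near the assumed rate, pushing the whole array to an off-site host is a matter of months.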