Monday, December 10, 2007

A day and a half in the life...

This is a story about Windows, Samba, debugging, and frustration. I just spent a day and a half debugging a very annoying performance problem with a brand-new system.

Background: I just recently bought a new desktop computer, which I do about every two years. Computers should last longer than that, but I find that, even with good "hygiene", Windows systems tend to decay to the point where they exhibit weird behaviors after about two years, for which the cure is a complete reinstall of the OS and all applications. (Pause for Mac fanboys to snicker.) The "rebuild the world" process wouldn't be so bad if it weren't so hard to migrate all one's data -- even given that a lot of my data is already in Subversion. The real problem is that each application sprays its configuration and data randomly around the system -- the program install folder, the registry, the documents and settings folder, the local settings folder, etc.

So, I bought a new computer, an extra-quiet one from www.endpcnoise.com. These guys specialize in quiet systems, and since I work in a home environment, the computer is usually the noisiest thing in the room. (Of particular annoyance is the variable-speed fan on my existing Dell system, which, every time the system worked up a sweat, made a whiny noise. And we was fined $50 and had to pick up the garbage in the snow, but that's not what I came to tell you about.) Pretty happy with the new system overall.

So, I installed XP Pro on the new system, and proceeded to install all my applications, utilities, and all kinds of groovy things that we were talking about on the bench. And then I got to the part where I tried to use it; specifically, tried to fire up IDEA and build the project I'm working on.

Now, I've got a somewhat weird setup; when developing on a project, I check out a workspace on my Linux server, which is served via Samba to my local network, and I run the IDE on my Windows desktop, pointed at the Samba share. There's a measurable performance hit vs. working locally, but I like being able to do some things from the Linux command line and other things from the IDE, so overall it's a more productive setup for me.

When you ask the IDE to "make" the project, it crawls the files in the project, checking their modification times. An "empty" make on a project with ~1000 files generally takes a few seconds to figure out that there's nothing to do. But the new (faster) system took 30-60 seconds to do an empty make.
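
For the curious, an "empty make" is essentially a timestamp crawl; here's a rough sketch of the shape of the operation (not IDEA's actual code). Each lastModified() call on a file living on a network share turns into SMB round trips, which is why client chattiness matters so much here:

// Visit every file under the project root and compare its modification time
// against the time of the last build -- roughly what an "empty make" does.
// On a Samba-backed path, each lastModified() call costs network round trips.
static int countStale(File dir, long lastBuild) {
    int stale = 0;
    File[] children = dir.listFiles();
    if (children == null)
        return 0;
    for (File f : children) {
        if (f.isDirectory())
            stale += countStale(f, lastBuild);
        else if (f.lastModified() > lastBuild)
            stale++;
    }
    return stale;
}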

OK, it's debugging time. What's different about the two systems? Same OS (XP Pro), same service pack level, both systems up to date on patches, same Java version, same IDE version, same user credentials, no host-specific information in my Samba config. Different hardware. Make sure my ethernet drivers are up to date. Test network for errors, swap cables, all that. Run IOMeter: both systems get similar throughput for large files on the same Samba share.

Crank up perfmon, which tells me that the new system is sending out more packets for a make than the old one. OK, crank up ethereal, get a packet capture, and find that the new system is sending/receiving nearly 15x as many packets for the same operation:

[brian@brian-server ~]$ wc -l /tmp/*cap
258204 /tmp/new-cap
17719 /tmp/old-cap


So, what's the difference? Let's look at the packet capture. In the old trace, for each file being probed, it did something like this:

0.467895 192.168.1.104 -> 192.168.1.107 SMB Trans2 Request, QUERY_PATH_INFO, Query File Basic Info, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api\tree\OnDeleteElementTree.class
0.468041 192.168.1.107 -> 192.168.1.104 SMB Trans2 Response, QUERY_PATH_INFO
0.468283 192.168.1.104 -> 192.168.1.107 SMB Trans2 Request, QUERY_PATH_INFO, Query File Network Open Info, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api\tree\OnDeleteElementTree.class
0.468402 192.168.1.107 -> 192.168.1.104 SMB Trans2 Response, QUERY_PATH_INFO


Two requests, two responses per file. Seemed reasonable. On the new system, for each file:

2.010471 192.168.1.113 -> 192.168.1.107 SMB NT Create AndX Request, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api\tree
2.010698 192.168.1.107 -> 192.168.1.113 SMB NT Create AndX Response, FID: 0x2754
2.010900 192.168.1.113 -> 192.168.1.107 SMB NT Create AndX Request, Path: \
2.011011 192.168.1.107 -> 192.168.1.113 SMB NT Create AndX Response, FID: 0x2755
2.011237 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, FIND_FIRST2, Pattern: \work
2.011570 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, FIND_FIRST2, Files: work
2.011752 192.168.1.113 -> 192.168.1.107 SMB Close Request, FID: 0x2755
2.011833 192.168.1.107 -> 192.168.1.113 SMB Close Response
2.012025 192.168.1.113 -> 192.168.1.107 SMB NT Create AndX Request, Path: \work\openjfx-compiler
2.012157 192.168.1.107 -> 192.168.1.113 SMB NT Create AndX Response, FID: 0x2756
2.012353 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, FIND_FIRST2, Pattern: \work\openjfx-compiler\classes
2.012631 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, FIND_FIRST2, Files: classes
2.012796 192.168.1.113 -> 192.168.1.107 SMB Close Request, FID: 0x2756
2.012897 192.168.1.107 -> 192.168.1.113 SMB Close Response
2.013100 192.168.1.113 -> 192.168.1.107 SMB NT Create AndX Request, Path: \work\openjfx-compiler\classes\production\openjfx-compiler
2.013239 192.168.1.107 -> 192.168.1.113 SMB NT Create AndX Response, FID: 0x2757
2.013445 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, FIND_FIRST2, Pattern: \work\openjfx-compiler\classes\production\openjfx-compiler\com
2.013894 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, FIND_FIRST2, Files: com
2.014095 192.168.1.113 -> 192.168.1.107 SMB Close Request, FID: 0x2757
2.014174 192.168.1.107 -> 192.168.1.113 SMB Close Response
2.014355 192.168.1.113 -> 192.168.1.107 SMB NT Create AndX Request, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com
2.014504 192.168.1.107 -> 192.168.1.113 SMB NT Create AndX Response, FID: 0x2758
2.014962 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, FIND_FIRST2, Pattern: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun
2.015169 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, FIND_FIRST2, Files: sun
2.015339 192.168.1.113 -> 192.168.1.107 SMB Close Request, FID: 0x2758
2.015428 192.168.1.107 -> 192.168.1.113 SMB Close Response
2.015633 192.168.1.113 -> 192.168.1.107 SMB NT Create AndX Request, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun
2.015764 192.168.1.107 -> 192.168.1.113 SMB NT Create AndX Response, FID: 0x2759
2.015980 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, FIND_FIRST2, Pattern: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx
2.016221 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, FIND_FIRST2, Files: javafx
2.016402 192.168.1.113 -> 192.168.1.107 SMB Close Request, FID: 0x2759
2.016493 192.168.1.107 -> 192.168.1.113 SMB Close Response
2.016693 192.168.1.113 -> 192.168.1.107 SMB NT Create AndX Request, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx
2.016827 192.168.1.107 -> 192.168.1.113 SMB NT Create AndX Response, FID: 0x275a
2.017096 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, FIND_FIRST2, Pattern: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api
2.017348 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, FIND_FIRST2, Files: api
2.017520 192.168.1.113 -> 192.168.1.107 SMB Close Request, FID: 0x275a
2.017590 192.168.1.107 -> 192.168.1.113 SMB Close Response
2.017803 192.168.1.113 -> 192.168.1.107 SMB NT Create AndX Request, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api
2.017919 192.168.1.107 -> 192.168.1.113 SMB NT Create AndX Response, FID: 0x275b
2.018133 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, FIND_FIRST2, Pattern: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api\tree
2.018389 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, FIND_FIRST2, Files: tree
2.018547 192.168.1.113 -> 192.168.1.107 SMB Close Request, FID: 0x275b
2.018626 192.168.1.107 -> 192.168.1.113 SMB Close Response
2.018779 192.168.1.113 -> 192.168.1.107 SMB Close Request, FID: 0x2754
2.018851 192.168.1.107 -> 192.168.1.113 SMB Close Response
2.019157 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, QUERY_PATH_INFO, Query File Basic Info, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api\tree\OnDeleteElementTree.class
2.019292 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, QUERY_PATH_INFO
2.019495 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, QUERY_PATH_INFO, Query File Standard Info, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api\tree\OnDeleteElementTree.class
2.019613 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, QUERY_PATH_INFO
2.019832 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, QUERY_PATH_INFO, Query File Internal Info, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api\tree\OnDeleteElementTree.class
2.019960 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, QUERY_PATH_INFO
2.020206 192.168.1.113 -> 192.168.1.107 SMB Trans2 Request, QUERY_PATH_INFO, Query File Network Open Info, Path: \work\openjfx-compiler\classes\production\openjfx-compiler\com\sun\javafx\api\tree\OnDeleteElementTree.class
2.020316 192.168.1.107 -> 192.168.1.113 SMB Trans2 Response, QUERY_PATH_INFO


For those of you who don't enjoy reading packet dumps (I hope that's all of you), what's going on here is that the client does a complicated multi-packet transaction (open, FIND_FIRST2, close) for each level of directory from the project root down to the last directory in the chain, and then does four request-response pairs for each file. And it repeats the whole directory walk for every file, even though it asked exactly the same questions for the previous file. That works out to more than 50 packets per file instead of 4, which is consistent with the raw packet counts above.

OK, more debugging -- what could cause a system to deviate from the standard file system client behavior? Check all the network control panel settings; they're all the same. Spend several hours googling through the MS knowledge base for file-sharing-related problems, and look at the various registry keys and file versions mentioned; nope, none of them are helpful. Google for people who have had similar problems. Many have, but no one reported a solution that works, except one person, who mentioned that their network behavior changed when they changed versions of Symantec antivirus. Well, I don't run Symantec AV, but I do run ZoneAlarm. And I do have different versions -- ZA Antivirus on the old system, ZA Suite on the new. Seems like a small difference -- they're clearly built on the same base technology -- but let's try it. Disable ZA, reboot, make sure it's not running, and run my IDE again -- no change. Still an annoying 30-60s delay before it figures out there's nothing to rebuild.

At this point, I asked my friends for help. Lots of sympathy. Lots of "check this, check that", but very little advice that actually moved me towards a solution (sorry, guys).

That post about the guy with the Symantec problem gnawed at me, though. I know security programs intercept a lot of network traffic, so the theory was perfectly plausible, and the best theory I had so far. I did the "disable ZA" thing again, rebooted, and cranked up Rootkit Hook Analyzer to see if ZA still had anything hooked, and it did, even though there were no ZA processes running and the ZA TrueVector service was stopped. So, I ran the uninstaller for ZA Suite, rebooted, checked with RHA to see that everything was unhooked (it was), and ran my IDE test again -- and this time, sweet success!

So, the conclusion is that the ZA Suite interferes with file sharing client behavior in a rather fundamental way (but one which only has a noticeable effect when dealing with lots of small files).

So, my system is temporarily defenseless against malware while I decide what to do. Why on earth would ZA rewrite the file system client packet stream like that? I want to send them a bill for that day and a half.

Tuesday, June 19, 2007

Remove checked exceptions?

Recently, Neal Gafter mused about whether we should consider removing checked exceptions from Java.  The motivation for this was not what you might expect, but rather an observation that checked exceptions interact heavily with a lot of other language features, and that evolving the language might be easier if we were willing to consider removing some features.  (Neal knows this won't ever happen, he's just trying to get us thinking about Life After Java.)  Not surprisingly, it generated a storm of comments, ranging from "hell yeah!" to "hell no!".

This isn't a new topic; it comes around every few years.  A few years back I wrote about the debate surrounding checked exceptions, and the debate continues to rage.  My problem is that I think most of the vocal opponents of checked exceptions are objecting for the wrong reasons (back then, I wrote: "My opinion is that, while properly using exceptions certainly has its challenges and that bad examples of exception usage abound, most of the people who agree [ that checked exceptions are a bad idea ] are doing so for the wrong reason, in the same way that a politician who ran on a platform of universal subsidized access to chocolate would get a lot of votes from 10-year-olds").

Reading through the against-checked-exceptions commenters on Neal's blog, we can divide them into three primary groups:

  1. "I don't like checked exceptions because they're too much work." 

  2. "Checked exceptions were a nice idea in theory, but using them correctly makes your code really ugly, and I'm left with a choice of ugly code or wrong code, and that seems a bad choice." 

  3. "Checked exceptions are a good idea, but the world isn't ready for them."  (Frequent refrain from this group: "Man, have you looked at some of the code out there?")


To the people in camp (1), I say: engineering is hard -- get over it.  Error handling is one of the hardest things to get right, and one of the easiest things to be lazy about.  If you're writing code that's supposed to work more than "most of the time", you're supposed to be spending time thinking about error handling.  And, pay for your own damn chocolate. 

To the people in camp (2), I have more sympathy.  Exceptions do make your code ugly, and proper exception handling can make your code really ugly.  This is a shame, because exceptions were intended to reduce the amount of error-handling code that developers have to write.  (Ever try to properly close a JDBC Connection, Statement, and ResultSet?  It requires three finally blocks.  Ugly if you do it right.  But, almost no one ever does it right.  (The real culprit here is that close() throws an exception -- what are you supposed to do with that exception?  But that's fish under the bridge.)) 
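
For the record, here's what "doing it right" looks like -- a minimal sketch; the method name, URL, and query are just placeholders:

static void printNames(String url, String query) throws SQLException {
    // The three-finally idiom: each resource gets its own finally block, so a
    // failure closing one doesn't prevent closing the others. Note that each
    // close() can itself throw SQLException -- the "what are you supposed to
    // do with that exception?" problem mentioned above.
    Connection conn = DriverManager.getConnection(url);
    try {
        Statement stmt = conn.createStatement();
        try {
            ResultSet rs = stmt.executeQuery(query);
            try {
                while (rs.next())
                    System.out.println(rs.getString(1));
            } finally {
                rs.close();
            }
        } finally {
            stmt.close();
        }
    } finally {
        conn.close();
    }
}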

But perhaps there's a way to not throw the baby out with the bathwater, by providing better exception handling mechanisms that are less ugly.  Dependency injection frameworks did a lot of that for us already, for a large class of applications -- and the code got a lot prettier, easier to write, and easier to read.  AFAICS, the two biggest removable uglinesses of exceptions are repeated identical catch clauses and exception chaining. 

The repeated catch clause problem is when you call a method that might throw exceptions A, B, C, and D, which do not have a common parent other than Exception, but you handle them all the same way.  (Reflection is a major offender here.) 

public void addInstance(String className) {
    try {
        Class<?> clazz = Class.forName(className);
        objectSet.add(clazz.newInstance());
    }
    catch (IllegalAccessException e) {
        logger.log("Exception in addInstance", e);
    }
    catch (InstantiationException e) {
        logger.log("Exception in addInstance", e);
    }
    catch (ClassNotFoundException e) {
        logger.log("Exception in addInstance", e);
    }
}

You'd like to fold the catch clauses together, because duplicated code is bad.  Some people simply catch Exception, but this has a different meaning -- because RuntimeException extends Exception, you're also sweeping up unchecked exceptions accidentally.  You can explicitly catch and rethrow RuntimeException before catching Exception -- but it's easy to forget to do that.

public void addInstance(String className) {
    try {
        Class<?> clazz = Class.forName(className);
        objectSet.add(clazz.newInstance());
    }
    catch (RuntimeException e) {
        throw e;
    }
    catch (Exception e) {
        logger.log("Exception in addInstance", e);
    }
}

My proposal for this problem is to allow disjunctive type bounds on catch clauses:

public void addInstance(String className) {
    try {
        Class<?> clazz = Class.forName(className);
        objectSet.add(clazz.newInstance());
    }
    catch (IllegalAccessException | InstantiationException | ClassNotFoundException e) {
        logger.log("Exception in addInstance", e);
    }
}

My compiler friends tell me that this isn't too hard. 

The other big ugliness with exceptions is wrapping and rethrowing:

public void findFoo(String name) throws NoSuchFooException {
    try {
        lookupFooInDatabase(name);
    }
    catch (SQLException e) {
        throw new NoSuchFooException("Cannot find foo " + name, e);
    }
}

Now, the wrap-and-rethrow technique is very effective -- it allows methods to throw exceptions that are at an abstraction level commensurate with what the method is supposed to do, not how it is implemented, and it allows you to reimplement without destabilizing method signatures.  But it adds a lot of bulk to the code.  Since this is such a common pattern, couldn't it be solved with some sort of declarative "rethrows" clause:

public void findFoo(String name) throws NoSuchFooException
rethrows SQLException as NoSuchFooException {
    lookupFooInDatabase(name);
}

The rethrows clause is part of the implementation, not the signature, so maybe it goes somewhere else, but the idea is clear: if someone tries to throw an X out of this, wrap it with a Y and rethrow it. 

An alternate approach to this would be possible with closures and reified generics; it would be possible to write a pseudo-control construct that said "execute this closure but if it throws X, wrap it with a Y and rethrow it."  Unfortunately, with the current state of generics, we can't write such a generic method; we'd have to write a separate one for each exception type we want to wrap. 
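
To make that concrete, here's a sketch of what the workaround looks like today, with an anonymous inner class standing in for the closure (SqlBlock and rethrowingAsNoSuchFoo are made-up names; the exception types are the ones from the examples above).  Both the interface and the wrapper are hard-wired to the SQLException-to-NoSuchFooException pair, because you can't catch a type variable -- which is exactly why you'd need a separate wrapper per exception type:

interface SqlBlock {
    void run() throws SQLException;
}

// Hard-wired to one exception pair; with reified generics this could be
// written once as <X extends Exception, Y extends Exception>, but today
// catch (X e) isn't legal for a type variable X.
static void rethrowingAsNoSuchFoo(SqlBlock block) throws NoSuchFooException {
    try {
        block.run();
    }
    catch (SQLException e) {
        throw new NoSuchFooException("Operation failed", e);
    }
}

public void findFoo(final String name) throws NoSuchFooException {
    rethrowingAsNoSuchFoo(new SqlBlock() {
        public void run() throws SQLException {
            lookupFooInDatabase(name);
        }
    });
}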

These approaches focus on the symptom -- because the arguments in group (2) are about symptoms.  If we could alleviate the symptoms, people might grumble less.

The people in camp (3) are saying something slightly different.  I don't really have an answer for them, because what they seem to be saying is that no matter what mechanism you give people for dealing with failure, they won't follow it.  Checked exceptions were a reaction, in part, to the fact that it was too easy to ignore an error return code in C, so the language made it harder to ignore.  This works on a lot of programmers who are slightly lazy but know that ignoring exceptions is unacceptable, but apparently is worse than nothing for some parts of the population.  (We'd like to take away their coding rights, but we can't.) 

 Checked exceptions are a pain, and in some frameworks (like EJB before dependency injection), can be really painful.  Once the ratio of "real code" to "error handling code" rises above some threshold, readability suffers greatly, and readability is a fundamental value in the Java language design.  Even if the IDE generates the boilerplate for you, you still have to look at it, and there's a lot of noise. 

On the other hand, my experience using third-party C++ libraries was even more painful than anything Java exceptions have ever subjected me to.  Virtually no packages ever documented what might be thrown, so you ended up playing "whack-a-mole" when exceptions did pop up -- and usually at your customer's site.  If people are not forced to document what errors their code throws, they won't -- especially the people that the people in camp (3) are afraid of.  As long as those folks are allowed to code, the value we get from checked exceptions forcing developers to document failure modes overwhelms the annoyances.

But, as I said above, I think many of the annoyances can be removed by adding a small number of exception streamlining constructs.  This doesn't help Neal with simplifying closures, but it does help us get our job done with a little less pain. 

Finally, a meta-note -- it's really easy to misinterpret the volume of support for "removing checked exceptions" as any sort of gauge of community consensus.  We're intimately familiar with the pain that checked exceptions cause; we're substantially less familiar with the pain that they free us from.  (Obviously, neither approach is perfect, otherwise there'd be no debate.)

Monday, June 18, 2007

Living in the information age

While reviewing my household budget recently, I realized that we had truly crossed into the information age -- we pay more for bits than we do for energy.  (By bits, I mean both the infrastructure by which information is delivered to us in electronic form, and the content we purchase; by energy, I'm including only my home utility bills, not gasoline, but since I work at home, I'm guessing my gasoline consumption is lower than average.) 

  • Home telephone (basic line + unlimited long distance): $45

  • Cell phone (mine, including business use): $60

  • Cell phone (rest of family, 4-line family plan): $110

  • NetFlix: $20

  • DirecTV (including TiVo data fee): $75

  • Rhapsody To Go (music subscription service): $15

  • DSL: $30

  • T-Mobile WiFi access plan (reasonable coverage at cafes, hotels, airports): $30


Total: $385/month for bits. 

As to fossil fuels, our combined electric and gas bills average out to around $260/month.

Sunday, June 17, 2007

TiVo + HD -- no good choices

I've been a TiVo addict since the first DirecTiVo boxes came out.  Since we got a big-screen TV, the standard-definition picture looks pretty bad (especially as it seems that DirecTV compresses the hell out of their signals to make room for more pay-per-view channels).  So we wanted to upgrade to some sort of HD service, but of course TiVo is a must (no third-party DVRs -- no one who has had both a third-party DVR and TiVo has ever said anything good about the third-party DVRs). 

Option 1: DirecTV's HD TiVo.  This was released a few years back, but is going to be incompatible with the HD locals that DTV is rolling out, which will use a different encoding.  (Some people have combined this solution with OTA HD, but I have no interest in playing games with antennas.)  And DTV has yet to roll out HD locals in this area anyway, and there's little sign they're coming soon.  So even if there were HD local channels here, there's no TiVo solution that can record them, only the DirecTV DVR.

Option 2: Digital cable + Series3 TiVo.  This seems like the obvious choice, except for the high cost of the TiVo box ($600+, plus the increased cost of the TiVo data service, which I'd been insulated from since DTV customers were grandfathered in at the low rates).  But...no TiVoToGo or MRV on the Series3 yet, which means you can't transfer videos to the video iPod.  This has to do with content restrictions surrounding their CableCard certification, but annoyingly it applies not only to protected HD but also to unprotected SD content.  Ugh.  (As to MRV, if you have two Series3 TiVos with CableCards, why is there a problem moving content from one TiVo to another?) 

Option 3: Comcast DVR with TiVo.  Comcast did a deal with TiVo where you can get their Motorola DVR and upgrade to TiVo software as it is rolled out.  But it's going to be a long time before the rollout reaches here. 

MythTV is not an option; you can't put a CableCard in a MythTV box. 

So far, no hacks have appeared for the Series3 (other than those that involve reprogramming the PROM and resoldering it) that get around these restrictions.

We're going to bite the bullet and go with Option 2, and hope that eventually TiVo resolves its dispute with CableLabs and reinstates some form of TiVoToGo and/or MRV.

Monday, June 4, 2007

Flying has its upsides too

I was in New York yesterday for my great-aunt's 90th birthday party.  The party was over at six, and our flight wasn't until 11, but we figured that New York was a pretty good place to pass a few hours, so we weren't worried.  But as the party was ending, it started to rain pretty hard, so we decided to just pack it in and head to the airport, even though JFK isn't the most fun place to kill a few hours.

Wandering around, I passed a half-asleep fellow camped out behind his laptop and thought "That looks an awful lot like Doug Lea".  And it was -- he was on a six hour layover on the way home from Amsterdam (yuck).  Usually the surprises you get at the airport are of the unpleasant variety, but not always.

Monday, May 28, 2007

Data packrat

I'm finally moving forward with my plan to get all my life data into digital form.  The infrastructure is there (see earlier posts), with lots of redundant disk space and subversion repositories.  Now I just have to clean out the file cabinets.

All the various entities that send me monthly statements (banks, brokerages, utilities) are trying to get me onto some sort of "paperless" plan (and, IMO, going about it in a pretty stupid way) by offering some sort of online statements.  That's great going forward, but what about existing statements?  (And, how long should I really hold on to these records?  Can I throw away those tax returns from 1987?  Won't they be useful to my biographer?)  They all seem to offer some set of past statements in downloadable form; some going back only a few months, some going back seven years. 

Most will only let you see the older statements if you agree to let them stop mailing you statements.  (You can rescind that agreement at any time, so you know what I did.  I don't really _want_ the paper statements if I can have good PDFs, but e-mail is so unreliable that I hesitate to let them use e-mail to send me important notices that might be indicators of identity theft or other bad things.)  So I downloaded all the statements I could find, scanned some of the others I thought were worth having, and relegated the paper copies to a box in the basement that, if it got destroyed, I wouldn't be upset. 

Not one of the dozen banks, utilities, or brokerages has the statement download thing right.  None of them has a "download all my statements" feature, which makes downloading seven years' worth pretty annoying.  (And all are implemented in ways that prevent you from shortcutting around their bad UIs or scripting it yourself.)  None offers any sort of scriptable interface for downloading statements, so if you want to continue to gather statements, you have to visit twelve web sites.  (I'd like to have the PDFs delivered right into Quicken; they've been talking about electronic bill presentment for years, but I don't see it here yet.)  Some make it easier by offering an option to e-mail you the PDF monthly in addition to the physical delivery; some only offer that as an alternative to physical delivery.  Some (Wells Fargo) won't even let you download any e-statements unless you consent to online-only delivery (and the online statements don't have the check images that the physical statements do).  Guess I'll be "consenting" for five minutes a year to grab the past year's statements.  Yuck. 

Bulk scanning turns out to be not so easy with cheap consumer-grade scanners.  I bought a Visioneer RoadWarrior for receipts and such, but I use the scanning features of my HP LaserJet 3050 for bulk scanning because it has a document feeder.  But it's still pretty slow, and the software sucks.  (I'm surprised that the throughput with the RoadWarrior is bound not by the physical scanning speed, but by the software that turns the scan into the appropriate file format and drops it into a drop folder.)  So I ended up not scanning everything I thought I would, at least not in the first round.  Slowly migrating...

Subterranean data center, part II

Moving the server and all the network hardware to the basement was great -- it got everything out of my closet -- but it presented a problem for the wireless, because the wireless router wasn't strong enough to get a signal up to the back bedroom on the second floor, where we have a squeezebox and need a steady supply of bits.  I have a couple of the cheap Linksys routers (I am always surprised at how useful it turns out to be to have various extra computer parts lying around).  Unfortunately, the Linksys firmware doesn't do what I wanted -- which was to have one box do the gateway router stuff (NAT, DHCP, etc.) and another act as a wireless access point.  They want you to buy the more expensive Access Point version of the box, which is identical except for the firmware.

So, I installed the DD-WRT firmware on one of my Linksys routers, which lets it act as an access point -- among many other things.  DD-WRT is a Linux-based distribution for cheap hardware routers, which includes all sorts of networking software not supported by the out-of-the-box firmware (e.g., access point and access point client modes, IPv6, VPN, WPA (client and server), port forwarding, QoS management, SNMP, DMZ, etc.). 

As often happens, the road was bumpy but in the end everything worked fine.  There are half a dozen different versions of the popular WRT54G, so the instructions might not fit your version exactly.  (I learned this on the part where it says "pull firmly to remove the bezel" -- my version had screws holding the circuit board to the bezel, and pulling firmly ripped them out.)  Despite following all the directions carefully, the first flashing attempt failed, and I had "bricked" my router.  I followed the various "debricking" instructions, and eventually had to resort to the most extreme, where you short a few pins on the flash chip to restore it to its default state...and eventually I got DD-WRT downloaded into the box.  From there, it was smooth sailing; the web-based admin GUI was easy.

Once I had DD-WRT running, it was easy to configure it as an access point -- and if I need better coverage, can just add more.  Also, a cheap router + DD-WRT is the cheapest way to put a wired ethernet device onto a wireless network; run it in "access point client" mode.  Much cheaper than buying a device designed for this purpose...

Subterranean data center

I've been getting paranoid lately about data loss.  This was almost certainly prompted by a disk failure at my old business that caused some actual data loss.  As seems to happen a lot, the disk failure also disclosed a failure in our otherwise sensible-seeming backup program, with the result that I lost several months of archived e-mail, among other things.  Disk failure rates seem to be on the rise; the combination of rising areal densities and the public's clear choice of "cheap" over "reliable" virtually guarantees it.  (See, for example, http://australianit.news.com.au/story/0,24897,21553519-15321,00.html.)  And with larger capacities, the negative consequences of a disk failure are that much greater.

So, about six months ago I embarked on a domestic data infrastructure program to reduce my risk.  This includes:

  • Relocating my server system to the basement, where the temperature is probably more to its liking (and where additional noise was not going to bother me).  This necessitated running lots of Cat 6 cable through the walls; I put a gigabit switch in the basement and ran cable runs to most of the rooms where data would be needed.  (Lesson learned: no matter how many cable runs you think you need, run more.  Pulling 2 wires is only marginally more expensive than pulling one...) 

  • Attaching a RAID array to the server system.

  • Getting a hosting provider with reasonable storage limits where I can put some of my data so it is accessible from off my own private network.

  • Migrating all critical data into SVN repositories.


For the RAID system, I put a 3Ware 9500S-4LP hardware RAID card in my Linux system (about $300). This has four SATA ports.  The reason I went with hardware RAID instead of motherboard RAID (sometimes called "fake hardware RAID") is that the hardware solution seemed to offer more in the way of hot migration and upgrades.  Building my own RAID system turned out to be more of a hassle than expected, mostly because I ended up ordering parts from multiple vendors because no one carried all the parts I needed.  I bought the RAID card and the drives (three 500G drives) for a total of $850.  I bought a four-bay enclosure from Addonics.  I opted to spring for "multilane SATA", which allows multiple SATA drives to be connected over a single cable; this required adapters at both the enclosure side and the system side, since both the enclosure and the system just had four regular internal SATA connectors.  (Running four cables from system to enclosure seemed like it was asking for trouble.)  The trickiest part turned out to be getting the right SATA multilane cable; turns out there are two different types of SATA multilane connectors (screw type and latch type), and many enclosures and adapters are vague about which kind they need.  So I ended up buying the wrong cable first, and then had to buy the right kind from sataparts.com.  Once I got the RAID system physically put together, it was pretty easy.  My Linux distro already had the right 3Ware driver installed, and the controller had a nice web interface that let me configure the volume set.  With RAID 5, the three 500G drives show up as a 1TB SCSI disk, which I partitioned using LVM. 

I could have bought a NAS box, but six months ago the choices were pretty weak. (I suspect this has gotten slightly better.)  Would have been less hassle to put together, and maybe cheaper, but I'm sure there would have been compromises too.  I'm pretty happy with the hardware RAID solution, and I've got a choice of upgrade paths.  (I could throw another 500G in, and have it rebalance the data across four drives giving me 1.5TB, or when the cheap 1TB drives come out, I can pull one 500G out, let the array run in "degraded mode", throw two 1TBs in, create a "degraded" RAID set from them, move the data, then pull the 500G drives and put the third TB drive in giving me 2TB.) 

My system is on my home network, which is connected to the internet via a consumer-grade NAT firewall.  So getting out is easy, but getting in is hard.  I could have gone the dynamic DNS route, but I chose instead to get a hosting provider for files that I wanted access to from outside.  I set up a hosting account at www.textdrive.com, which is great.  They make it really easy to set up SVN, WebDAV, etc, so I set up two SVN repositories on my hosted system for files I need roving access to (such as presentation slides, in case I get to a conference and my laptop doesn't.)  I set up two because SVN doesn't have good support for actually removing things from repositories, so they tend to grow over time.  So there's a "permanent" and "transient" repository; the transient repository is for short-lived projects where after some point I won't need the history any more.  SVN turns out to be a reasonably nice solution for accessing the same file from multiple systems, since I tend to either be at home and use my desktop system exclusively, or be on the road and use my laptop exclusively. 

I decided to get all my data into SVN, after being inspired by this article from Jason Hunter.  Even for data that you don't think is ever going to change, like photos (hey, what about photoshop?), SVN turns out to be a pretty good solution.  If you get a new computer, you can just do one checkout and all your data is there.  Keeping an up-to-date checkout on your home and laptop systems (in addition to the server) mitigates a number of data loss scenarios.  I'm not there yet -- I'm still migrating, but I'm making progress. 

The big question mark now is the backup strategy -- backing up a terabyte is pretty hard.

Thursday, May 24, 2007

Squeezebox update: iTunes support redux

Well, the vbrfix program didn't work quite as advertised, and borked the mp3s I ran it on.  Fortunately, they had all been converted from FLAC, so restoring was simply a matter of re-running the flac-to-mp3 script (which does take a while to chew on 300G of music.)  But the "Fix MP3 Header" option in "foobar2000" does the trick, and now iTunes is happy with my MP3s.  But since fb2k is a Windows app, it means that the incremental conversion process when new FLAC files are added has a manual step, rather than one I can script.

Thursday, May 17, 2007

JCiP best seller at JavaOne 2007

For the second year in a row, Java Concurrency in Practice was the best selling book at the JavaOne bookstore...thanks everyone!

Monday, April 23, 2007

At JAX in Wiesbaden this week

I'm at the JAX conference, in Wiesbaden, Germany this week (www.jax.de.) I'll be speaking on Wednesday evening on (what else) concurrency and performance.

Sunday, April 15, 2007

Squeezebox update: iTunes support

There are lots of tools for dealing with audio files of various formats (MP3, FLAC, AAC, Ogg, WMA, etc), but most of them fall down when it comes to letting you deal with an entire library of music files.  (I suppose some of them must be good, but there are so many, you get tired of trying them.) 

I've mostly fallen back on scripting-based approaches.  When I rip using EAC (which rips to WAV, a format with no tagging features), I instruct EAC to create a file that has all the relevant tags (title, artist, genre, etc) embedded in the file name, and I've got a script that feeds these into the flac encoder, creates files using a directory hierarchy of the form /artist/album/track.flac, and sets the appropriate flac tags. 

I downloaded another script (http://robinbowes.com/projects/flac2mp3) that takes this directory and creates a parallel directory of MP3 files, transcoded from the flac.  It uses LAME to do the encoding, using the "--alt-preset standard" settings, and I point iTunes at that directory.  So far, so good. 

Well, iTunes seems confused by VBR (variable bit rate) MP3s that it didn't encode itself. The symptom is that iTunes thinks the tracks are way longer than they are.   

After much searching, the best alternative I came up with was a program called 'vbrfix' (http://www.willwap.co.uk/Programs/vbrfix.php), which rewrites the MP3 headers in a way that iTunes is happy with.  It claims to fix other problems too.  The documentation sucks, and I had to download the source and build it (no makefile; I just guessed, but it wasn't hard), but it does appear to render the MP3s compatible with iTunes.

Sunday, January 28, 2007

Interview on Software Engineering Radio

Tune in to Software Engineering Radio to hear the interview Marcus Voelter did with David Holmes and me at OOPSLA 2006.

Tuesday, January 16, 2007

JCiP named Jolt Award Finalist

JCiP was named a finalist in the 2006 Jolt Awards in the technical books category (see the 2006 Winners and Finalists list).

Friday, January 12, 2007

Mama's got a squeezebox...

A number of friends have asked about my Squeezebox home audio setup, so rather than repeat myself, I'll write it up here. 

Squeezebox is a device that attaches to your home network and your stereo, streaming digital music from a server to the stereo.  There are lots of such devices on the market: Roku's Soundbridge, Apple's AirTunes, and offerings from networking companies like Netgear and Linksys.  I chose Squeezebox because it has a digital (optical) audio out, meaning that what gets piped into my expensive stereo is not going through the analog stage of a $20 sound card.  The model I have has a wireless (802.11g) interface and a wired ethernet port as well; I paid about $250 each, and we have them in multiple rooms so you can listen to any music from any room without fussing with physical CDs.  (You can even sync them so you can have the same music throughout the house, say for parties.) 

My primary motivation for this transition was that I hate CD furniture; it's mostly ugly, and takes up an obscene amount of room in your living room if you have any reasonable size music collection.  Even transferring from jewel cases to sleeves, which gives about a 2.5:1 space compression, CDs can overwhelm your living room. 

The first challenge: ripping the CDs.  This is the most time consuming step, so I did not want to have to do this again because I had chosen the wrong audio format.  There are three components to ripping: the audio extraction, the association of metadata (artist, title, genre) with albums and tracks, and conversion to the format of choice (mp3, wma, etc.)  As with many other situations, you can get all-in-one solutions like iTunes or Windows Media Player that will do all of these in a single step, or you can have more control over each step but have to deal with multiple programs. 

It turns out that nearly all rippers do not take advantage of the error-correcting information present in CDs.  So if you have a scratched CD, you'll get bad bits when you rip, and those bad bits will stay with your recording forever.  The only Windows-based tool I know of that will use the error correction is EAC (Exact Audio Copy).  It's not as slick as iTunes, but it will usually get you a perfect rip.  For CDs in good condition, it can usually rip at 10x, ripping a whole CD in 6 minutes or so.  If it detects bit errors, it will slow down and keep reading until it is satisfied; for one really badly scratched CD (one that wouldn't even play in my car), it chewed for 24 hours, and got all but ~1000 bits off! 

EAC will rip to WAV, and has a post-processing mechanism that can launch an external converter (mp3, wma, etc), which I did not use.  Instead, I saved the WAV files to disk and post-processed them separately.  iTunes and WMP can access the commercial CDDB database that associates metadata (artist, album, track names) with albums and tracks; EAC uses the open-source freedb database, which is convenient but whose data quality is less than perfect.  Expect to spend some time correcting titles and genres that don't match up (e.g., the first CD of a set is called Volume One, where the second is called Disc 2, or one volume lists Rock as the genre, where the other lists Pop).  You can do this through EAC or with an ID3 tag editor, but either way, budget some time for cleaning up the data. 

For my storage format, I chose FLAC, the open-source lossless audio codec, which stores files in about 55% of the space of the WAV file.  This is about three times bigger than a good VBR MP3 or AAC, but disk space is cheap -- real cheap.  (As of this writing, 500G drives are going for less than $200.)  And the time to re-rip is very expensive.  I set up the ripper on Windows to write the output files to a drop folder on my Linux server (named using a convention that embeds the track, artist, album, and genre, since WAV doesn't support metadata tags), and have a home-grown Perl script (willing to share, just ask) that will find the files and feed them to the flac converter. 
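
My script is Perl, but the idea fits in a few lines of anything; here's a hedged Java sketch of the same thing.  The "Artist - Album - 01 - Title.wav" file-name convention here is illustrative, not my exact one, and it shells out to the flac command-line encoder using its -T (tag) and -o (output) options:

// Illustrative drop-folder convention: "Artist - Album - 01 - Title.wav".
// Parses the tags out of the file name, then invokes the flac encoder to
// produce artist/album/track.flac with the corresponding FLAC tags set.
static void encodeToFlac(File wav, File flacRoot)
        throws IOException, InterruptedException {
    String[] parts = wav.getName().replaceFirst("\\.wav$", "").split(" - ");
    String artist = parts[0], album = parts[1], track = parts[2], title = parts[3];
    File dir = new File(flacRoot, artist + File.separator + album);
    dir.mkdirs();
    Process p = new ProcessBuilder("flac", "--best", "-s",
            "-T", "ARTIST=" + artist, "-T", "ALBUM=" + album,
            "-T", "TRACKNUMBER=" + track, "-T", "TITLE=" + title,
            "-o", new File(dir, track + " - " + title + ".flac").getPath(),
            wav.getPath()).start();
    if (p.waitFor() != 0)
        System.err.println("flac failed on " + wav);
}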

Squeezebox versions 2 and later support FLAC natively, so the server doesn't have to transcode to MP3 on the fly.  This is nice because the transcoding interferes with fast-forward / rewind functionality on the Squeezebox.  So, following the chain: error-free rip courtesy of EAC, lossless conversion to FLAC, digital transfer from server to squeezebox, lossless FLAC decompression to PCM on the squeezebox, digital out to receiver -- meaning no end-to-end signal loss, and digital-to-analog conversion done by my receiver.  Just as if I'd plugged the CD player's optical out into the receiver. 

For the server software, the free SlimServer package is written in Perl, so it can run on Windows, Linux, or Mac.  I chose Linux, since I did not want to downgrade the reliability of my stereo to that of my Windows desktop.  (I have a Linux server in the house anyway, but if you don't, you can build one fairly cheaply.) 

If you want to transfer to your iPod or other device, you need to transcode from FLAC to MP3 or AAC or WMA or whatever your favorite portable format is.  The best MP3 encoder is called LAME (open source); you then have to decompress from FLAC to WAV, and pipe that into LAME to get an MP3 out.  (I believe iTunes for Mac has a LAME plugin, but not iTunes for Windows.)  LAME encoding using VBR (variable bit rate) takes a while.  Disk space is cheap enough you might consider an automated nightly script to encode all new FLAC files into a parallel tree of MP3 files for transfer to iPod, if iPod is a big enough part of your life.
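
The whole pipeline is just "decode FLAC to stdout, pipe into LAME"; here's a sketch of driving it from Java, assuming the flac and lame binaries are on the PATH (file names are placeholders):

// Pipe the flac decoder's WAV output straight into LAME. Progress output is
// silenced on both ends (-s and --quiet) so neither process can block on a
// full stderr pipe.
static void transcodeToMp3(File flacFile, File mp3File)
        throws IOException, InterruptedException {
    Process flac = new ProcessBuilder(
            "flac", "-d", "-c", "-s", flacFile.getPath()).start();
    Process lame = new ProcessBuilder(
            "lame", "--quiet", "--alt-preset", "standard",
            "-", mp3File.getPath()).start();
    InputStream wav = flac.getInputStream();
    OutputStream lameIn = lame.getOutputStream();
    byte[] buf = new byte[64 * 1024];
    for (int n; (n = wav.read(buf)) != -1; )
        lameIn.write(buf, 0, n);
    lameIn.close();    // EOF on stdin tells LAME to finish the MP3
    flac.waitFor();
    lame.waitFor();
}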

Once you get all the ripping done, it's pretty nice.  It took me about a week to rip ~400 CDs "in the background" while I was working.  Thereafter, the only time you need to find the physical CDs is if you want to play them in the car.  And the SlimServer software has a web interface that lets you create playlists and such, so you can set up playlists for parties so you don't have to be fussing with CDs. 

Highly recommended.  We've got two squeezeboxes now (living room and bedroom) and are considering adding more (kids room, family room).  Plus there's a software player you can use on the computer.