November 2011

"Alternative," really?

While we do our best to ignore it, our lives are ruled by databases. Sometimes this happens in big ways (whether you can fly on an airplane, whether you can make a purchase) and sometimes in small, annoying ways. I have been spending some quality time with Tagalicious to clean up and augment the tags in my iTunes music library (tip: don't buy the Mac App Store version of the application -- it doesn't do lyric lookup), and it has done a great job of making lots of small corrections to data that came from various other CD databases.

Still, even with the Gracenote tag data, there is an amazing amount of classification weirdness and plain stupid typographical mistakes. The most striking classification oddity is that for a period of about 20 years, "if it isn't American, it's Alternative" -- it seems like most British pop bands of the 80s and 90s are automatically classified as Alternative in the world of Gracenote. Apple clearly doesn't use the same classification data in the iTunes Music Store, and it makes its own odd classification choices at times (Depeche Mode as a Rock band, really?), but on this point it is clearly in the Gracenote camp, classifying most of the popular British acts of the 80s as "Alternative Rock" specialized into "New Wave & Post-Punk." At this point, I'm not sure whether the problem is that the data is poor or that the classification scheme is fundamentally broken. Either way, it makes the genre data useless as a measure of similarity (or anything, frankly).

One other interesting quirk of the Gracenote data is that it applies title capitalization rules strictly. Sometimes with questionable results...

Moral of the story: no matter how well curated a database is, crap data happens. Crap data can come from crap data entry, and it can come from crap data design. Remember that when people and organizations expect you to put absolute faith in their data-driven systems.