Finally getting back to finishing this. Except for some of the classical albums (which I want to go back to and fiddle with to make the tags more sensible) everything's in FLAC format now. I currently have a batch script running that is doing a mass conversion of everything (using multiple threads) from FLAC to OGG, for loading onto my Clip+.
Gathering up the stuff that wasn't on the archive DVDs proved more challenging than expected. At some point, I stopped religiously archiving raw rips to optical media, so a lot of the original WAVs/FLACs were scattered across multiple old backup drives from my past 3 desktop PCs. Tracking all of those down was a major PITA; I'm probably still missing a few, but at least I have the original CD media... somewhere. In the crawlspace. *shudder*
FLAC files from Bandcamp occasionally have issues that cause the ReplayGain tagger on Linux (metaflac) to choke. Not sure if this is a Bandcamp issue or a metaflac issue.
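One way to narrow down which albums trip the tagger is to run it one album at a time and collect the failures. This is only a rough Python sketch, not what I actually ran: it assumes `metaflac` is on the PATH and that it exits non-zero when it chokes, and the function names (`tag_album`, `find_problem_albums`) are made up for illustration.

```python
import subprocess
from pathlib import Path


def tag_album(album_dir, run=subprocess.run):
    """Run metaflac's ReplayGain scan on one album; return (ok, stderr_text)."""
    tracks = sorted(str(p) for p in Path(album_dir).glob("*.flac"))
    if not tracks:
        return True, ""  # nothing to tag
    # All tracks go in one call so album gain is computed across the whole album.
    result = run(["metaflac", "--add-replay-gain", *tracks],
                 capture_output=True, text=True)
    # Assumption: a non-zero exit status means the tagger choked on something.
    return result.returncode == 0, result.stderr


def find_problem_albums(album_dirs, run=subprocess.run):
    """Return {album path: error text} for every album the tagger failed on."""
    failures = {}
    for album in album_dirs:
        ok, err = tag_album(album, run)
        if not ok:
            failures[str(album)] = err.strip()
    return failures
```

Running `flac -t` on the files from a failing album would be a reasonable next step to see whether the file itself is damaged or only the tagger is unhappy.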
I'd forgotten what a PITA Rhapsody was (I bought MP3 downloads from them before I switched to Amazon and Bandcamp; sometimes they were cheaper than Amazon). Half of their album downloads would have corrupted files. Usually this would correct itself if you downloaded again, but occasionally it would not, or different files would be corrupted. Sorting through the mess of archived Rhapsody downloads to find the good versions of everything was pretty annoying. RIP and good riddance to Rhapsody.
I debated what to do with stuff that was originally purchased as downloads in MP3 format. Converting to FLAC has no sonic benefit, and actually increases the amount of storage used. However, I find that I frequently go back and edit the start/end of tracks I've purchased in MP3 format (especially live albums, or stuff where tracks flow one into another), because MP3 encoding tends to pad each track with a short stretch of silence (and occasionally, even glitches) at the start and/or end, which shows up as a gap between tracks. Anything that is going to be edited and re-saved should be in a lossless format, since every re-save to a lossy format degrades the audio a little more. So for that reason (and because I'm OCD) I transcoded the MP3s to FLAC. The directory names of the transcodes have a suffix to indicate that they came from a lossy source, as a reminder to myself that I don't have a lossless source for that album.
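The suffix convention is easy to script, for what it's worth. A minimal Python sketch, assuming a made-up suffix of ` [from MP3]` (the actual suffix isn't given here, so `LOSSY_SUFFIX`, `mark_lossy`, and `lossy_albums` are all hypothetical names):

```python
from pathlib import Path

# Hypothetical marker; substitute whatever suffix the library really uses.
LOSSY_SUFFIX = " [from MP3]"


def mark_lossy(album_dir):
    """Return the album path with the lossy-source suffix appended (only once)."""
    album_dir = Path(album_dir)
    if album_dir.name.endswith(LOSSY_SUFFIX):
        return album_dir
    return album_dir.with_name(album_dir.name + LOSSY_SUFFIX)


def lossy_albums(library_root):
    """List every directory under library_root whose name carries the suffix."""
    return sorted(
        p for p in Path(library_root).rglob("*")
        if p.is_dir() and p.name.endswith(LOSSY_SUFFIX)
    )
```

`mark_lossy` only computes the new name; actually renaming would be something like `old.rename(mark_lossy(old))`. `lossy_albums` gives a quick list of everything that still lacks a lossless source.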
And then there are the cases where I have both the lossy download and the physical CD, but the lossy download has extra bonus tracks.
Vinyl rips also get tagged with a suffix on the directory name to indicate the source. These were a PITA too, because there were multiple archived versions of many of the vinyl rips on the optical media, representing different iterations of cleaning up clicks, pops, and other sonic defects. Had to sort out the duplicates.
Fun times. But I'm in the home stretch now!
Edit: The OGG transcode job is on the Bs now... B is for Buckethead (all threads chewing on Buckethead albums ATM)!
I set it up to use 6 streams (i.e. it processes 6 albums in parallel, each in a separate background job). It's keeping my FX-8350 pretty busy (load average hovering around 8, so all cores are loaded but nobody is starved for cycles), which is just about right to get it done as fast as possible without making the system noticeably laggy.
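For anyone wanting to try the same approach, here's a rough Python sketch of the "N albums in parallel" idea. It is not the batch script I'm actually running, just an illustration. It assumes an `oggenc` that was built with FLAC input support, a quality setting of 6 picked arbitrarily, and one directory per album; all the function names are made up.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def find_albums(src_root):
    """Return every directory under src_root that directly contains .flac files."""
    return sorted({p.parent for p in Path(src_root).rglob("*.flac")})


def oggenc_track(src, dst, quality=6):
    """Encode one FLAC file to Ogg Vorbis (assumes oggenc can read FLAC input)."""
    subprocess.run(
        ["oggenc", "-Q", "-q", str(quality), "-o", str(dst), str(src)],
        check=True,
    )


def transcode_album(album, src_root, dst_root, encode=oggenc_track):
    """Encode one album track by track, mirroring its path under dst_root."""
    out_dir = Path(dst_root) / Path(album).relative_to(src_root)
    out_dir.mkdir(parents=True, exist_ok=True)
    done = []
    for track in sorted(Path(album).glob("*.flac")):
        target = out_dir / (track.stem + ".ogg")
        encode(track, target)
        done.append(target)
    return done


def transcode_library(src_root, dst_root, streams=6, encode=oggenc_track):
    """Process up to `streams` albums at once; tracks within an album run serially."""
    albums = find_albums(src_root)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        results = pool.map(
            lambda a: transcode_album(a, src_root, dst_root, encode), albums
        )
        return [t for album_tracks in results for t in album_tracks]
```

Threads are enough here because the real work happens in the `oggenc` child processes; each worker thread just sits waiting on its encoder. The `streams` value is the knob to turn if the machine gets laggy or has cores sitting idle.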