I wonder whether we could come up with a lightweight test for ALAC corruption, since the format has no native support for the feature. Once the files are hashed it's easy to detect corruption, but what about before that? I'm guessing not without the harder work of analyzing the raw PCM stream for artifacts, which is perhaps beyond the scope of this.
I guess step 1 for implementing this would be to add preliminary support that just uses FLAC's native checksum, while work continues on hashing the audio stream for formats like ALAC. That way people can start testing it sooner.
Yeah, exactly, we just care about whether or not the checksum changes, which it never should. Is extracting the checksum out of a FLAC file trivial/supported?
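On the "is it trivial" question, here is a minimal sketch of reading the stored checksum straight from the file. It relies on the FLAC format's STREAMINFO block, which is always the first metadata block and ends with a 16-byte MD5 of the unencoded audio. The function name `read_flac_md5` is made up for illustration, and the sketch assumes a native FLAC file with nothing (such as an ID3v2 tag) prepended before the `fLaC` marker.

```python
def read_flac_md5(path):
    """Return the MD5 stored in STREAMINFO as hex, or None if the encoder left it unset.

    Hypothetical helper: it only reads the stored value, it does not decode
    the audio and compare.
    """
    with open(path, "rb") as f:
        if f.read(4) != b"fLaC":
            raise ValueError("missing 'fLaC' marker, not a native FLAC stream")
        header = f.read(4)
        if len(header) != 4:
            raise ValueError("truncated metadata block header")
        block_type = header[0] & 0x7F            # low 7 bits: block type
        length = int.from_bytes(header[1:4], "big")
        if block_type != 0 or length < 34:
            raise ValueError("first metadata block is not a valid STREAMINFO")
        streaminfo = f.read(34)
        if len(streaminfo) != 34:
            raise ValueError("truncated STREAMINFO block")
    md5 = streaminfo[18:34]                      # last 16 bytes of STREAMINFO
    if md5 == bytes(16):                         # all zeros means "not computed"
        return None
    return md5.hex()
```

This only extracts the stored sum. Actually checking it means decoding the audio and comparing, which the reference tools already do (`flac -t` tests a file, and `metaflac --show-md5sum` prints the stored value).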
I think it could be a very nice addition for bliss to be able to verify library integrity somehow.
I believe that, for the integrity checksum, we could go with something out of the ordinary. We clearly don't need this to be cryptographically sound, so algorithms such as MD5 are wasteful. xxHash may be a nice solution: it hashes extremely fast and should be reliable enough for our needs. That said, I suspect the speed difference between MD5 and xxHash won't matter much, since disk speed will likely be the bottleneck, but we'll have to wait and see.
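To make the comparison concrete, here is a rough sketch of hashing a file in chunks with either kind of algorithm. xxHash is not in the Python standard library, so CRC32 from `zlib` stands in for the "fast, non-cryptographic" option here; that swap is my assumption, not a recommendation. The third-party `xxhash` package offers a hashlib-style interface and could be dropped into the MD5 variant. The function names and the chunk size are illustrative.

```python
import hashlib
import zlib

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def md5_of_file(path, chunk_size=CHUNK_SIZE):
    """Hash the whole file with MD5, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def crc32_of_file(path, chunk_size=CHUNK_SIZE):
    """Same loop with CRC32, standing in for a fast non-cryptographic hash."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # feed the running value back in
    return format(crc, "08x")
```

Timing both functions over a real library would show whether the hash or the disk is the bottleneck.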
Another interesting question here is how we store this. I see two ways to go about it:
1. We checksum the entire file, which means we can't store the sum as a metadata tag (writing the tag would change the file and invalidate the sum). Instead we would have to keep something like one large file with a song->hash association list.
2. We checksum only the audio data, and then the question is whether we sum the encoded data (FLAC, ALAC) or the decoded PCM stream. I'm not sure why we would do this over the PCM stream, but I'm just putting it on the table.
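As an illustration of option 1, here is a minimal sketch of a song->hash list kept in a single JSON file. Everything in it is an assumption for the sake of the example: the function names, the JSON layout, the file extensions, and MD5 as a placeholder for whichever algorithm ends up being chosen.

```python
import hashlib
import json
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Whole-file hash; MD5 is only a placeholder algorithm here."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(library_root, extensions=(".flac", ".m4a")):
    """Map each audio file's path (relative to the library root) to its hash."""
    root = Path(library_root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            manifest[path.relative_to(root).as_posix()] = hash_file(path)
    return manifest


def save_manifest(manifest, manifest_path):
    Path(manifest_path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")


def load_manifest(manifest_path):
    return json.loads(Path(manifest_path).read_text(encoding="utf-8"))


def verify_manifest(library_root, manifest):
    """Return {relative_path: "missing" | "changed"} for files that fail the check."""
    root = Path(library_root)
    problems = {}
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file():
            problems[rel_path] = "missing"
        elif hash_file(path) != expected:
            problems[rel_path] = "changed"
    return problems
```

One consequence of hashing whole files shows up right away: any legitimate retagging changes the hash, so the manifest has to be regenerated after every tag edit.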
I for one like solution 2 more. Having a built-in sum makes it more portable, and it's simpler as well. I'm not sure whether leaving the metadata out of the sum could have a negative impact, but I think not, since bliss will already catch metadata corruption, e.g. if my artist field gets scrambled or becomes gibberish. Also, proportionally it is _much_ more likely that a failure would happen in the audio stream than in the metadata.
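For what summing only the encoded audio might look like, here is a rough FLAC-only sketch: it walks the metadata block headers, skips every block, and hashes the bytes that follow. The function names are hypothetical, MD5 is again just a placeholder, and the sketch assumes a native FLAC file. Anything appended after the audio frames (an ID3v1 tag, for example) would still be included in the sum.

```python
import hashlib


def skip_flac_metadata(f):
    """Advance f past the 'fLaC' marker and all metadata blocks.

    Returns the offset where the audio frames start.
    """
    if f.read(4) != b"fLaC":
        raise ValueError("missing 'fLaC' marker, not a native FLAC stream")
    while True:
        header = f.read(4)
        if len(header) != 4:
            raise ValueError("truncated metadata block header")
        is_last = bool(header[0] & 0x80)         # top bit: last-metadata-block flag
        length = int.from_bytes(header[1:4], "big")
        f.seek(length, 1)                        # skip the block body
        if is_last:
            return f.tell()


def hash_flac_audio(path, chunk_size=1 << 20):
    """Hash only the encoded audio frames, ignoring tags, pictures and padding."""
    digest = hashlib.md5()                       # placeholder algorithm
    with open(path, "rb") as f:
        skip_flac_metadata(f)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With this approach the sum survives retagging, so it could itself be stored as a tag inside the file.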
I don't think the average user would care much for this, but I do feel that an easy way to guarantee integrity throughout your library would be a game changer. Current solutions are personal and flimsy. Especially in bliss, where it could just be a "Generate sums" button or something like that, plus a "Check sums" or whatever. I think if this is done right, a large share of the audiophile community could start using it.