yes, agreed WRT manifests and checksums.
I had considered something similar, for if/when I ever (sadly, not yet) get around to generalized network file-copying.
also agreed regarding disk IO:
IME, this is one of those things which eats a lot of time, but is often underestimated.
a recent example was an observation with a specialized compressed audio-codec of mine:
it was originally designed mostly to save RAM, and allow random-access piecewise decompression (basically, sort of like DXT but for audio, using independently-encoded fixed-size blocks).
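to illustrate why fixed-size independently-encoded blocks give cheap random access, here is a minimal sketch. the header size, block size, and samples-per-block values are made-up placeholders (not the actual parameters of the codec), and the function names are hypothetical:

```python
# Hypothetical layout: a fixed-size header followed by equal-sized encoded
# blocks, each of which decodes to the same number of samples.
# All three constants are illustrative assumptions, not the real codec's values.
HEADER_BYTES = 32
BLOCK_BYTES = 32          # encoded size of one block
SAMPLES_PER_BLOCK = 64    # number of samples each block decodes to


def locate_sample(sample_index):
    """Return (byte offset of the containing block, sample position inside it)."""
    block, within = divmod(sample_index, SAMPLES_PER_BLOCK)
    return HEADER_BYTES + block * BLOCK_BYTES, within


def blocks_for_range(first_sample, count):
    """Return the block numbers that must be decoded to get `count` samples."""
    first = first_sample // SAMPLES_PER_BLOCK
    last = (first_sample + count - 1) // SAMPLES_PER_BLOCK
    return range(first, last + 1)
```

since every block has the same encoded size and does not depend on its neighbors, finding a sample is just arithmetic, and only the blocks that overlap the requested range ever need to be decoded.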
however, as a side effect (observed during tests), the reduction in file-sizes resulted in a noticeable speedup vs loading larger WAV files.
while a person could complain about the CPU cycles needed to decompress the audio, my tests showed this to be a pretty minor factor (the overall added CPU cost is negligible): of the time that goes into audio-mixing, most of it goes into other things, like sample interpolation and reverb calculations.
granted, there may still be an issue with disk-seeks and cluster-overhead for small files, but this can be addressed via bundling (say, rather than having a large number of small files, we have a small number of larger files). similarly: the entire bundle can be loaded into RAM at once, and it is also possible to do a combined checksum over the whole bundle.
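as a rough sketch of the "load the whole bundle at once and checksum it" idea (CRC-32 via `zlib` is just one possible choice here, and the function names are hypothetical):

```python
import zlib


def bundle_checksum(data):
    """CRC-32 over the entire bundle image, as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


def load_bundle(path):
    """Read the whole bundle into RAM in one go; return (bytes, checksum)."""
    with open(path, "rb") as f:
        data = f.read()
    return data, bundle_checksum(data)
```

one read call and one checksum pass replace what would otherwise be hundreds of separate open/read/verify steps for the individual files.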
there are various options for this, ranging from slightly more complex ZIP-based packages, to simpler ones (such as the Quake PACK and WAD2 formats). this latter case can probably be called "WAD variants" (mostly due to having a "generally similar file structure" to the original WAD).
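for reference, the Quake PACK layout is about as simple as these get: a 12-byte header (`PACK` magic, directory offset, directory length) plus a directory of 64-byte entries (56-byte NUL-padded name, data offset, data size), all little-endian. a minimal in-memory sketch of writing and reading one follows; the function names are hypothetical, and there is no handling of over-long names or malformed input:

```python
import struct

PACK_HEADER = struct.Struct("<4sii")   # magic, directory offset, directory length
PACK_ENTRY = struct.Struct("<56sii")   # NUL-padded name, data offset, data size


def build_pack(files):
    """Build a PACK image in memory from a {name: bytes} dict."""
    body = b""
    directory = b""
    for name, data in files.items():
        offset = PACK_HEADER.size + len(body)
        # struct pads the 56-byte name field with NULs automatically
        directory += PACK_ENTRY.pack(name.encode("ascii"), offset, len(data))
        body += data
    dir_offset = PACK_HEADER.size + len(body)
    return PACK_HEADER.pack(b"PACK", dir_offset, len(directory)) + body + directory


def read_pack(image):
    """Parse a PACK image and return a {name: bytes} dict."""
    magic, dir_offset, dir_length = PACK_HEADER.unpack_from(image, 0)
    if magic != b"PACK":
        raise ValueError("not a PACK file")
    files = {}
    for pos in range(dir_offset, dir_offset + dir_length, PACK_ENTRY.size):
        raw_name, offset, size = PACK_ENTRY.unpack_from(image, pos)
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        files[name] = image[offset:offset + size]
    return files
```

the whole format is a flat directory of (name, offset, size) records, which is most of what is needed when the contents are lots of tiny lumps.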
there are various tradeoffs for why a person might pick one sort of packaging or another, but personally I prefer WAD-like formats for small/specialized data storage, and something like ZIP for "general" storage of lots of heterogeneous data. (not going to go into specifics too much here ATM).
things I have more often used WAD-variants for:
storing globs of compiled bytecode and metadata for my script-VM (produced by "compiling" script-code libraries);
storing voice-fragments for a text-to-speech engine (where there are large numbers of basically short audio-fragments used for unit-selection or similar);
storing samples for the various MIDI-instruments (for a wavetable MIDI synth);
basically: cases where otherwise a person will have a directory with large numbers (100s or more) of tiny (often under 1kB) files.
things I have more often used ZIP-based containers for:
collections of general data files;
collections of textures;