Files validation in games

2 comments, last by cr88192 10 years, 11 months ago

Hey,

In my game I have a server which stores several mods of the original game, so a casual user can download and play them. But before a player gets into a mod I need to validate all of its files and make sure the user hasn't corrupted them. Let's assume that I have to check 100 MB of files and that only 1 MB is corrupted. As a result, checking the other 99 MB would be a waste of time, and the server should only have to send the 1 MB of correct data for the client to replace. So now I'm wondering whether it's worth using some complex algorithm like CRC-32 or MD5 to check that the files are correct. What would be the best solution? I would be very grateful for help!

CRC isn't that complex: many languages have built-in functions for it in the standard library, and for C++ you can use Boost. The server could generate its checksums when it starts. You could also tag some files as non-critical and skip the checks on those (for some games it doesn't matter if the client uses, for example, different textures than the server).
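
A minimal sketch of that idea in Python, using `zlib.crc32` from the standard library. The `NON_CRITICAL_SUFFIXES` set and both function names are made up for illustration; which files count as non-critical is an assumption that depends on the game.

```python
import zlib
from pathlib import Path

# Hypothetical rule: texture files are treated as non-critical and skipped.
NON_CRITICAL_SUFFIXES = {".png", ".jpg", ".dds"}


def crc32_of_file(path, chunk_size=64 * 1024):
    """Return the CRC-32 of a file, reading it in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def critical_checksums(mod_dir):
    """Map relative path -> CRC-32 for every critical file under mod_dir."""
    root = Path(mod_dir)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() not in NON_CRITICAL_SUFFIXES:
            result[path.relative_to(root).as_posix()] = crc32_of_file(path)
    return result
```

The server could call `critical_checksums` once at startup and keep the result in memory; the client would run the same function and compare.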

Build a list of all the files in the mod/download package/etc. along with their MD5 checksums. You can do this once on the server and cache the result in a "manifest" file. Have the client download this first. Don't forget to checksum the manifest!
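
Roughly, building the manifest could look like the Python sketch below. The JSON layout (relative path mapped to size and MD5) and the function names are assumptions for illustration, not a fixed format.

```python
import hashlib
import json
from pathlib import Path


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(mod_dir):
    """Map each file's relative path to its size and MD5 (hypothetical layout)."""
    root = Path(mod_dir)
    return {
        p.relative_to(root).as_posix(): {
            "size": p.stat().st_size,
            "md5": md5_of_file(p),
        }
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def serialize_manifest(manifest):
    """Return (manifest_bytes, md5_of_those_bytes) so the manifest can be verified too."""
    data = json.dumps(manifest, sort_keys=True, indent=2).encode("utf-8")
    return data, hashlib.md5(data).hexdigest()
```

The server would run this once per mod, cache the bytes, and serve the manifest's own digest alongside it so the client can check the manifest before trusting it.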

Once you have that done, the client can use the manifest to know what data to download and to validate the download itself. Scanning and hashing a couple hundred MB of files shouldn't take long (a few seconds on modern hardware, mostly waiting for disk I/O) so you can simply run the scan every time the game starts up.
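
On the client side, the startup scan might look something like this Python sketch. It assumes the manifest has already been downloaded and parsed into a dict mapping each relative path to a `size` and an `md5` hex string; that layout and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=64 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def files_to_download(manifest, install_dir):
    """Return the relative paths that are missing locally or do not match the manifest."""
    root = Path(install_dir)
    stale = []
    for rel_path, expected in sorted(manifest.items()):
        local = root / rel_path
        if not local.is_file():
            stale.append(rel_path)  # missing entirely
        elif local.stat().st_size != expected["size"]:
            stale.append(rel_path)  # cheap check first: size differs, no need to hash
        elif md5_of_file(local) != expected["md5"]:
            stale.append(rel_path)  # same size, different contents
    return stale
```

Only the paths returned by `files_to_download` need to be fetched again, which covers the "1 MB out of 100 MB" case from the question.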

CRC-32 is a pretty weak checksum (it is only 32 bits wide), so prefer something like MD5 (128 bits) if you can, just for the sake of reducing the possibility of weird collisions.


yes, agreed WRT manifests and checksums.

I had considered something similar, for if/when I ever get around to generalized network file-copying.

also agreed regarding disk IO:

IME, this is often one of those things which eats a lot of time, but is often underestimated.

a recent example was an observation with a specialized compressed audio-codec of mine:

it was originally designed mostly to save RAM, and allow random-access piecewise decompression (basically, sort of like DXT but for audio, using independently-encoded fixed-size blocks).

however, as a side effect (observed during tests), the reduction in file-sizes resulted in a noticeable speedup vs loading larger WAV files.

while a person can complain some about the CPU cycles needed to decompress the audio, my tests showed this to be a pretty minor factor (overall added CPU cost is negligible). (of what time goes into audio-mixing, most of it goes into other things, like sample interpolation and reverb calculations...).

granted, there may still be an issue with disk-seeks and cluster-overhead for small files, but this can be addressed via bundling (say, rather than having a large number of small files, we have a small number of larger files). similarly: the entire bundle can be loaded into RAM at once, and it is also possible to do a combined checksum over the whole bundle.
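
as a rough illustration of the "load the whole bundle and checksum it once" idea, here is a small Python sketch. the function name and the use of MD5 for the combined checksum are assumptions for the example, not a description of any particular engine.

```python
import hashlib


def load_verified_bundle(path, expected_md5):
    """Read an entire bundle file into RAM and verify one checksum over all of it."""
    with open(path, "rb") as f:
        data = f.read()  # a single sequential read instead of many small-file reads
    actual = hashlib.md5(data).hexdigest()
    if actual != expected_md5:
        raise ValueError(f"bundle {path!r} is corrupt: got {actual}, expected {expected_md5}")
    return data
```

the tradeoff is granularity: a single checksum only says that the bundle as a whole is bad, so the whole bundle has to be re-fetched.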

there are various options for this, ranging from slightly more complex ZIP-based packages, to simpler ones (such as the Quake PACK and WAD2 formats). this latter case can probably be called "WAD variants" (mostly due to "generally similar file structure" to the original WAD).
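
to show how simple such a format can be, here is a Python sketch of the Quake PACK layout: a 12-byte header (the magic `PACK`, then directory offset and directory size as little-endian 32-bit ints) plus 64-byte directory entries (56-byte NUL-padded name, data offset, data size). the function names are made up, and this is a toy in-memory reader/writer rather than production code.

```python
import struct

HEADER = struct.Struct("<4sii")   # magic, directory offset, directory size in bytes
ENTRY = struct.Struct("<56sii")   # NUL-padded name, data offset, data size


def write_pack(files):
    """Build a PACK archive in memory from a dict of name -> bytes."""
    body = bytearray()
    directory = bytearray()
    offset = HEADER.size  # file data starts right after the header
    for name, data in files.items():
        encoded = name.encode("ascii")
        if len(encoded) > 55:  # leave room for at least one NUL terminator
            raise ValueError(f"name too long for a PACK entry: {name!r}")
        directory += ENTRY.pack(encoded, offset, len(data))
        body += data
        offset += len(data)
    # The directory is placed after all file data, so its offset is the final offset.
    return HEADER.pack(b"PACK", offset, len(directory)) + bytes(body) + bytes(directory)


def read_pack(blob):
    """Parse a PACK archive held in memory and return a dict of name -> bytes."""
    magic, dir_offset, dir_size = HEADER.unpack_from(blob, 0)
    if magic != b"PACK":
        raise ValueError("not a PACK archive")
    files = {}
    for pos in range(dir_offset, dir_offset + dir_size, ENTRY.size):
        raw_name, offset, size = ENTRY.unpack_from(blob, pos)
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        files[name] = blob[offset:offset + size]
    return files
```

note there is no compression and no per-entry checksum in this layout, which is part of why it suits lots of tiny files where a single checksum over the whole bundle is enough.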

there are various tradeoffs for why a person might pick one sort of packaging or another, but personally I prefer WAD-like formats for small/specialized data storage, and something like ZIP for "general" storage of lots of heterogeneous data. (not going to go into specifics too much here ATM).

things I have more often used WAD-variants for:

storing globs of compiled bytecode and metadata for my script-VM (produced by "compiling" script-code libraries);

storing voice-fragments for a text-to-speech engine (where there are large numbers of basically short audio-fragments used for unit-selection or similar);

storing samples for the various MIDI-instruments (for a wavetable MIDI synth);

...

basically: cases where otherwise a person will have a directory with large numbers (100s or more) of tiny (often under 1kB) files.

things I have more often used ZIP-based containers for:

collections of general data files;

collections of textures;

...
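
for the ZIP case, the format already stores a CRC-32 for every member, so some validation comes for free. a small Python sketch using the standard `zipfile` module (the helper names are made up for the example):

```python
import io
import zipfile


def build_zip(files):
    """Build a ZIP archive in memory from a dict of name -> bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def member_crcs(zip_bytes):
    """Return the CRC-32 recorded in the archive for each member."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return {info.filename: info.CRC for info in zf.infolist()}


def first_corrupt_member(zip_bytes):
    """Return the name of the first member whose data fails its CRC check, or None."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.testzip()
```

`testzip()` reads and decompresses every member, so it costs about as much as a full scan; `member_crcs` only reads the central directory and could be compared against a server-side list.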
