single data file loading speed

Started by scottrick49
4 comments, last by Antheus 15 years, 10 months ago
Correct me if I'm wrong, but if I create an 'archive' with all my game data stored in a single file, will this be much faster to read than opening 200 separate files? The amount of data would be the same. I am looking for a way to speed up load times, but I don't want to work on something that will have little to no effect.
scottrick49
Generally yes.

You avoid a lot of the unnecessary disk seeks and reads that come with opening every file separately.

If the data is laid out in roughly the order it's loaded, you also get cache and locality benefits, since disk reads happen in large blocks and the OS may cache the file for you.
Disk access is slow on physical drives because it involves mechanical motion. Files are generally scattered across the disk, so the access time for one large contiguous file will be less than the cumulative loading time of several smaller files sitting at disparate locations. However, the delay for loading that one file will be greater than the delay of loading any single small file, simply because there is more of it.

Also, depending on the size of the archive, if you can't use all the data at once it might just end up being swapped out to virtual memory, meaning it goes right back onto the disk. The long and short of it is that your assertion is probably correct in general, but there are other variables to consider.
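To make that concrete, the usual approach is a pack file with a small table of contents at the front, so each asset costs one seek into an already-open file instead of an open/seek/read/close cycle of its own. Here's a rough C++ sketch; the layout (entry count, fixed 64-byte null-padded names, 64-bit offsets and sizes) is made up for illustration, not something from this thread:

```cpp
// Minimal pack-file reader for a hypothetical layout:
// [uint32 entryCount][entries: name (64 bytes), uint64 offset, uint64 size][blob data...]
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <vector>

struct PackEntry {
    uint64_t offset = 0;  // byte offset of the blob inside the pack file
    uint64_t size   = 0;  // blob size in bytes
};

class PackFile {
public:
    bool open(const std::string& path) {
        file_.open(path, std::ios::binary);
        if (!file_) return false;

        uint32_t count = 0;
        file_.read(reinterpret_cast<char*>(&count), sizeof(count));

        for (uint32_t i = 0; i < count; ++i) {
            char name[64] = {};  // assumed null-padded in the table of contents
            PackEntry e;
            file_.read(name, sizeof(name));
            file_.read(reinterpret_cast<char*>(&e.offset), sizeof(e.offset));
            file_.read(reinterpret_cast<char*>(&e.size), sizeof(e.size));
            entries_[name] = e;
        }
        return static_cast<bool>(file_);
    }

    // One seek into the already-open pack file per asset,
    // instead of opening and closing a separate file each time.
    bool read(const std::string& name, std::vector<char>& out) {
        auto it = entries_.find(name);
        if (it == entries_.end()) return false;
        out.resize(it->second.size);
        file_.seekg(static_cast<std::streamoff>(it->second.offset));
        file_.read(out.data(), static_cast<std::streamsize>(out.size()));
        return static_cast<bool>(file_);
    }

private:
    std::ifstream file_;
    std::unordered_map<std::string, PackEntry> entries_;
};
```

Building the pack is just the reverse: write a placeholder table, append each file's bytes, then go back and fill in the real offsets and sizes.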
Depends on a lot of factors. But we can do some math to get an approximation.

Any time you're doing file I/O, it's probably the hard drive that's your bottleneck.

A common 7200 RPM hard drive may have a seek time of, say, 10 ms. That is, every time it has to jump to a completely new position on the disk, this is how long it'll take.

It also has a sustained transfer rate of, say, 70 MB/s. So as long as the data you're reading is contiguous, it'll be able to get a read speed around that figure.

Now, if you were to roll all your data up into a single file, let's assume for the sake of simplicity that it wouldn't be fragmented at all. And when you read from it, you do so sequentially, starting from the beginning and reading towards the end, never jumping or skipping data.

The time it'd take to read the entire file in that case... depends on the size of the file, of course, which you didn't tell us. If you have 70 MB of data to load, it'd take 1 second and 10 milliseconds (one seek, plus one second of sequential reading).

If you've got it split into 200 separate files, you'd have a minimum of 200 seeks. 200 * 10 ms = 2 seconds. On top of that, there's the actual data to read, which is another second, so 3 seconds all in all.
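If you want to plug in your own numbers, here's the same back-of-the-envelope calculation in code (the values are the made-up ones above, nothing measured):

```cpp
// Estimated load times: 10 ms per seek, 70 MB/s sustained transfer, 70 MB of data.
#include <cstdio>

int main() {
    const double seekTime  = 0.010;               // seconds per seek
    const double transfer  = 70.0 * 1024 * 1024;  // bytes per second
    const double totalData = 70.0 * 1024 * 1024;  // bytes to load

    const double onePackFile     = 1   * seekTime + totalData / transfer;  // ~1.01 s
    const double twoHundredFiles = 200 * seekTime + totalData / transfer;  // ~3.00 s

    std::printf("one pack file  : %.2f s\n", onePackFile);
    std::printf("200 small files: %.2f s\n", twoHundredFiles);
    return 0;
}
```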

Of course, all this is ridiculously simplified, and uses mostly made-up values. Hard drive performance depends on a handful of other factors I didn't mention. It also doesn't take into account any other overhead there may be in asking the OS to open a file and so on, and it assumes that you don't do any seeks in the big rolled-up uberfile.

But generally, if you need to read a big chunk of data, and you don't mind reading it all sequentially, then yes, you may save a bit of time by keeping it together in one file.

If your reads are going to be scattered all over the file *anyway*, there won't be much point.
Thanks for the responses. It looks like this will be a good idea for my project, since I will have lots of smaller files (~10 KB or so). Even if I have 1000 of them (which I most definitely won't), it wouldn't be a big deal to keep them all in memory, since they would only take up about 10 MB. Since the time to read them off disk will consist mostly of seek time, it seems like a good idea. Thanks again.
scottrick49
Under Linux, it depends on the file system used.

Under Windows, the difference tends to be huge when dealing with 1000-10,000 files. Just zipping them tends to solve the problem. One of the most recent issues related to that is the Vista file-copying problems.

YMMV, it also depends slightly on the system, but I've found this issue to be pretty consistent regardless of version.
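For completeness, reading one asset back out of a zip is only a few calls with a library such as libzip. The library choice, archive name, and entry name below are my own placeholders, not something from the thread:

```cpp
// Rough sketch: pulling a single asset out of a zip archive with libzip.
// "assets.zip" and "textures/grass.png" are placeholder names.
#include <zip.h>
#include <cstdio>
#include <vector>

int main() {
    int err = 0;
    zip_t* archive = zip_open("assets.zip", ZIP_RDONLY, &err);
    if (!archive) {
        std::fprintf(stderr, "could not open archive (libzip error %d)\n", err);
        return 1;
    }

    // Look up the entry's uncompressed size first so we can allocate once.
    zip_stat_t st;
    if (zip_stat(archive, "textures/grass.png", 0, &st) != 0) {
        std::fprintf(stderr, "entry not found\n");
        zip_close(archive);
        return 1;
    }

    std::vector<char> data(st.size);
    if (zip_file_t* f = zip_fopen(archive, "textures/grass.png", 0)) {
        zip_fread(f, data.data(), data.size());  // decompresses into memory
        zip_fclose(f);
    }
    zip_close(archive);

    std::printf("read %zu bytes\n", data.size());
    return 0;
}
```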

This topic is closed to new replies.
