File I/O streaming performance.

Started by
11 comments, last by Tape_Worm 11 years, 7 months ago
I have a Windows application in C++ here that's got a very large file to stream in. The file is ~5GB in size, and currently the code does this:

  1. Open file if not already open.
  2. Read in a block of data (~9 MB)
  3. Copy to volume texture
  4. Render
  5. Repeat...

The file access is sequential and happens at the end of every frame, and currently I'm using the CRT file I/O functions (fopen, fseek, fread, and fclose). I'm wondering if there's a better way with regard to performance?
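In rough terms, the per-frame step currently looks something like this (placeholder names, not my actual code, just to show the pattern):

// Sketch of the current CRT-based streaming (placeholder names only).
#include <cstdio>
#include <vector>

static const std::size_t BLOCK_SIZE = 9 * 1024 * 1024;   // ~9 MB, one volume texture's worth

// Hypothetical helper that copies a block into the D3D volume texture.
void UploadToVolumeTexture(const char* data, std::size_t bytes);

FILE*             g_file = nullptr;
std::vector<char> g_block(BLOCK_SIZE);

bool StreamNextBlock()
{
    if (!g_file)
        g_file = std::fopen("volume_data.raw", "rb");   // 1. open if not already open
    if (!g_file)
        return false;

    std::size_t read = std::fread(g_block.data(), 1, BLOCK_SIZE, g_file);   // 2. read a block
    if (read == 0)
        return false;                                   // end of the ~5GB file

    UploadToVolumeTexture(g_block.data(), read);        // 3. copy to volume texture
    return true;                                        // 4. caller renders, 5. repeat next frame
}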

Would using memory mapped files be a good idea here? I've read conflicting statements about performance when it comes to reading a file sequentially.

I've considered loading a larger chunk of the file (i.e. multiple volume textures) in one shot, but I'm thinking that it'll hit a bottleneck when it's used up those textures and has to read in the next chunk.

Obviously I can't read in the entire file (this needs to run on 32-bit, so that would kill my process space quickly), and because of the environment I have to work in (I really have no choice regarding this) I can't use threading.

Thanks.
So is your target controlled, or are those just min. specs?

Controlled.
Use memory mapping. Make sure to open the file with FILE_FLAG_SEQUENTIAL_SCAN, and experiment with FILE_FLAG_NO_BUFFERING. Depending on how Windows caches your data, it may keep loading your 5GB file into virtual memory as you read it, only to eventually have to flush and stall your system for a very long time. Using no buffering may seem slower overall, but it avoids that very bad, annoying stall.

Make sure all your alignment is right. In my experience sequential access has been many times faster than random access. I can't imagine why it would be otherwise, though that obviously doesn't mean there aren't cases where it happens.

(Of course, if you can slip an SSD into this person's computer, things will be much faster.)

[edit] Oh yeah... you might (or might not) get better performance by always reading a buffer ahead. It won't do most of the time-consuming work (the hard drive access) until you actually read from the memory-mapped memory anyway, but at least Windows gets a chance to know what's coming.
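For what it's worth, here's roughly what the mapped path looks like in Win32 terms. This is only a sketch (error handling omitted, placeholder file name); the key points are that the flags go on CreateFile and that you map only a ~9MB window at a time rather than the whole file:

// Sketch: memory-mapped sequential reads of one block per frame (no error handling).
#include <windows.h>

void StreamOneBlock(ULONGLONG offset)            // offset advances by the block size each frame
{
    static HANDLE file = CreateFileA("volume_data.raw", GENERIC_READ, FILE_SHARE_READ,
                                     NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    static HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);

    const SIZE_T viewSize = 9 * 1024 * 1024;     // one volume texture's worth

    // Map only the current window, never the whole 5GB file (you can't on 32-bit anyway).
    // The offset must be a multiple of the allocation granularity (usually 64KB).
    const void* view = MapViewOfFile(mapping, FILE_MAP_READ,
                                     (DWORD)(offset >> 32), (DWORD)(offset & 0xFFFFFFFF),
                                     viewSize);

    // ... copy from 'view' into the volume texture here ...

    UnmapViewOfFile(view);                       // release the window before mapping the next one
}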

Thanks, I'll give memory mapping a shot.
It may also be worth the effort to look into using overlapped I/O to asynchronously read blocks of the file before you need them. It saves you from the mess that is threaded I/O, and should be more efficient into the bargain.
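Very roughly, the pattern looks like this (a sketch only, with a placeholder file name and no error handling): kick off the read for the next block, render the current one, and only wait on the read when you actually need the data.

// Sketch: overlapped read of the *next* block while the current one renders.
#include <windows.h>

const DWORD BLOCK_SIZE = 9 * 1024 * 1024;

// Open with FILE_FLAG_OVERLAPPED so ReadFile can run asynchronously.
HANDLE     g_file = CreateFileA("volume_data.raw", GENERIC_READ, FILE_SHARE_READ, NULL,
                                OPEN_EXISTING,
                                FILE_FLAG_SEQUENTIAL_SCAN | FILE_FLAG_OVERLAPPED, NULL);
OVERLAPPED g_ov   = {};                      // tracks the single outstanding read
char*      g_next = new char[BLOCK_SIZE];    // destination for the read-ahead block

void IssueReadAhead(ULONGLONG offset)
{
    // Kick off the read for the next block; this returns right away with
    // ERROR_IO_PENDING while the disk works in the background.
    g_ov.Offset     = (DWORD)(offset & 0xFFFFFFFF);
    g_ov.OffsetHigh = (DWORD)(offset >> 32);
    ReadFile(g_file, g_next, BLOCK_SIZE, NULL, &g_ov);
}

DWORD WaitForReadAhead()
{
    // Only block when the data is actually needed (ideally it finished during rendering).
    DWORD bytesRead = 0;
    GetOverlappedResult(g_file, &g_ov, &bytesRead, TRUE);
    return bytesRead;
}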

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

"I've considered loading a larger chunk of the file (i.e. multiple volume textures) in one shot, but I'm thinking that it'll hit a bottleneck when it's used up those textures and has to read in the next chunk."

Use a circular/ring buffer with 2 or more chunks (chunks are a fixed size, so don't allocate them dynamically). When you have finished loading a chunk, immediately ask your async I/O to load the next one (if you have 3 chunks in your buffer, this "next" would be the fourth chunk of the file). This way you always have the current chunk on hand and are only loading chunks needed in the future. You might only need 2 chunks (the current and the next) if a chunk takes less than a frame to load, though I would go with 3 chunks if it takes more than half a frame or so, to avoid sudden hiccups if I/O stalls appear.

This buffering method is not meant to replace memory mapping and the other I/O hints; it merely uses internal buffering to avoid most of the stalls caused by inconsistent I/O times. See the sketch below.
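Something like this, as a rough sketch (placeholder names; RequestAsyncRead stands in for whatever actually issues the read - overlapped I/O as suggested above, or a plain blocking read if it must stay single-threaded):

// Sketch: fixed-size ring of chunks, always keeping the next chunk(s) loading.
#include <cstddef>

const std::size_t CHUNK_SIZE  = 9 * 1024 * 1024;   // one volume texture's worth
const int         CHUNK_COUNT = 3;                 // current + up to two in flight

struct Chunk
{
    char* data;     // allocated once at startup (CHUNK_SIZE bytes), never reallocated
    bool  ready;    // set true when its read has completed
};

Chunk chunks[CHUNK_COUNT];
int   g_current = 0;                               // chunk being rendered this frame

// Placeholder for whatever actually performs the read (async or otherwise).
void RequestAsyncRead(char* dest, std::size_t bytes);

void OnChunkConsumed()
{
    // The slot we just finished rendering becomes the furthest-ahead read target,
    // so there is always a current chunk plus chunks loading for the future.
    int toFill = g_current;
    g_current  = (g_current + 1) % CHUNK_COUNT;

    chunks[toFill].ready = false;
    RequestAsyncRead(chunks[toFill].data, CHUNK_SIZE);
}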
My understanding is he's on a machine with a single single-core processor. The async approach is a bit less predictable there, especially if it goes through virtual memory and he's locked to a 32-bit machine/OS. Because the data is 5GB, there are risks to anything that gets these kinds of speed gains through Windows' virtual memory system: 2 or 3GB into his 3D playback there is likely to be a significant (read: more than 60 seconds) stall while Windows basically rearranges virtual memory for his entire running system.

Buffering one "frame" ahead of time is a win. But he can't do it on another thread - whatever the reason, he said he can't multithread - so I/O stalls are going to be a direct part of his frame time.
You should be able to gain some more speed by applying a lossless compression algorithm to the data. Image data tends to compress fairly well.

If you're lucky with the data you'll be able to fit the entire compressed file in memory so you'll only be doing decompression instead of I/O (link with /LARGEADDRESSAWARE to get 4GB of address space when run under 64-bit Windows). Even if that doesn't happen it'll mean you can trade off file reading for decompression code, which should be quicker (although it'll be close on an SSD, especially with no threading or async I/O).

If you can get away with lossy compression then DXT1 is a decent option - the video card can decode it directly and the compression ratio will be good enough to fit all the data into RAM.
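For example, if each ~9MB block is compressed separately offline, the per-frame work becomes a decompress from RAM instead of a disk read. A rough sketch with zlib (purely illustrative - any codec with an in-memory API would do, and the block index/layout here is made up):

// Sketch: per-frame decompression from an in-memory compressed stream.
#include <zlib.h>
#include <vector>

struct CompressedBlock
{
    const Bytef* data;   // points into the big compressed file held in memory
    uLong        size;   // compressed size of this block
};

std::vector<CompressedBlock> g_blocks;                  // built when the file is loaded/indexed
std::vector<Bytef> g_decompressed(9 * 1024 * 1024);     // one volume texture's worth

bool DecompressBlock(std::size_t index)
{
    uLongf destLen = (uLongf)g_decompressed.size();
    int result = uncompress(g_decompressed.data(), &destLen,
                            g_blocks[index].data, g_blocks[index].size);
    return result == Z_OK;   // g_decompressed now feeds the volume texture upload
}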

Right! Totally forgot about this, since where I work image compression is strictly prohibited. It's... bad mojo... to even mention compression, and the word "lossless" is treated like a lie. Seriously. It's very strange and counterproductive from my perspective.

This topic is closed to new replies.
