
File I/O streaming performance.



#1 Tape_Worm   Crossbones+   -  Reputation: 1822


Posted 17 September 2012 - 09:28 AM

I have a Windows application in C++ here that's got a very large file to stream in. The file is ~5GB in size, and currently the code does this:
  • Open file if not already open.
  • Read in a block of data (~9 MB)
  • Copy to volume texture
  • Render
  • Repeat...
The file access is sequential and happens at the end of every frame, and currently I'm using the CRT file I/O functions (fopen, fseek, fread, and fclose). I'm wondering if there's a better way with regard to performance.
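For reference, the per-frame read described above boils down to something like the following sketch (the file name, block size, and texture upload call are illustrative, not the actual code):

#include <cstdio>
#include <vector>

// Sketch of the current approach: CRT I/O, one ~9 MB sequential read per frame.
static std::vector<char> g_block(9 * 1024 * 1024);
static FILE* g_file = nullptr;

void StreamNextBlock()
{
    if (!g_file)
        g_file = std::fopen("volume.dat", "rb");          // open once, keep open

    size_t bytesRead = std::fread(g_block.data(), 1, g_block.size(), g_file);

    // CopyToVolumeTexture(g_block.data(), bytesRead);     // copy to the volume texture, then render
    (void)bytesRead;
}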

Would using memory mapped files be a good idea here? I've read conflicting statements about performance when it comes to reading a file sequentially.

I've considered loading a larger chunk of the file (i.e. multiple volume textures) in one shot, but I'm thinking it'll hit a bottleneck once it has used up those textures and has to read in the next chunk.

Obviously I can't read in the entire file (this needs to run on 32-bit, and that would exhaust my process address space quickly), and because of the environment I have to use (I really have no choice regarding this) I can't use threading.

Thanks.

Edited by Tape_Worm, 17 September 2012 - 09:30 AM.



#2 achild   Crossbones+   -  Reputation: 1941


Posted 17 September 2012 - 09:54 AM

So is your target controlled, or are those just min. specs?

#3 Tape_Worm   Crossbones+   -  Reputation: 1822


Posted 17 September 2012 - 10:00 AM

So is your target controlled, or are those just min. specs?

Controlled.

#4 achild   Crossbones+   -  Reputation: 1941


Posted 17 September 2012 - 10:11 AM

Use memory mapping. Make sure to use FILE_FLAG_SEQUENTIAL_SCAN. Experiment with FILE_FLAG_NO_BUFFERING. Depending on how Windows caches your data, it may keep accumulating your 5 GB file in memory as you read it, only to eventually have to flush and stall your system for a very long time. Using no buffering may seem slower overall, but it avoids that very nasty stall.

Make sure all your alignment is correct. In my experience sequential access has been many times faster than random access. I can't imagine why it would be otherwise, though that obviously doesn't mean there aren't cases where it happens.

(Of course, if you can slip an SSD into this person's computer, things will be much faster.)

[edit] Oh yeah... you might (or might not) get better performance by always reading a buffer ahead - it won't do most of the time-consuming work (the hard drive access) until you actually read from the mapped memory anyway, but at least Windows gets a chance to know what's coming.
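A minimal sketch of the memory-mapped approach, with illustrative names and no error handling. Note that the view offset passed to MapViewOfFile must be a multiple of the system allocation granularity (64 KB on typical systems), which an exact 9 * 1024 * 1024 byte block size happens to satisfy:

#include <windows.h>
#include <cstdint>

// Map one ~9 MB window of the file at a time; advance 'offset' by blockSize each frame.
// (Clamp blockSize near the end of the file.)
void StreamBlockMapped(HANDLE mapping, uint64_t offset, SIZE_T blockSize)
{
    const void* block = MapViewOfFile(mapping, FILE_MAP_READ,
                                      (DWORD)(offset >> 32),
                                      (DWORD)(offset & 0xFFFFFFFFu),
                                      blockSize);
    if (!block)
        return;

    // ... copy 'block' into the volume texture, render ...

    UnmapViewOfFile(block);
}

// Setup (once):
//   HANDLE file    = CreateFileA("volume.dat", GENERIC_READ, FILE_SHARE_READ, nullptr,
//                                OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
//   HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY, 0, 0, nullptr);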

Edited by achild, 17 September 2012 - 10:13 AM.


#5 Tape_Worm   Crossbones+   -  Reputation: 1822


Posted 17 September 2012 - 10:38 AM

Use memory mapping. Make sure to use FILE_FLAG_SEQUENTIAL_SCAN. Experiment with FILE_FLAG_NO_BUFFERING. Depending on how Windows caches your data, it may keep accumulating your 5 GB file in memory as you read it, only to eventually have to flush and stall your system for a very long time. Using no buffering may seem slower overall, but it avoids that very nasty stall.

Make sure all your alignment is correct. In my experience sequential access has been many times faster than random access. I can't imagine why it would be otherwise, though that obviously doesn't mean there aren't cases where it happens.

(Of course, if you can slip an SSD into this person's computer, things will be much faster.)

[edit] Oh yeah... you might (or might not) get better performance by always reading a buffer ahead - it won't do most of the time-consuming work (the hard drive access) until you actually read from the mapped memory anyway, but at least Windows gets a chance to know what's coming.


Thanks, I'll give memory mapping a shot.

#6 swiftcoder   Senior Moderators   -  Reputation: 10364


Posted 17 September 2012 - 10:42 AM

It may also be worth the effort to look into using overlapped I/O to asynchronously read blocks of the file before you need them. It saves you from the mess that is threaded I/O, and should be more efficient into the bargain.
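A sketch of that pattern, with illustrative names: kick off an overlapped read for the next block, render the one already in memory, then wait only if the prefetch hasn't finished yet. (With a single outstanding read per handle it's fine to let GetOverlappedResult wait on the file handle itself rather than an event.)

#include <windows.h>
#include <cstdint>
#include <vector>

// File opened once with FILE_FLAG_OVERLAPPED so ReadFile can return immediately:
//   HANDLE file = CreateFileA("volume.dat", GENERIC_READ, FILE_SHARE_READ, nullptr,
//                             OPEN_EXISTING,
//                             FILE_FLAG_OVERLAPPED | FILE_FLAG_SEQUENTIAL_SCAN, nullptr);

void PrefetchBlock(HANDLE file, uint64_t offset, std::vector<char>& buffer, OVERLAPPED& ov)
{
    ov = OVERLAPPED{};
    ov.Offset     = (DWORD)(offset & 0xFFFFFFFFu);
    ov.OffsetHigh = (DWORD)(offset >> 32);

    // Starts the read; returns TRUE if it completed immediately, otherwise
    // FALSE with GetLastError() == ERROR_IO_PENDING while the read continues.
    ReadFile(file, buffer.data(), (DWORD)buffer.size(), nullptr, &ov);
}

DWORD WaitForBlock(HANDLE file, OVERLAPPED& ov)
{
    DWORD bytesRead = 0;
    GetOverlappedResult(file, &ov, &bytesRead, TRUE);   // blocks only if still pending
    return bytesRead;
}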

Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#7 nife87   Members   -  Reputation: 516


Posted 17 September 2012 - 10:49 AM

I've considered loading a larger chunk of the file (i.e. multiple volume textures) in one shot, but I'm thinking it'll hit a bottleneck once it has used up those textures and has to read in the next chunk.


Use a circular/ring buffer with 2 or more chunks (the chunks are fixed size, so do not allocate them dynamically). When you have finished loading a chunk, immediately request your async I/O to load the next one (if you have 3 chunks in your buffer, this "next" would be the fourth chunk). This way you always have the current chunk available and are only loading chunks needed in the future. You might only need 2 chunks (the current and the next) if it takes less than a frame to load a chunk, though I would pick 3 chunks if it takes more than half a frame or so (to avoid sudden hiccups if I/O stalls appear).

This buffering method is not meant to replace memory mapping and other I/O hints. It merely uses internal buffering to hide most of the stalls caused by inconsistent I/O timing.
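A rough sketch of that ring-buffer arrangement, with illustrative chunk count and size, and placeholder names standing in for the async read and the texture upload:

#include <array>
#include <cstddef>
#include <vector>

constexpr size_t kChunkSize  = 9 * 1024 * 1024;   // fixed size, allocated up front
constexpr size_t kChunkCount = 3;                 // current chunk + chunks in flight/ready

struct Chunk {
    std::vector<char> data = std::vector<char>(kChunkSize);
    bool ready = false;                           // set when its async read completes
};

std::array<Chunk, kChunkCount> g_ring;
size_t g_current = 0;                             // chunk currently being displayed

void OnFrame()
{
    const size_t next = (g_current + 1) % kChunkCount;

    if (!g_ring[next].ready) {
        // RequestAsyncRead(g_ring[next].data);   // e.g. an overlapped ReadFile
        // (a real implementation would also track "already in flight" per slot)
    }

    if (g_ring[g_current].ready) {
        // UploadToVolumeTexture(g_ring[g_current].data);
        g_ring[g_current].ready = false;          // recycle the slot for a future read
        g_current = next;
    }
}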

#8 achild   Crossbones+   -  Reputation: 1941


Posted 17 September 2012 - 11:37 AM

My understanding is he's on a machine with a single single-core processor. The async approach is a bit less predictable here, especially if it's going through virtual memory, and especially if he's locked to a 32-bit machine/OS. Because the data is 5 GB, there are risks with anything that gives him these kinds of speed gains, because Windows typically uses its virtual memory system to provide them - 2 or 3 GB into his 3D playback there is likely going to be a significant (read: more than 60 seconds) stall while Windows basically rearranges virtual memory for his entire running system.

Buffering one "frame" ahead of time is a win. But it doesn't matter if he can't do it on another thread - and whatever the reason, he said he can't multithread. So I/O stalls are going to be a direct part of his frame time.

Edited by achild, 17 September 2012 - 11:39 AM.


#9 Adam_42   Crossbones+   -  Reputation: 2616


Posted 17 September 2012 - 02:38 PM

You should be able to gain some more speed by applying a lossless compression algorithm to the data. Image data tends to compress fairly well.

If you're lucky with the data you'll be able to fit the entire compressed file in memory so you'll only be doing decompression instead of I/O (link with /LARGEADDRESSAWARE to get 4GB of address space when run under 64-bit Windows). Even if that doesn't happen it'll mean you can trade off file reading for decompression code, which should be quicker (although it'll be close on an SSD, especially with no threading or async I/O).

If you can get away with lossy compression then DXT1 is a decent option - the video card can decode it directly and the compression ratio will be good enough to fit all the data into RAM.
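For rough numbers (assuming the source volume stores one byte per texel): DXT-style block compression such as DXT1 packs a 4x4 block into 8 bytes, i.e. 0.5 bytes per texel, so the ~5 GB data set would shrink to roughly 2.5 GB. That is small enough to fit within the 4 GB of address space that /LARGEADDRESSAWARE gives a 32-bit process running under 64-bit Windows.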

#10 achild   Crossbones+   -  Reputation: 1941


Posted 17 September 2012 - 05:06 PM

You should be able to gain some more speed by applying a lossless compression algorithm to the data. Image data tends to compress fairly well.

If you're lucky with the data you'll be able to fit the entire compressed file in memory so you'll only be doing decompression instead of I/O (link with /LARGEADDRESSAWARE to get 4GB of address space when run under 64-bit Windows). Even if that doesn't happen it'll mean you can trade off file reading for decompression code, which should be quicker (although it'll be close on an SSD, especially with no threading or async I/O).

If you can get away with lossy compression then DXT1 is a decent option - the video card can decode it directly and the compression ratio will be good enough to fit all the data into RAM.

Right! Totally forgot about this since where I work image compression is strictly prohibited. It's ... bad mojo ... to even mention compression. And the word "lossless" is like a lie. Seriously. It's very strange and counterproductive from my perspective.

#11 apatriarca   Crossbones+   -  Reputation: 1772


Posted 18 September 2012 - 07:35 AM

It's ... bad mojo ... to even mention compression. And the word "lossless" is like a lie. Seriously. It's very strange and counterproductive from my perspective.

I'm not sure why you think that. There surely exist compression schemes such that the decompressed image is completely faithful to the original one.

#12 achild   Crossbones+   -  Reputation: 1941


Posted 18 September 2012 - 07:40 AM


It's ... bad mojo ... to even mention compression. And the word "lossless" is like a lie. Seriously. It's very strange and counterproductive from my perspective.

I'm not sure why you think that. There surely exist compression schemes such that the decompressed image is completely faithful to the original one.

Heh... I was referring to where I work. In a lot of areas of the microbiological field there is this stigma against any compression. I'm guessing someone with a lot of clout saw that JPEG compression distorted their data and said "compression is bad", and now we're stuck with it, unfortunately.

#13 Tape_Worm   Crossbones+   -  Reputation: 1822


Posted 18 September 2012 - 11:55 AM

That's incredibly short-sighted of your employers.

Regardless, here's an update:
I tried a memory mapped file, and nothing good came of it (I probably implemented it wrong, but I was getting horrid results). So I tried just CreateFile with FILE_FLAG_SEQUENTIAL_SCAN and I didn't see any improvement.

However, one thing that did impro... Wait, wait. I'm getting ahead of myself. Let me tell you a story. You see, once upon a time there was an employee, a dashing and handsome employee of a company who inherited a real mess of a code base and was told to fix it up and optimize it. He toiled day and ... day... and started seeing results. However, he noticed his CPU would spike intermittently during rendering. He said "What the shit is this?? Might be the constant file I/O..." and so off he went to research the various forums and troll dwellings to find a more optimal solution. And so, here I am trying my damnedest to get this thing optimized from slide-show to interactive frame rates. I can't really go into detail about the project, nor can I say anything about why I'm so restricted (it's in the contract).

Anyway. That's the story.

So, what I noticed this morning was that the dude who was writing this before me was calling _fseeki64 before -every- read. Even on sequential reads. Now, maybe it's not that bad, but I did notice an improvement after setting it up to only seek on random access. And it's actually not too bad now.
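That change amounts to something like the following sketch (names are illustrative): track the expected file position and only call _fseeki64 when the requested offset isn't the next sequential one.

#include <cstdio>
#include <cstdint>

// Read 'size' bytes at 'offset', seeking only when the read isn't sequential.
size_t ReadAt(FILE* fp, int64_t offset, void* dst, size_t size, int64_t& filePos)
{
    if (offset != filePos) {                 // random access: a seek is actually needed
        _fseeki64(fp, offset, SEEK_SET);
        filePos = offset;
    }

    const size_t bytesRead = fread(dst, 1, size, fp);
    filePos += (int64_t)bytesRead;           // keep tracking the position ourselves
    return bytesRead;
}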

Regardless, 5 GB of data is too damned much, and I need to cull that down. So I'm thinking the compression idea is worth a shot. I considered using BC4 to compress it, but I found out that it requires the texture dimensions to be a power of 2 (at least, I think that's what I read), and the data I have (and will continue to receive) is not guaranteed to come in powers of 2. It would be a monumental pain in the ass to resize a volume texture and encode it into BC4, so for now I'm trying out RLE compression (yeah, it's old, it's non-native, but it's fast enough and it appears to be working...?). As this is all 8-bit values in the volumes, it might be sufficient.
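For what it's worth, a minimal byte-oriented RLE decoder over (count, value) pairs looks something like this. The format here is illustrative rather than the one actually in use, but it's the kind of scheme that does well on 8-bit volume data with long runs of identical values:

#include <cstddef>
#include <cstdint>

// Decode (count, value) pairs from 'src' into 'dst'; returns the bytes written.
size_t RleDecode(const uint8_t* src, size_t srcLen, uint8_t* dst, size_t dstCap)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < srcLen; i += 2) {
        const uint8_t count = src[i];
        const uint8_t value = src[i + 1];
        for (uint8_t n = 0; n < count && out < dstCap; ++n)
            dst[out++] = value;
    }
    return out;
}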

Anyway, that's where I am as of today.

Edited by Tape_Worm, 18 September 2012 - 11:57 AM.




