Tape_Worm

File I/O streaming performance.


I have a Windows C++ application that needs to stream in a very large file. The file is ~5GB in size, and currently the code does this:[list=1]
[*]Open the file if it isn't already open.
[*]Read in a block of data (~9 MB).
[*]Copy it to a volume texture.
[*]Render.
[*]Repeat...
[/list]
The file access is sequential and happens at the end of every frame, and currently I'm using the CRT file I/O functions (fopen, fseek, fread, and fclose); the per-frame read looks roughly like the sketch below. I'm wondering if there's a better way, performance-wise?
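In sketch form (simplified; the file name here is made up and error handling is omitted):

[code]
#include <cstdio>

// Simplified version of the current per-frame read.
FILE* g_file = nullptr;

bool ReadNextBlock(void* dest, size_t blockSize)
{
    if (!g_file)
        g_file = fopen("volume_data.bin", "rb");  // placeholder name

    // Sequential read of one ~9 MB block.
    return fread(dest, 1, blockSize, g_file) == blockSize;
}
[/code]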

Would using memory-mapped files be a good idea here? I've read conflicting statements about their performance when reading a file sequentially.

I've considered loading a larger chunk of the file (i.e. multiple volume textures' worth) in one shot, but I suspect it would just hit the same bottleneck once it has used up those textures and has to read in the next chunk.

Obviously, I can't read the entire file into memory (this needs to run on 32-bit, which would exhaust my process address space quickly), and because of the environment I'm required to use (I really have no choice regarding this) I can't use threading.

Thanks.
[quote name='achild' timestamp='1347898276' post='4980924']
Use memory mapping. Make sure to use FILE_FLAG_SEQUENTIAL_SCAN. Experiment with FILE_FLAG_NO_BUFFERING. Depending on how Windows caches your data, it may keep your 5 GB file cached as you read it through virtual memory, only to eventually have to flush and stall your system for a [b]very long time[/b]. Using no buffering may seem slower overall, but it avoids that very bad, very annoying behavior.

Make sure all your alignment is correct. In my experience, sequential access has been [b]many times[/b] faster than random access. I can't imagine why it would be otherwise, though that obviously doesn't mean there aren't cases where it happens.

(Of course, if you can slip an SSD into this person's computer, things will be much faster.)

[edit] Oh yeah... you might (or might not) get better performance by always reading a buffer ahead - most of the time-consuming work (the actual disk access) won't happen until you touch the memory-mapped memory anyway, but at least Windows gets a chance to know what's coming.
[/quote]

Thanks, I'll give memory mapping a shot.
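For reference, here's roughly what I'm planning to try (untested sketch; error handling omitted, and the file name and sizes are placeholders):

[code]
#include <windows.h>

// Open with a hint that we'll scan the file sequentially.
HANDLE file = CreateFile(L"volume_data.bin", GENERIC_READ, FILE_SHARE_READ,
                         nullptr, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN,
                         nullptr);

HANDLE mapping = CreateFileMapping(file, nullptr, PAGE_READONLY, 0, 0, nullptr);

// A 32-bit process can't map all 5 GB at once, so map a sliding window
// over the current block instead. The offset must be aligned to the
// allocation granularity (usually 64 KB).
ULONGLONG offset   = 0;                  // aligned offset of the current block
SIZE_T    viewSize = 9 * 1024 * 1024;    // ~one volume texture

void* view = MapViewOfFile(mapping, FILE_MAP_READ,
                           (DWORD)(offset >> 32), (DWORD)(offset & 0xFFFFFFFF),
                           viewSize);

// ... copy from 'view' into the volume texture, render ...

UnmapViewOfFile(view);
[/code]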
It may also be worth the effort to look into using overlapped I/O to asynchronously read blocks of the file before you need them. It saves you from the mess that is threaded I/O, and should be more efficient into the bargain.
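A bare-bones sketch of the idea (assumes 'file' was opened by CreateFile with FILE_FLAG_OVERLAPPED; 'nextBlock', 'nextOffset', and 'blockSize' stand in for whatever your streaming code uses, and error handling is omitted):

[code]
#include <windows.h>

// Kick off an async read of the next block, render the current one,
// then collect the result.
OVERLAPPED ov = {};
ov.Offset     = (DWORD)(nextOffset & 0xFFFFFFFF);
ov.OffsetHigh = (DWORD)(nextOffset >> 32);
ov.hEvent     = CreateEvent(nullptr, TRUE, FALSE, nullptr);

// Returns immediately; a read still in flight reports ERROR_IO_PENDING.
ReadFile(file, nextBlock, blockSize, nullptr, &ov);

// ... render the current block while the read is in flight ...

// Only blocks if the read hasn't finished yet.
DWORD bytesRead = 0;
GetOverlappedResult(file, &ov, &bytesRead, TRUE);
CloseHandle(ov.hEvent);
[/code]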
[quote]I've considered loading a larger chunk of the file (i.e. multiple volume textures' worth) in one shot, but I suspect it would just hit the same bottleneck once it has used up those textures and has to read in the next chunk.[/quote]

Use a circular/ring buffer of 2 or more chunks (the chunks are fixed size, so don't allocate them dynamically). When you have finished loading a chunk, immediately ask your async I/O to load the next one (if you have 3 chunks in your buffer, this "next" would be the fourth chunk of the file). This way you always have the current chunk on hand and are only ever loading chunks needed in the future. You might get away with 2 chunks (the current and the next) if a chunk takes less than a frame to load, though I would go with 3 if it takes more than half a frame or so, to avoid sudden hiccups when I/O stalls appear. See the sketch below.

This buffering method is not meant to replace memory mapping and the other I/O hints; it merely smooths over most of the stalls caused by inconsistent I/O timing.
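(Sketch only; StartAsyncRead, WaitForRead, and UploadToVolumeTexture stand in for whatever async I/O and texture-upload calls you actually use, and the sizes are illustrative.)

[code]
// Ring of fixed-size chunks: render from one while later ones load.
// Three chunks give headroom against I/O hiccups. At startup, prime
// all CHUNK_COUNT slots with the first blocks of the file.
constexpr size_t CHUNK_SIZE  = 9 * 1024 * 1024;
constexpr int    CHUNK_COUNT = 3;

struct Chunk { unsigned char* data; };   // allocated once, up front

Chunk     chunks[CHUNK_COUNT];
int       current        = 0;
long long nextFileOffset = 0;            // next unread block of the file

void OnFrame()
{
    WaitForRead(chunks[current]);            // usually completes instantly
    UploadToVolumeTexture(chunks[current]);  // copy into the volume texture

    // The slot is free again: queue the next unread block into it.
    StartAsyncRead(chunks[current], nextFileOffset, CHUNK_SIZE);
    nextFileOffset += CHUNK_SIZE;

    current = (current + 1) % CHUNK_COUNT;
}
[/code]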
My understanding is that he's on a machine with a single single-core processor. The async thing is a bit less predictable there, especially if it goes through virtual memory, [b]especially[/b] if he's locked to a 32-bit machine/OS. Because the data is 5 GB, there are risks to anything that gets him these kinds of speed gains, because Windows typically uses its virtual memory system to provide them - 2 or 3 GB into his 3D playback there is likely to be a significant (read: more than 60 seconds) stall while Windows basically rearranges virtual memory for his entire running system.

Buffering 1 "frame" ahead of time is a win. But whatever the reason is, he said he can't multithread, so he can't do it on another thread. That means I/O stalls are going to be a direct part of his frame time.
[quote name='Adam_42' timestamp='1347914312' post='4981009']
You should be able to gain some more speed by applying a lossless compression algorithm to the data. Image data tends to compress fairly well.

If you're lucky with the data, you'll be able to fit the entire compressed file in memory, so you'll only be doing decompression instead of I/O (link with /LARGEADDRESSAWARE to get 4 GB of address space when running under 64-bit Windows). Even if that doesn't happen, it means you can trade file reading for decompression code, which should be quicker (although it'll be close on an SSD, especially with no threading or async I/O).

If you can get away with lossy compression, then DXT1 is a decent option - the video card can decode it directly, and the compression ratio should be good enough to fit all the data into RAM.
[/quote]
Right! I totally forgot about this, since where I work image compression is strictly prohibited. It's ... bad mojo ... to even mention compression. And the word "lossless" is like a lie. Seriously. It's very strange and counterproductive from my perspective.
[quote name='achild' timestamp='1347923198' post='4981055']
It's ... bad mojo ... to even mention compression. And the word "lossless" is like a lie. Seriously. It's very strange and counterproductive from my perspective.
[/quote]
I'm not sure why you think that. There surely exist compression schemes such that the decompressed image is completely faithful to the original.
[quote name='apatriarca' timestamp='1347975351' post='4981241']
[quote name='achild' timestamp='1347923198' post='4981055']
It's ... bad mojo ... to even mention compression. And the word "lossless" is like a lie. Seriously. It's very strange and counterproductive from my perspective.
[/quote]
I'm not sure why you think that. There surely exist compression schemes such that the decompressed image is completely faithful to the original.
[/quote]
Heh... I was referring to where I work. In a lot of areas in the microbiological field, there is a stigma against [i]any[/i] compression. I'm guessing someone with a lot of clout saw JPEG compression distort their data and declared "compression is bad", and now we're stuck with that, unfortunately.

That's incredibly short-sighted of your employers.

Regardless, here's an update:
I tried a memory-mapped file, and nothing good came of it (I probably implemented it wrong, but I was getting horrid results). So I tried plain CreateFile with FILE_FLAG_SEQUENTIAL_SCAN, and I didn't see any improvement either.

However, one thing that did impro... Wait, wait. I'm getting ahead of myself. Let me tell you a story. You see, once upon a time there was an employee, a dashing and handsome employee of a company who inherited a real mess of a code base and was told to fix it up and optimize it. He toiled day and ... day... and started seeing results. However, he noticed his CPU would spike intermittently during rendering. He said "What the shit is this?? Might be the constant file I/O..." and so off he went to research the various forums and troll dwellings to find a more optimal solution. And so, here I am trying my damnedest to get this thing optimized from slide-show to interactive frame rates. I can't really go into detail about the project, nor can I say anything about why I'm so restricted (it's in the contract).

Anyway. That's the story.

So, what I noticed this morning was that the dude who was writing this before me was calling _fseeki64 before [b]every[/b] read, even on sequential reads. Now, maybe that alone isn't so bad, but I did notice an improvement after setting it up to seek only on random access (see the sketch below). It's actually not too bad now.
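In simplified form, the change was along these lines (illustrative, not the actual code):

[code]
#include <cstdio>

// Only seek when the file position actually has to change. An
// unconditional _fseeki64 can throw away the CRT's internal read
// buffer, even when the seek is a no-op.
__int64 g_currentPos = 0;

size_t ReadBlock(FILE* f, __int64 offset, void* dest, size_t size)
{
    if (offset != g_currentPos)           // random access only
    {
        _fseeki64(f, offset, SEEK_SET);
        g_currentPos = offset;
    }

    size_t read = fread(dest, 1, size, f);
    g_currentPos += read;
    return read;
}
[/code]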

Regardless, 5 GB of data is too damned much, and I need to cull it down, so I'm thinking the compression idea is worth a shot. I considered using BC4, but I found out that it requires the texture dimensions to be powers of 2 (at least, I think that's what I read), and the data I have (and will continue to receive) is not guaranteed to come in powers of 2. It would be a monumental pain in the ass to resize a volume texture and encode it to BC4, so for now I'm trying out RLE compression (yeah, it's old, it's non-native, but it's fast enough and it appears to be working...?). Since these are all 8-bit values in the volumes, it might be sufficient. A sketch of the decode is below.
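The decode side is about as simple as it gets (sketch only; the real thing needs bounds checking on the input as well):

[code]
#include <cstring>

// Minimal byte-oriented RLE decode: input is (count, value) pairs.
// Works well for 8-bit volume data with long runs of identical values.
size_t RleDecode(const unsigned char* src, size_t srcSize,
                 unsigned char* dest, size_t destCapacity)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < srcSize; i += 2)
    {
        unsigned char count = src[i];
        unsigned char value = src[i + 1];

        if (out + count > destCapacity)
            break;                        // truncated or corrupt input

        memset(dest + out, value, count);
        out += count;
    }
    return out;                           // bytes written
}
[/code]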

Anyway, that's where I am as of today.