Reducing resource loading time in a DirectX/PhysX car game


Asset loading is a complex issue.

It shouldn't be, after all you are just loading "blobs" of data, but it is.

The way we all work now is to load the minimum we have to in order to get something on screen, then split the remaining loading up over time and spread it across threads.

Typically we write a tool that segregates data into sections: a blob of data that always needs to be loaded, blobs that need to be loaded when the player does this, and blobs that only need to be loaded when the player does that.

Then you stream in the assets as the game is running.

It can get horribly complex, but it's really important to the finished product.

In the short term I would look at converting the assets into binary lumps that can be loaded as fast as the OS can handle it.

Also look at changing your texture loader so it loads only the lowest-resolution mip on the main thread. The rest of the mip chain gets loaded on a background thread.

This way you can get something on screen quickly and the quality improves over time. (You see this a lot in commercial games.)
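To make that concrete, here is a heavily simplified sketch of the background-loading half of that idea for D3D9 (the names are mine, not anything from this thread): the worker thread only reads the file, and the main thread does all device work, since a default D3D9 device is not thread-safe. True per-mip streaming would additionally need a loader that can upload individual mip levels.

```cpp
// Simplified streaming sketch: read the texture file on a worker thread,
// create the D3D9 texture on the main thread once the data is in memory.
#include <d3dx9.h>
#include <atomic>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

struct StreamedTexture {
    std::vector<char>  fileData;          // raw DDS/PNG bytes filled by the worker
    std::atomic<bool>  ready{false};      // set by the worker when fileData is complete
    IDirect3DTexture9* texture = nullptr; // created later on the main thread
};

// Kick off the file read; 'st' must outlive the worker thread.
void LoadFileAsync(const std::string& path, StreamedTexture& st)
{
    std::thread([path, &st] {
        std::ifstream f(path, std::ios::binary | std::ios::ate);
        st.fileData.resize(static_cast<size_t>(f.tellg()));
        f.seekg(0);
        f.read(st.fileData.data(), static_cast<std::streamsize>(st.fileData.size()));
        st.ready = true;                  // publish the finished buffer
    }).detach();
}

// Called once per frame on the main thread: turn finished loads into textures.
void PromoteIfReady(IDirect3DDevice9* device, StreamedTexture& st)
{
    if (st.ready && !st.texture) {
        D3DXCreateTextureFromFileInMemory(device, st.fileData.data(),
                                          static_cast<UINT>(st.fileData.size()),
                                          &st.texture);
    }
}
```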

Hope that gives you a few ideas to play with.


My immediate plan of action:
1. Understand the loading time division - how much for models & how much for textures

...

That plan isn't too bad at this point - particularly the concentration on the model files rather than the textures (IMHO). After all, DDS textures are nicely supported in DirectX.

However, I think you missed an area that should be looked at:


... The loading is being done by our own importer...

IF you choose to continue with binary X files, DX9 provides built-in support for importing X files (both text and binary). I think it's highly unlikely you can write faster routines to load binary (and especially text) X files, parse them, and create models.
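For reference, a minimal sketch of that path using D3DXLoadMeshFromX (error handling trimmed; 'device' is assumed to be a valid IDirect3DDevice9*):

```cpp
// Minimal sketch: let D3DX parse the X file (text or binary) instead of a
// hand-written importer. Assumes 'device' is a valid IDirect3DDevice9*.
#include <d3dx9.h>

ID3DXMesh* LoadMeshFromX(IDirect3DDevice9* device, LPCTSTR path)
{
    ID3DXMesh*   mesh         = nullptr;
    ID3DXBuffer* materials    = nullptr;
    DWORD        numMaterials = 0;

    HRESULT hr = D3DXLoadMeshFromX(path,
                                   D3DXMESH_MANAGED, // let D3D manage the memory pool
                                   device,
                                   nullptr,          // adjacency, not needed here
                                   &materials,       // material + texture filenames
                                   nullptr,          // effect instances
                                   &numMaterials,
                                   &mesh);
    if (FAILED(hr))
        return nullptr;

    // ... walk the D3DXMATERIAL records in 'materials' and load textures here ...

    if (materials)
        materials->Release();
    return mesh;
}
```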

Please don't PM me with questions. Post them in the forums for everyone's benefit, and I can embarrass myself publicly.

You don't forget how to play when you grow old; you grow old when you forget how to play.


My immediate plan of action:

1. Understand the loading time division - how much for models & how much for textures
2. For models, try exporting binary compressed X files (a sample tried today reduced the size from 10 MB to 2 MB)
3. Try to merge all models into a single file and load only one or two zipped files at load time.

Good plan.

#1 is absolutely essential. It will likely confirm what Buckeye suggested, that parsing and processing are slow. It may also reveal many other useful facts about how your system works.

#2 is good. Reducing the volume of data by 80% will certainly help with the data transfer aspect. But it is only one piece of the puzzle. While it likely isn't the biggest problem, it is probably a contributing factor.

#3 is only partially good. I would hold off on this until you've done the other two. It is quite common to have a final asset builder that rebuilds the data into a format that can be loaded directly into memory without any parsing. The engine then has two variants, one that loads standard files and a second loader that works with the in-memory layout. Combining the files is often done as part of the final asset building, so you may not want to build it as a separate tool right now.
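To illustrate the "load directly into memory without parsing" idea, here is a hypothetical pack layout (the header, field names, and hashing scheme are mine, not anything from this thread): the build tool writes one blob, and the runtime reads it with a single call and uses the tables in place.

```cpp
// Hypothetical packed-asset blob: header + entry table + raw asset data,
// written by the build tool and used in place at runtime with no parsing.
#include <cstdint>
#include <cstdio>
#include <vector>

struct PackHeader {
    char     magic[4];   // e.g. "PAK1", sanity check for the loader
    uint32_t entryCount; // number of PackEntry records following the header
};

struct PackEntry {
    uint32_t nameHash;   // hashed asset name, looked up at runtime
    uint32_t offset;     // byte offset of the asset data within the blob
    uint32_t size;       // size of the asset data in bytes
};

// One read brings in header, entry table and all asset data.
std::vector<char> LoadPack(const char* path)
{
    FILE* f = std::fopen(path, "rb");
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    std::vector<char> blob(static_cast<size_t>(size));
    std::fread(blob.data(), 1, blob.size(), f);
    std::fclose(f);
    return blob;
}

// Look up an asset without copying or parsing anything.
const PackEntry* FindEntry(const std::vector<char>& blob, uint32_t nameHash)
{
    const PackHeader* hdr = reinterpret_cast<const PackHeader*>(blob.data());
    const PackEntry*  entries =
        reinterpret_cast<const PackEntry*>(blob.data() + sizeof(PackHeader));
    for (uint32_t i = 0; i < hdr->entryCount; ++i)
        if (entries[i].nameHash == nameHash)
            return &entries[i];
    return nullptr;
}
```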

I have tried to do a comprehensive study of the loading time used by my resources:

- Original total loading time was 168 seconds

Step 1:

- All meshes have been converted to binary compressed format

- The default X file loaders are being used for loading

- New total loading time is 112 seconds

Now, out of those 112 seconds:

- Meshes are taking 22 seconds [238 meshes, total size 138 MB]

- Textures are taking 90 seconds [900 textures, total size 1056 MB]

- 820 MB [509 files] of the above textures are in DDS format; the texture sizes are powers of 2 and the format is DXT5

- 242 MB [392 files] of the above textures are in PNG format, where the texture sizes are not powers of 2 and are arbitrary

Also, among the 820 MB of DDS files I am using 6 cube textures with a total size of 20 MB, and these 6 cube textures are taking about 13 seconds to load.

I am trying out some more experiments with the data and will update soon.

Your texture loading seems to be really slow. I just checked: in Hardland we have 1169 texture files and they load in about one second on an old laptop without an SSD. For most textures I only load the small mips, but I still open every single file.

Are you allocating a lot of memory? Are you copying the texture data multiple times? What part of texture loading uses most of the time?

Loading PNG takes, of course, noticeable time, since it needs to inflate (un-LZ77) the data and undo the per-scanline filtering / pixel twiddling that PNG does to achieve high compression ratios. That makes it a roughly 50 MB/s operation rather than the 500 MB/s that your SSD could deliver. Doing that chore in one or several worker threads while the I/O thread keeps loading the rest would certainly help a lot, not only because of more CPU power, but also because everything runs asynchronously.

Are you DXT-compressing the decompressed PNGs on the fly when handing them to DirectX? Having DirectX generate a mipmap or two as well? That will, of course, also add some small but noticeable overhead, and if you don't have I/O and conversion/upload running asynchronously, it can become a very visible stall since no I/O happens during that time.

But that still doesn't explain why it is soooo slow.

In particular, the DXT5-encoded textures should be ready more or less instantaneously once they're physically loaded into RAM from disk. You can basically throw the raw DDS data directly at the graphics card and it will eat it (and mipmaps are normally already present in that file format). Unless... unless... are you sure your textures don't have any "weird" or "wrong" properties (such as alignment or bit depth) that force the driver to convert pixel depth in software and copy/align/shuffle data all the time?


- Textures are taking 90 seconds [900 textures, total size 1056 MB]

- 820 MB [509 files] of the above textures are in DDS format; the texture sizes are powers of 2 and the format is DXT5
- 242 MB [392 files] of the above textures are in PNG format, where the texture sizes are not powers of 2 and are arbitrary

It would be better not to have a monolithic load for these; if you can design the game to load them asynchronously at a less critical stage, so much the better.

That said, let's check some things:

Are you loading the DDS images directly into a single location in memory using the fastest operations and a single call? That is, if you are reading and parsing the DDS images as you go along, that is the wrong approach. Since this is Windows, that means using asynchronous I/O and callbacks. Open the files with CreateFile using the FILE_FLAG_OVERLAPPED and FILE_FLAG_SEQUENTIAL_SCAN flags, and use ReadFile to read each entire file into a single buffer. Send all 509 file requests to the OS at once and it will intelligently take steps to minimize disk seeks, picking up pieces of other files as the heads move across the platter. Each request will report back when its file is done.
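A minimal sketch of that overlapped read for a single file, assuming the caller issues many of these at once and collects the results later (to keep it short this uses an event plus GetOverlappedResult rather than completion callbacks, and it doesn't handle files larger than 4 GB):

```cpp
// Sketch of an overlapped whole-file read. The PendingRead object must stay
// alive from StartRead() until FinishRead() has been called.
#include <windows.h>
#include <vector>

struct PendingRead {
    OVERLAPPED        ov{};                        // must stay valid while the read is in flight
    HANDLE            file = INVALID_HANDLE_VALUE;
    std::vector<char> buffer;                      // the whole file lands here
};

bool StartRead(const wchar_t* path, PendingRead& pr)
{
    pr.file = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                          OPEN_EXISTING,
                          FILE_FLAG_OVERLAPPED | FILE_FLAG_SEQUENTIAL_SCAN,
                          nullptr);
    if (pr.file == INVALID_HANDLE_VALUE)
        return false;

    LARGE_INTEGER size{};
    GetFileSizeEx(pr.file, &size);
    pr.buffer.resize(static_cast<size_t>(size.QuadPart));
    pr.ov.hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);

    // Queue the read; ERROR_IO_PENDING just means it is running asynchronously.
    BOOL ok = ReadFile(pr.file, pr.buffer.data(),
                       static_cast<DWORD>(pr.buffer.size()), nullptr, &pr.ov);
    return ok || GetLastError() == ERROR_IO_PENDING;
}

// Call once per outstanding read when you actually need the data.
bool FinishRead(PendingRead& pr)
{
    DWORD bytesRead = 0;
    BOOL  ok = GetOverlappedResult(pr.file, &pr.ov, &bytesRead, TRUE /* wait */);
    CloseHandle(pr.ov.hEvent);
    CloseHandle(pr.file);
    return ok != FALSE;
}
```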

If you do that, loading the 820 MB of textures should approach whatever the ideal transfer speed is from your disk. For a spindle drive that may be anywhere from 15 to 40 seconds. If instead your machine has a quality modern SSD, the time could be about three seconds.

For your 400 PNG files, the most direct answer is "Don't Do That!" DDS can be used directly from disk but PNG files need to be decoded, meaning you need time and space to process each one. Transform them as a build step if at all possible so you avoid the enormous runtime hit.
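One hedged way to do that build step, staying inside D3DX: load each PNG, let D3DX resize it to a power of two, compress it to DXT5 with a full mip chain, and save it as DDS. This assumes the conversion tool creates its own D3D9 device ('device' below); error handling is trimmed.

```cpp
// Offline build-step sketch: PNG in, DXT5 DDS with a full mip chain out.
// Assumes 'device' is a valid IDirect3DDevice9* owned by the tool.
#include <d3dx9.h>

bool ConvertPngToDds(IDirect3DDevice9* device,
                     LPCTSTR pngPath, LPCTSTR ddsPath)
{
    IDirect3DTexture9* texture = nullptr;

    // D3DX_DEFAULT width/height takes the size from the file, rounded up to a
    // power of two, which also fixes the non-power-of-two PNG sizes above.
    HRESULT hr = D3DXCreateTextureFromFileEx(device, pngPath,
                                             D3DX_DEFAULT, D3DX_DEFAULT,
                                             D3DX_DEFAULT,       // full mip chain
                                             0,                  // usage
                                             D3DFMT_DXT5,        // compress to DXT5
                                             D3DPOOL_MANAGED,
                                             D3DX_FILTER_TRIANGLE,
                                             D3DX_FILTER_TRIANGLE,
                                             0, nullptr, nullptr, &texture);
    if (FAILED(hr))
        return false;

    hr = D3DXSaveTextureToFile(ddsPath, D3DXIFF_DDS, texture, nullptr);
    texture->Release();
    return SUCCEEDED(hr);
}
```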

The exact decompression time will depend on the details of the image and the library.

If for some reason you absolutely must have PNG files rather than DDS files, parallelism is your friend. Make sure every processor is working. Depending on the libraries you are using that likely means multiple image decoders per virtual processor. (For strictly processor-bound tasks it is often best to have a 1:1 ratio, but that probably isn't what is taking place.) It will depend internally on how they are doing the work. If internally they are doing slow operations like blocking file reads and blocking memory tasks, you might consider going 2x, 3x, 4x or more per processor since they'll be spending so much time in those other non-compute operations.
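As a sketch of the simplest split of that work: one worker per hardware thread, each taking a strided share of the file list. DecodePng() here is a hypothetical stand-in for whatever image library you actually use, and it must be safe to call from several threads at once.

```cpp
// Decode a list of PNG files on all cores. DecodePng() is a placeholder for
// your image library's decode call (stb_image, libpng, ...).
#include <algorithm>
#include <string>
#include <thread>
#include <vector>

struct DecodedImage { int width = 0, height = 0; std::vector<unsigned char> rgba; };

DecodedImage DecodePng(const std::string& path);   // hypothetical, library-specific

std::vector<DecodedImage> DecodeAll(const std::vector<std::string>& files)
{
    std::vector<DecodedImage> results(files.size());
    const unsigned threadCount = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            // Each worker decodes every threadCount-th file in the list.
            for (size_t i = t; i < files.size(); i += threadCount)
                results[i] = DecodePng(files[i]);
        });
    }
    for (auto& w : workers)
        w.join();
    return results;
}
```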

If at all possible, use DDS files loaded directly into memory and not parsed.

