How to avoid slow loading problem in games

good loading times are hard.

for example, in my case (on a PC), loading textures is a good part of the startup time, and a lot of this is due to resampling the (generally small) number of non-power-of-2 textures to power-of-2 sizes. this is then followed by the inner loops doing the inverse filtering for PNG files (more so, the cycle-eating evilness of the Paeth filter, which wouldn't be as bad except that it tends to be one of the most generally effective, and thus most-used, filters in PNG).

I sometimes wish that PNG also had a simpler "A+B-C" linear filter, which can often compete well with Paeth but, more notably, is much cheaper (it could improve performance by reducing the number of times the encoder picks the Paeth filter). granted, along similar lines, one can also wish that PNG filtered per-block rather than per-scanline, ... but alas. (never mind all the pros and cons of using custom image formats for textures...).

note that even so, decoding such a texture may still be faster than loading a simpler uncompressed texture (such as BMP or TGA), due mostly to the time it takes to read files from the HDD.

luckily, one can usually cache textures.
unluckily, you still have to load them the first time they are seen.

Why would you have any data in non-optimal dimensions, and why would you even bother loading any of those formats? This stuff should be done once, and only once. Write a tool that goes over your assets folder and does any needed conversions.

Take all your files and convert them to proper resolutions (better yet, beat your artist with a blunt object until he makes proper power-of-2 images. Images look bad when you change their aspect ratio), and then turn them into compressed DDS files. You're writing a game, not an image editor; it doesn't need to know about PNG or TGA or anything else.

Then when your game runs, dump your compressed DDS directly into VRAM. You don't need to check their dimensions. You don't need to uncompress or convert anything. Just dump it directly. DDS also handles mip levels, which covers your next paragraph.
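A minimal sketch of what "dump it directly" can look like with OpenGL and an S3TC/DXT1 payload (the function name, and the assumption that `data` points just past the 128-byte DDS header, are illustrative):

```cpp
#include <GL/gl.h>
#include <GL/glext.h>   // GL_COMPRESSED_RGBA_S3TC_DXT1_EXT

// Upload one pre-compressed DXT1 surface straight to the GPU; no decoding,
// no format conversion. 'data'/'size' are the payload bytes that follow the
// 128-byte DDS header. Mip levels are uploaded the same way, one call per level.
void upload_dxt1_level(GLuint tex, GLint level, GLsizei width, GLsizei height,
                       const void* data, GLsizei size)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glCompressedTexImage2D(GL_TEXTURE_2D, level,
                           GL_COMPRESSED_RGBA_S3TC_DXT1_EXT,
                           width, height, 0, size, data);
}
```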

When the game is running, everything should already be ready to go. Nothing should ever have to be converted or processed on the fly. Keep a good copy of everything for editing purposes, but make sure it's all converted when you build your game. Do a simple date comparison: if any source asset is newer than its last converted output, convert it then.
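A rough sketch of that date comparison using C++17's <filesystem>; `convert_to_dds` is a stand-in for whatever converter you actually invoke:

```cpp
#include <filesystem>
namespace fs = std::filesystem;

// Placeholder for the real conversion step (e.g. invoking an external tool).
void convert_to_dds(const fs::path& src, const fs::path& dst);

void build_assets(const fs::path& src_dir, const fs::path& out_dir)
{
    for (const auto& entry : fs::recursive_directory_iterator(src_dir)) {
        if (!entry.is_regular_file())
            continue;
        fs::path out = out_dir / entry.path().filename().replace_extension(".dds");
        // Convert only when the source asset is newer than its built output.
        if (!fs::exists(out) ||
            fs::last_write_time(entry.path()) > fs::last_write_time(out))
        {
            convert_to_dds(entry.path(), out);
        }
    }
}
```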

power-of-2 is good, but mandating power-of-2 isn't always optimal from a content-creation POV. better IMO to accept whatever comes in and resample it as needed, but generally, as a matter of policy, try to keep everything power-of-2 (to avoid resampling).

a lot of the ugliness caused by resampling can be reduced by using bicubic interpolation in the texture resampler, but granted, doing this is slow (somewhat slower than bilinear or nearest-neighbor interpolation).
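for illustration, a bicubic (Catmull-Rom) sampler might look roughly like this (single channel, float pixels; a sketch, not the actual resampler used here):

```cpp
#include <math.h>

// Catmull-Rom weight (the a = -0.5 cubic), a common bicubic kernel.
static float cubic_w(float x)
{
    x = fabsf(x);
    if (x < 1.0f) return (1.5f * x - 2.5f) * x * x + 1.0f;
    if (x < 2.0f) return ((-0.5f * x + 2.5f) * x - 4.0f) * x + 2.0f;
    return 0.0f;
}

// Sample one channel of a w*h float image at fractional coords (sx, sy)
// using a 4x4 neighborhood; edges are clamped.
static float bicubic_sample(const float* src, int w, int h, float sx, float sy)
{
    int ix = (int)floorf(sx), iy = (int)floorf(sy);
    float sum = 0.0f, wsum = 0.0f;
    for (int j = -1; j <= 2; j++) {
        for (int i = -1; i <= 2; i++) {
            int x = ix + i; if (x < 0) x = 0; if (x >= w) x = w - 1;
            int y = iy + j; if (y < 0) y = 0; if (y >= h) y = h - 1;
            float wt = cubic_w(sx - (float)(ix + i)) * cubic_w(sy - (float)(iy + j));
            sum  += wt * src[y * w + x];
            wsum += wt;
        }
    }
    return sum / wsum;
}
```

(for heavy downscaling you'd also want to widen the kernel or prefilter, otherwise it aliases like any point-sampled method.)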

the problem with DDS is that it chews through a lot more HDD space than PNG or JPEG, and the potential for faster loading is largely hindered by the additional disk IO required (still better than BMP or TGA, which just waste HDD space). like, some saved clock cycles don't really offset a slow HDD all that well (as well as still leaving download times longer for end users, ...).

JPEG tends to give the smallest files, but being lossy it tends to result in lower image quality, and (in its common form) it does not support an alpha channel.

PNG is more of a compromise.

granted, I have a custom format (roughly based on JPEG, but with more features), but it has its own drawbacks (mostly that it is non-standard).

the Paeth issue has more to do with decoding PNG images, where one essentially ends up with several conditionals per pixel component (3 or 4 times per pixel):
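roughly, the reference predictor from the PNG spec, run once per byte:

```cpp
#include <cstdlib>

// a = left byte, b = byte above, c = byte above-left (all already decoded)
static unsigned char paeth_predict(unsigned char a, unsigned char b, unsigned char c)
{
    int p  = (int)a + (int)b - (int)c;   // initial estimate
    int pa = std::abs(p - a);
    int pb = std::abs(p - b);
    int pc = std::abs(p - c);
    if (pa <= pb && pa <= pc) return a;  // two or three conditionals...
    if (pb <= pc)             return b;  // ...for every pixel component
    return c;
}
```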

given that this needs to be done about 1M times for a 512x512 RGBA image, and conditionals are fairly slow, this isn't free (especially since the branch predictor can't really predict them accurately).

a direct linear predictor (a+b-c) usually does almost as well as Paeth and is cheaper, but sadly isn't supported by PNG.

an encoder which then avoids Paeth ends up having to use a generally much less effective filter (resulting in bigger compressed images).

ironically, this issue is sufficiently bad that it manages to (almost single-handedly) make decoding PNG images currently slower than decoding JPEG images.

but, alas...

As Daark hinted at, being nice to all kinds of mangled formats/dimensions is useful during production, but once you hit the finish line, store the stuff in the exact format you need in memory.

Simple block reading is the best you can do for loading times.

Depending on the target system you may have to make compromises though (file size vs. loading speed).

cr88192, this is all complete nonsense. Please don't get defensive. Instead of trying to be "right", learn from your mistakes and become better at your craft.

HDD space is a near infinite resource these days, and this thread is about in-game loading times. No one is going to miss the few extra bytes a DDS may use over another format.

Every optimization has a drawback. You always use more of something else to achieve your goal of using less of whatever the bottleneck is.

Just like mipmaps take up 33% more space per texture. It's not that expensive, and it solves the problem very well.
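For reference, the 33% figure is just the geometric series over the mip chain, since each level has a quarter of the pixels of the one above it: 1/4 + 1/16 + 1/64 + ... = 1/3, i.e. roughly 33% extra.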

We used to use more memory to create look up tables to avoid expensive computations. Now we often do computations over again to avoid cache look-up misses.

Disc-based games often lay out game data so that a whole section can be streamed in at once, instead of storing it in a way that takes up less disc space. It doesn't hurt anything to use up all the space, and the loading times can improve dramatically.

GPUs are built to use DDS textures. They are supported "as is", both to save space, and to improve efficiency when they are being passed around. You get quicker texture loading (remember, this is a quicker loading thread!), less VRAM usage, and better bandwidth usage all for free.

power-of-2 is good, but mandating power-of-2 isn't always optimal from a content-creation POV. better IMO to accept whatever comes in and resample it as needed, but generally, as a matter of policy, try to keep everything power-of-2 (to avoid resampling).

Every piece of art ever created had restrictions on its dimensions. Be it 512x512, 8.5x11 (US letter), the size of a poster board, the size of a wall in a cave, etc... If a power of 2 doesn't fit the object to be textured, then you work on it in whatever dimensions you want WITHIN a power-of-2 canvas.

The art can be whatever dimensions it wants. The canvas MUST be power of 2.

a lot of the ugliness caused by resampling can be reduced by using bicubic interpolation in the texture resampler, but granted, doing this is slow (somewhat slower than bilinear or nearest-neighbor interpolation).

Changing the aspect ratio of an image destroys it in many ways. The dimensions of a piece, which are very important to establishing its look and feel, become distorted. Wide things become thin, thin things become wide, curves get straightened out, or go from being short to long and vice versa. This isn't a programming issue, so you can't program your way out of it. It doesn't matter what filter or technique you use. Changing the aspect ratio of an image completely changes it.

As someone mentioned, the UMD has particularly poor seek times. Additionally, if you don't read any data from the UMD for a while, the disc stops spinning (to save battery life), and when you start reading again there is a significant delay as it spins back up to speed. It's possible the developers of the game you cite made a decision to let the UMD spin down to save your battery life; OTOH, it could just be inefficient.

Here is a decent article about how to organise the data you load so that there is a minimal amount of CPU work to be done on load - http://www.gamasutra.com/view/feature/132376/delicious_data_baking.php (the TL;DR version is: 1. load your data in place, 2. fix up the pointers; i.e. it won't be anything new to anyone who has optimised loading time or the CPU footprint of streaming loading on a console).
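A minimal sketch of the load-in-place idea from that article (a single struct and a single pointer; real code would handle alignment, 32/64-bit pointer sizes, and a whole table of fixups):

```cpp
#include <cstdint>

// On disk, 'positions' holds a byte offset from the start of the blob
// rather than a pointer; the tool that bakes the file writes it that way.
struct MeshBlob {
    uint32_t vertex_count;
    float*   positions;
};

// 'blob' is the whole file read into memory with one read call.
MeshBlob* load_mesh_in_place(uint8_t* blob)
{
    MeshBlob* mesh = reinterpret_cast<MeshBlob*>(blob);
    // Fixup: convert the stored offset back into a usable pointer.
    uintptr_t offset = reinterpret_cast<uintptr_t>(mesh->positions);
    mesh->positions  = reinterpret_cast<float*>(blob + offset);
    return mesh;
}
```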

Changing the aspect ratio of an image destroys it in many ways. The dimensions of a piece, which are very important to establishing its look and feel, become distorted. Wide things become thin, thin things become wide, curves get straightened out, or go from being short to long and vice versa. This isn't a programming issue, so you can't program your way out of it. It doesn't matter what filter or technique you use. Changing the aspect ratio of an image completely changes it.


The aspect ratio of a texture in the game depends on what it is mapped on to. Changing the size of the texture won't affect that, unless the mapping is generated dynamically using the texture size.

Not to take away from the main point that artists should be authoring textures in power of two sizes in the first place.

cr88192, this is all complete nonsense. Please don't get defensive. Instead of trying to be "right", learn from your mistakes and become better at your craft.

I am writing mostly from personal experience, which tends to be that HDDs aren't very fast.

while there is often plenty of space on an HDD, its read/write speeds are not necessarily all that fast.

HDD space is a near infinite resource these days, and this thread is about in-game loading times. No one is going to miss the few extra bytes a DDS may use over another format.

Every optimization has a drawback. You always use more of something else to achieve your goal of using less of whatever the bottleneck is.

Just like mipmaps take up 33% more space per texture. It's not that expensive, and it solves the problem very well.

We used to use more memory to create look up tables to avoid expensive computations. Now we often do computations over again to avoid cache look-up misses.

Disc-based games often lay out game data so that a whole section can be streamed in at once, instead of storing it in a way that takes up less disc space. It doesn't hurt anything to use up all the space, and the loading times can improve dramatically.

the problem is, IME, loading times are often largely IO-bound, and often a DDS texture will take about 2x-4x as much space as a PNG (though, a DDS is still about 1/4 to 1/3 the size of a raw BMP).


if it takes 2.6s of disk time to read these files, but at 1/2 the size it takes 1.3s, then unless decoding requires more than 1.3s it is still a net saving.

hence, a strategy of both organizing data for streaming and trying to make the data to be streamed smaller.

this is also why a person may use *deflate* to make disk IO faster.
decoding deflated data isn't free, but it is still much faster than waiting for the larger uncompressed data to be read in otherwise.
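as a sketch, the load side with zlib's one-shot uncompress() might look like this (assuming the tool wrote the data with compress()/zlib framing and the unpacked size is known, e.g. from a header; error handling omitted):

```cpp
#include <zlib.h>
#include <cstdio>
#include <vector>

// Read a zlib-compressed asset from disk and inflate it in one shot.
std::vector<unsigned char> load_deflated(const char* path, size_t unpacked_size)
{
    FILE* f = std::fopen(path, "rb");
    std::fseek(f, 0, SEEK_END);
    long packed_size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);

    std::vector<unsigned char> packed((size_t)packed_size);
    std::fread(packed.data(), 1, (size_t)packed_size, f);
    std::fclose(f);

    std::vector<unsigned char> unpacked(unpacked_size);
    uLongf out_len = (uLongf)unpacked_size;
    uncompress(unpacked.data(), &out_len, packed.data(), (uLong)packed_size);
    return unpacked;
}
```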


the issue is where one reaches a point of "diminishing returns", namely where decoding the data would outweigh the IO-time savings.


Changing the aspect ratio of an image destroys it in many ways. The dimensions of a piece, which are very important to establishing its look and feel, become distorted. Wide things become thin, thin things become wide, curves get straightened out, or go from being short to long and vice versa. This isn't a programming issue, so you can't program your way out of it. It doesn't matter what filter or technique you use. Changing the aspect ratio of an image completely changes it.

The aspect ratio of a texture in the game depends on what it is mapped on to. Changing the size of the texture won't affect that, unless the mapping is generated dynamically using the texture size.

Not to take away from the main point that artists should be authoring textures in power of two sizes in the first place.


yes, agreed.
using power-of-2 is best, there's just no particular reason it should be mandatory for the loader, except in special cases (for example, my code for streaming video into textures does mandate a power-of-2 size, and a specific set of codecs).

in most cases, where the size is already correct, no resampling is needed.

warning about non-power-of-2 textures may be a good idea though, such that any which are found can hopefully be resized to the correct size outside the game (be it by complaining to the artist or whatever, so that they can go and fix it).


as for aspect/etc:
typically, only the image's "virtual size" is really needed for calculating texture coordinates, and this may be largely independent of its actual resolution.

if a non-power-of-2 texture is encountered, it will be resampled internally, but its virtual size will remain unchanged.
the main concern is the possibility of visual distortion introduced by resampling, which usually isn't too severe (though, yes, it is still better if resampling isn't needed in the first place).
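roughly, the idea is to keep the authored ("virtual") size around next to the real uploaded size, and compute texcoords only from the former (illustrative sketch, not the engine's actual structures):

```cpp
struct Texture {
    int      virt_w, virt_h;  // size the artist authored (may be non-power-of-2)
    int      real_w, real_h;  // power-of-2 size actually resampled/uploaded
    unsigned gl_id;           // GPU texture handle
};

// Map pixel coordinates in the authored image to normalized texcoords.
// Resampling changes real_w/real_h but not the mapping, since [0,1] UVs
// stretch over the whole image either way.
static void pixel_to_uv(const Texture& t, float px, float py, float* u, float* v)
{
    *u = px / (float)t.virt_w;
    *v = py / (float)t.virt_h;
}
```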

Regarding texture loading:

In most cases you should be using one of the block compressed formats, as they use less GPU memory and save bandwidth. But compression is slow, so it wouldn't make sense to decompress from something like jpeg then re-compress, even though jpeg is smaller. Instead, block compress the texture first, offline, then compress that (e.g. with Zlib). This will give around another 50% saving, and the only work to do on loading is decompression, direct to the final format.

There are specialised lossy compression schemes for block compressed textures that can do even better, getting close to jpeg levels.
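A sketch of the tool-side packing step described above, using zlib's one-shot API on an already block-compressed (BC/DXT) payload; at load time the only work is the matching uncompress() and a direct upload:

```cpp
#include <zlib.h>
#include <vector>

// Take block-compressed texture data and pack it further with zlib.
std::vector<unsigned char> pack_bc_payload(const unsigned char* bc_data, size_t bc_size)
{
    uLongf packed_len = compressBound((uLong)bc_size);
    std::vector<unsigned char> packed(packed_len);
    compress2(packed.data(), &packed_len, bc_data, (uLong)bc_size, Z_BEST_COMPRESSION);
    packed.resize(packed_len);
    return packed;
}
```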

It must be nice living in a world where HDDs have essentially an unlimited amount of space. At work I have 3 HDDs (one of which is a terabyte) and am constantly struggling to find space anytime a lot of assets are added.

Anyways, on to something actually relevant... In my personal experience with the PSP (I worked on several PSP games years ago), the disc seek times are slow (as others have said). The main trick in dealing with this is to lay out your data on the disc in a way that reduces these seek times. To do that you want data that is loaded together to be together on the disc, so the UMD doesn't have to seek back and forth to read it.

Another trick is to duplicate data in multiple places on the disc if it is loaded in different places. This allows you to seek less to read the same data. The problem with that approach is that UMD capacity is limited, so it only works if you have the space to play around with. Some games are easier to handle than others (for instance, games that load on a per-level basis are easier to manage than something that stream-loads or loads when entering new areas, provided the areas are non-linear).

Regarding texture loading:

In most cases you should be using one of the block compressed formats, as they use less GPU memory and save bandwidth. But compression is slow, so it wouldn't make sense to decompress from something like jpeg then re-compress, even though jpeg is smaller. Instead, block compress the texture first, offline, then compress that (e.g. with Zlib). This will give around another 50% saving, and the only work to do on loading is decompression, direct to the final format.

There are specialised lossy compression schemes for block compressed textures that can do even better, getting close to jpeg levels.

yeah, DDS+deflate is possible I guess, and would come by default if a DDS is packaged up in a ZIP.


I was able to shave a little more off the loading time, mostly by manually resampling some of the non-power-of-2 textures, and also by thinking about it and finding a trick to eliminate the conditionals from the Paeth filter in the PNG loader (realizing that there was a pure integer-math solution). current engine startup time is ~3s, and getting the world loaded up takes around 15s.
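for reference, one possible branch-free formulation (not necessarily the exact trick used here) turns the comparisons into 0/1 selectors so the result becomes a sum instead of an if/else chain:

```cpp
#include <cstdlib>

static inline int paeth_branchless(int a, int b, int c)
{
    int pa = std::abs(b - c);          // == |p - a| with p = a + b - c
    int pb = std::abs(a - c);          // == |p - b|
    int pc = std::abs(a + b - 2 * c);  // == |p - c|

    int sel_a = (pa <= pb) & (pa <= pc);   // 1 if a wins the tie-break
    int sel_b = (pb <= pc) & (sel_a ^ 1);  // 1 if b wins and a didn't
    int sel_c = (sel_a | sel_b) ^ 1;       // otherwise c

    // exactly one selector is 1, so this picks the same value as the
    // branchy version without any conditional jumps
    return a * sel_a + b * sel_b + c * sel_c;
}
```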

most of the loading-related time usage (according to the profiler) is still within "ntoskrnl.exe" though.
other observations are that during both loading events, the HDD light turns solid red for a few short bursts.

resource monitor: during loading, the engine's read IO spikes at around 50MB/s.
during gameplay there is continued IO of around 1MB/s to the voxel terrain (region files), but most of the CPU usage is voxel-related (followed by "ntoskrnl.exe" and "nvoglv32.dll").
