good loading times are hard.
for example, in my case (on a PC), loading textures is a good part of the startup time, and a lot of this is due to resampling the (generally small) number of non-power-of-2 textures to power-of-2 sizes. this is then followed by the inner loops doing the inverse-filtering for PNG files (more so, the cycle-eating evilness of the Paeth filter, which wouldn't be as bad, except that it tends to be one of the most generally effective, and thus most-used, filters in PNG).
I sometimes wish that PNG also had a simpler "A+B-C" linear filter, which can often compete well with Paeth but, more notably, is much cheaper (it could improve performance by diluting the number of times the encoder picks the Paeth filter). granted, along similar lines, one can also wish that PNG filtered per-block rather than per-scanline, ... but alas. (never mind all the pros and cons of using custom image formats for textures...).
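to make the cost difference concrete, here is a sketch of the Paeth predictor (as defined in the PNG spec) next to the plain "A+B-C" linear predictor being wished for above. the per-pixel subtracts, abs()es, and compares in Paeth are where the cycles go; the linear version is branch-free.

```c
#include <stdlib.h>

/* Paeth predictor from the PNG spec: picks whichever of left (a),
 * above (b), or upper-left (c) is closest to a+b-c. the three
 * subtracts, three abs()es, and the compare/branch chain run
 * per byte, which is what makes this filter comparatively slow. */
static int paeth_predict(int a, int b, int c)
{
    int p  = a + b - c;
    int pa = abs(p - a);
    int pb = abs(p - b);
    int pc = abs(p - c);
    if (pa <= pb && pa <= pc) return a;
    if (pb <= pc)             return b;
    return c;
}

/* the hypothetical "A+B-C" linear filter: same predictor base,
 * but no selection logic, so it is branch-free and cheap. */
static int linear_predict(int a, int b, int c)
{
    return (a + b - c) & 0xFF; /* wrap to byte range, as PNG filters do */
}
```

note how often the two agree: when a+b-c lands between the neighbors, Paeth picks the closest neighbor while the linear filter keeps the raw prediction, which is why the linear filter can "compete" without matching Paeth exactly.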
note that even as such, decoding such a texture may still be faster than loading a simpler uncompressed texture would be (such as BMP or TGA), due mostly to the time it requires to read files from the HDD.
luckily, one can usually cache textures.
unluckily, you still have to load them the first time they are seen.
there is a trick though, namely to have alternate lower-resolution and high-resolution versions of textures.
initially, only the low-resolution versions are loaded, and then any high-resolution and extended-component textures (normal-maps, ...) are streamed in during play.
clever streaming = using a thread. in my case (poor-man's streaming), it often means using a timer, meaning a certain number of milliseconds can be spent doing whatever (if too much time has gone by, we abort and wait until later). granted, for longer operations, this lazy option can still result in choppiness (mostly because a single texture load can take enough milliseconds to go somewhat over the budget).
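the poor-man's streaming above can be sketched roughly like this. the clock and loader here are hypothetical stand-ins (a fake clock that advances 4ms per load) so the sketch is self-contained; the point is that the budget is only checked *between* loads, which is exactly where the choppiness comes from.

```c
#include <stdbool.h>
#include <stdint.h>

/* fake stand-ins for the engine's clock and texture queue:
 * pretend 10 textures are queued and each costs ~4ms to load. */
static uint32_t fake_now_ms = 0;
static int      queued      = 10;
static int      loaded      = 0;

static uint32_t get_time_ms(void) { return fake_now_ms; }

static bool load_next_queued_texture(void)
{
    if (queued <= 0) return false;
    queued--; loaded++;
    fake_now_ms += 4;
    return true;
}

/* the actual trick: burn at most budget_ms per frame on loading,
 * then bail and pick up where we left off next frame. */
static void stream_tick(uint32_t budget_ms)
{
    uint32_t start = get_time_ms();
    while (load_next_queued_texture())
        if (get_time_ms() - start >= budget_ms)
            break; /* over budget; abort and wait until later */
}
```

with a 10ms budget and 4ms loads, one tick gets through three textures (the third one overruns the budget, but we only notice afterwards).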
if graphical settings are set low, the high-resolution versions may be skipped entirely.
sometimes, loading may be hindered by other things, such as a few common offenders:
parsing text files;
linear lookups.
linear lookups are extra bad, as they can turn loading from an O(n) operation into an O(n^2) operation.
IME, linear lookups have been a bigger problem than traditional parsing tasks, like reading off tokens or decoding numbers.
matching strings (such as for command-names) has sometimes been an issue, but shares a common solution with that of the linear lookup problem: hashing.
say, for example:
read in token (or read in line and split into tokens);
use a hash-based lookup, mapping the token to a command-ID number or similar;
"switch()".
a bigger problem with text formats is often not the parsing itself, but rather reading them in from disk.
basically, for reading lots of small files, the OS's filesystem is often your enemy.
potentially better is, when possible, to bundle them into an archive, such as a ZIP, then fetch the files from this.
if implemented well, reading the contents from a ZIP archive can actually outperform reading them via normal file-IO (both due to bundling, and also reducing total disk IO via storing the data in a deflated form).
a downside though is that there are plenty of braindamaged and stupid ways to handle this as well.
(if not implemented stupidly, it is possible to get good random access speeds to a ZIP archive with 100k+ files, but if implemented stupidly, so help you...).
granted, due to the way ZIP is designed, the above may still require an initial up-front cost of reading in the central directory and transforming it into a more efficient directory-tree structure (for example, a large flat list of directory-entry nodes, internally linked into a hierarchical tree structure, so that the directory tree can be more efficiently "descended into", like in the OS filesystem).
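the kind of node structure meant here might look something like the following (field names are mine, not from any ZIP library): entries live in one flat array as parsed from the central directory, but parent/child/sibling links let a lookup descend one path component at a time instead of scanning every entry.

```c
#include <string.h>

/* one directory entry; the flat array of these gets linked into a
 * tree after the ZIP central directory is parsed. */
typedef struct zdir_ent_s zdir_ent_t;
struct zdir_ent_s {
    const char *name;        /* single path component, e.g. "maps" */
    zdir_ent_t *parent;
    zdir_ent_t *first_child;
    zdir_ent_t *next;        /* next sibling in the same directory */
    long        zip_offset;  /* where the deflated data lives; -1 for dirs */
};

/* descend one level: only this directory's children are scanned,
 * so lookup cost tracks path depth rather than total file count. */
static zdir_ent_t *zdir_find_child(zdir_ent_t *dir, const char *name)
{
    zdir_ent_t *c;
    for (c = dir->first_child; c; c = c->next)
        if (!strcmp(c->name, name))
            return c;
    return NULL;
}
```

a full path lookup is then a loop calling zdir_find_child once per component, which is what keeps random access fast even with 100k+ files.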
during development, a disadvantage of ZIP though is that it can't be readily accessed by the OS or by "normal" apps (such as graphics editors, ...), so is more something for "final deployment". often these will be given a special extension (such as "pk" or "pk3" or similar) to reduce the likelihood of a naive user extracting them.
for some other specialized use cases, I am using a loosely WAD-based format. it is also linked into a hierarchy, albeit less efficiently (via simply linking to the parent directory entry), mostly to save space (ideally, we also want a 'next' link, but weighing the two, the parent-link won out over the next-link for this use case). like ZIP, contents are usually deflated. this avoids some of ZIP's up-front cost, though with its own costs.
but, even with all of this, getting everything loaded up in a few seconds or less isn't really an easy task (with modern expectations for content).
loading time has to go somewhere: either engine startup, world loading, or during gameplay.
typically, compromises are made.
not to say though that some games don't just have bad loading code...
> void hurrrrrrrr() {__asm sub [ebp+4],5;}
LOLz
(quickly going nowhere fast...).