
#5028500 How to avoid slow loading problem in games

Posted by Hodgman on 03 February 2013 - 07:27 PM

block compress the texture first, offline, then compress that (e.g. with Zlib). This will give around another 50% saving, and the only work to do on loading is decompression, direct to the final format.

Indeed, this is standard practice these days.
Instead of using standard compression on them though, another option is the crunch library, which offers two modes --
* a rate/distortion-optimised DXT compressor, which reduces quality slightly, but produces files that standard compression algorithms can then compress much better.
* its own compressed format, "CRN", which is also a lossy block-based format, but can be directly (and efficiently) transcoded from CRN to DXT, for small on-disk sizes and fast loading.

My other question would be: I'm sure the programmers know about the slow loading times, but why not fix it?

Fixing things takes time, and time is money... which makes that a question for the business managers, not the engineers :P

For example, in my case (on a PC), loading textures is a good part of the startup time, and a lot of this is due to resampling the (generally small) number of non-power-of-2 textures to power-of-2 sizes. This is then followed by the inner loops doing the inverse-filtering for PNG files, and parsing text files.

Just say no! Don't perform any image filtering/re-sampling/transcoding or parsing at load-time; move that work to build-time!
As phantom mentioned, DXT compression is very slow, so if you want fast texture fetching and low VRAM usage, then you'll also be wasting a lot of load-time recompressing the image data that you just decompressed from PNG!

during development, a disadvantage of ZIP though is that it can't be readily accessed by the OS or by "normal" apps

On the past 3 engines I've used, we've used ZIP-like archives for final builds, and just loose files in the OS's file-system for development builds, because building/editing the huge archive files is slow.

However, the above issue (that your content tools can't write to your archive directly) isn't actually an issue, because even when we're using the OS's file-system, the content tools can't write to those files either, because they've been compiled into runtime-efficient formats!

The data flow looks something like:
[Content Tools] --> Content Source Repository  --> [Build tools] --> Data directory --> [Build tools] --> Archive file
                                                                            |                                  |
                                                                           \|/                                \|/
                                                                    In-Development game                    Retail game
Just as we don't manually compile our code any more -- everyone uses an IDE or at least a makefile -- you should also be using an automated system for building the data that goes into your game. The 3 engines that I mentioned above all used a workflow similar to the diagram, where when an artist saves a new/edited "source" art file, the build system automatically compiles that file and updates the data directory and/or the "ZIP archive".

For example, if someone saves out a NPOT PNG file, the build tools will automatically load, decode, filter, resample that data, then compress it using an expensive DXT compression algorithm, then save it in the platform specific format (e.g. DDS) in the data directory for the game to use. Then at load-time, the game has no work to do, besides streaming in the data.
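One small piece of that build-time work can be sketched concretely: deciding the power-of-2 size to resample an NPOT texture to. This is just an illustrative helper (the function name is made up), showing work that belongs in the build tool rather than at load-time:

```cpp
#include <cstdint>

// Round a texture dimension up to the next power of two, so the expensive
// resampling can happen once at build time instead of at every load.
uint32_t NextPow2(uint32_t x)
{
    if (x <= 1) return 1;
    --x;
    x |= x >> 1; x |= x >> 2; x |= x >> 4;
    x |= x >> 8; x |= x >> 16; // smear the highest set bit downwards
    return x + 1;
}
```

e.g. a 640x480 source image would be resampled to 1024x512 by the build tool, then DXT-compressed and saved as DDS.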

#5028493 Multiply blending with exception of the alpha channel?

Posted by Hodgman on 03 February 2013 - 06:47 PM

You could try turning on alpha testing to discard the 'transparent' areas.

I guess a nice thing to have here would be a per-channel glBlendFunc.

By default you get one function for RGBA, but with glBlendFuncSeparate you get one for RGB and one for A... which doesn't actually help you, because what you want is the RGB blend function to take into account 3 factors -- src.RGB, src.A and dst.RGB -- which as mentioned by C0lumbo, would have to be done partially in the pixel shader.
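As a rough CPU-side sketch of that shader-assisted workaround (names are illustrative, and this just emulates what the pixel shader plus a plain multiply blend would do together): the shader outputs src.RGB lerped towards white by (1 - src.A), and the fixed-function blend is then DST_COLOR * SRC, so transparent texels leave the destination untouched:

```cpp
struct Color { float r, g, b, a; };

float Lerp(float x, float y, float t) { return x + (y - x) * t; }

// Emulates: shader outputs lerp(white, src.rgb, src.a), then the
// blend unit multiplies it into the frame-buffer (SRC=DST_COLOR, DST=ZERO).
// A texel with a == 0 becomes white, i.e. a no-op for multiply blending.
Color MultiplyBlendIgnoringTransparent(Color dst, Color src)
{
    Color shaderOut = { Lerp(1.0f, src.r, src.a),
                        Lerp(1.0f, src.g, src.a),
                        Lerp(1.0f, src.b, src.a),
                        1.0f };
    return { dst.r * shaderOut.r, dst.g * shaderOut.g,
             dst.b * shaderOut.b, dst.a };
}
```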

(0.00, 0.00, 0.00, 0.00) // - which as i see it should be transparent. It comes out black, however.

This is the pixel value that's being written into the frame-buffer, not one that's being blended with the frame-buffer value (you already blended something with that value - 'D' - and are now writing out the final result).

Src. (What I draw to the colour buffer)

That picture is a red herring. The checkerboard pattern tells us that the area is 'transparent', so hopefully that means the alpha channel in that area is 0... but what values do the RGB channels have in those areas? Most image editors won't tell you, which is fine for most digital art, like web-pages/etc, but isn't fine for games. As you're seeing, the RGB values of transparent pixels have a big effect on your code. If your image editor has a mode where you can edit the alpha channel without that checkerboard pattern appearing (so you can still set the appropriate RGB values in transparent areas), then I recommend you use it. In your case, painting the transparent areas white would fix your issue (and an alpha channel isn't actually required at all).

#5028333 Alpha Blending: To Pre or Not To Pre

Posted by Hodgman on 03 February 2013 - 08:09 AM

When would you do the pre multiplying?

If when authoring your texture, the transparent/background parts are black, then it's already done ;)
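If the texture isn't authored that way, pre-multiplying can also be done as a one-off build-time (or load-time) pass over the pixels. A minimal sketch, with illustrative names:

```cpp
struct Pixel { float r, g, b, a; };

// Pre-multiplied alpha: scale RGB by A once, up front. A fully
// transparent texel ends up black regardless of its authored RGB --
// which is why "transparent areas authored black" means it's already done.
Pixel Premultiply(Pixel p)
{
    return { p.r * p.a, p.g * p.a, p.b * p.a, p.a };
}
```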

#5028328 C++ O(n) Hashing? Algorithm

Posted by Hodgman on 03 February 2013 - 07:57 AM

You could run a linear-time sorting algorithm over your data, then iterate the results while ignoring any items that were the same as the previous item (to remove duplicates / only count unique items).
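A minimal sketch of that idea, using a counting sort (linear in n plus the key range) followed by the duplicate-skipping pass. This assumes non-negative integers below a known bound; the names are illustrative:

```cpp
#include <cstddef>
#include <vector>

// Counting sort: O(n + k) for values in [0, maxValue).
std::vector<int> CountingSort(const std::vector<int>& data, int maxValue)
{
    std::vector<size_t> counts(maxValue, 0);
    for (int v : data)
        counts[v]++;                               // histogram pass
    std::vector<int> sorted;
    sorted.reserve(data.size());
    for (int v = 0; v < maxValue; ++v)
        sorted.insert(sorted.end(), counts[v], v); // emit in order
    return sorted;
}

// Iterate the sorted results, ignoring any item equal to the previous one.
size_t CountUnique(const std::vector<int>& data, int maxValue)
{
    std::vector<int> sorted = CountingSort(data, maxValue);
    size_t unique = 0;
    for (size_t i = 0; i < sorted.size(); ++i)
        if (i == 0 || sorted[i] != sorted[i - 1])
            ++unique;
    return unique;
}
```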

#5028049 Dev Kit question

Posted by Hodgman on 02 February 2013 - 02:11 AM

Just be aware that a "15 minute playable demo" is a lot more work than it sounds.

Yep. I worked on a 45 second non-playable demo for a publisher pitch once, which cost about $110,000...

#5028020 [C#/C++]Multithreading

Posted by Hodgman on 01 February 2013 - 09:45 PM

My statement was meant to be: you'd better not use any MT approach to speed up your application, regardless of the core count.

And my response was the opposite -- the only reason to use multiple threads is to gain access to extra cores, in order to speed up the application.

Concurrency (as in, interleaving two different tasks) is irrelevant -- use coroutines or fibres or manual time-slicing for that kind of concurrency. Use threads to run code on more physical cores. Ideally, your thread count matches your CPU core count, no matter how many 'concurrent' systems you have.


Ideally, a game running on a single-core CPU would only have 1 thread, and a game running on a quad core would have exactly 4 threads. The game should be able to split its workload amongst the available pool of threads automatically, and when running on the quad-core, it should be almost 4x faster than when running on a single-core. That's the ideal result, and it's not impossible.
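On the "thread count matches core count" point: C++11 can query the core count directly when sizing such a pool. A tiny sketch (note that hardware_concurrency() is allowed to return 0 when the count is unknown, hence the clamp):

```cpp
#include <algorithm>
#include <thread>

// One worker thread per hardware core; never fewer than one.
unsigned IdealThreadCount()
{
    return std::max(1u, std::thread::hardware_concurrency());
}
```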

But this adds complexity to the project and shouldn't be underestimated (for example: you will lose determinism).

There's no reason that multi-threaded programs have to give up determinism! Multi-threading strategies that introduce indeterminate behaviour are, IMHO, bad strategies in general (they may have niche applications).


One of the first models of computation that you're taught as a student is input->process->output. You've got some blob of input data, you feed it into some kind of process, and you get some blob of output data. You can then chain sequences of these blocks together in order to create an entire program. At the heart of everything that we do, this model is still relevant.

If you take all the chained IPO blocks that make up one frame of processing in your game, you've got a DAG of processes that need to be run, with dependencies between them (if the input to process #2 is the output of process #1, then process #1 must be complete before running process #2). You can perform a topological sort on this graph to get a linear order of processes, and every process that ends up being sorted to the same 'level' can be run in parallel (across multiple cores) without further synchronisation. This is how many functional programs take any old program and "automatically multi-thread" them, while maintaining perfectly deterministic behaviour.
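A minimal sketch of that topological "levelling" (essentially Kahn's algorithm, collecting each wave of zero-dependency processes into a level). Processes are numbered 0..n-1 and the names are illustrative:

```cpp
#include <utility>
#include <vector>

// Sort a process DAG into 'levels': every node in a level depends only on
// earlier levels, so all nodes within one level can run in parallel.
// An edge {a, b} means process b consumes the output of process a.
std::vector<std::vector<int>> TopoLevels(
    int n, const std::vector<std::pair<int, int>>& edges)
{
    std::vector<std::vector<int>> successors(n);
    std::vector<int> indegree(n, 0);
    for (const auto& e : edges)
    {
        successors[e.first].push_back(e.second);
        indegree[e.second]++;
    }

    std::vector<std::vector<int>> levels;
    std::vector<int> current;
    for (int i = 0; i < n; ++i)
        if (indegree[i] == 0)
            current.push_back(i);       // processes with no inputs pending

    while (!current.empty())
    {
        std::vector<int> next;
        for (int node : current)
            for (int succ : successors[node])
                if (--indegree[succ] == 0)
                    next.push_back(succ); // all dependencies now satisfied
        levels.push_back(current);
        current = next;
    }
    return levels;
}
```

e.g. with edges 0->2, 1->2 and 2->3, processes 0 and 1 form the first level and can run on two cores at once, then 2, then 3.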


And again, even (or especially) for games, you choose an mt approach not to make the game's performance better. If a game dev thinks "uhm, my performance is too bad, let's switch mt on, I hope it will get better", it's the wrong motivation for mt.

The only reason to launch extra OS threads is because you want to make use of extra CPU cores (or you're forced to by legacy APIs), and the only reason to make use of extra CPU cores is because you need/want more processing power. As above, if you just want simple concurrency -- like background loading, streaming of environments -- you do not need extra threads.

Multi-threading isn't something you can 'switch on' later in the project; it has to be designed into the project from the beginning (when using imperative/procedural/OOP languages, anyway). Typical C++ OOP code, when decomposed into an IPO graph, looks like spaghetti code -- every process has too many side effects, and there's too much mutable state, so every process has multiple outputs all over the place. The DAG that's produced is a complex spider-web, which ends up as a serial sequence of processes with few opportunities to take advantage of multiple cores. Trying to parallelize that kind of code is a nightmare. If you really want that 300% speed boost that you mentioned (which is attainable in games, despite what many say), you need to be writing code that's well designed for a smart multi-threading strategy from the very start of your project.

#5027995 [C#/C++]Multithreading

Posted by Hodgman on 01 February 2013 - 06:58 PM

Multithreading is often misunderstood, even among devs. Multithreading is primarily used for parallelism and not to speed things up. For example, in games multithreading is ideal to keep your game responsive while the game is loading some resources (for the next area), the user is doing some inputs, or the AI is calculating (re)actions.
Yes, you can achieve speed-ups with mt, and mt is often used for speed-ups, for example the rendering in suites like 3ds or Maya. But your problem must be suited to being run in a parallel way. And in most cases the speed-up is far from linear. With a perfectly linear speed-up you would gain potentially 300% performance with a quad-core, which seems huge. But a linear speed-up is unrealistic. You have to organize (Mutex, MVar, synchronize, STM) the different processes or threads at their meeting-points, and that results in a slow-down. It's utopian that a whole game will gain a 300% speed-up; even +100% is far from reality. In most cases you will solve specific sub-problems with mt, or -- and that is the most common way -- you decouple sub-systems from each other to run in parallel on their own processing units.

I couldn't disagree with this more. This may be true for typical GUI tools, but not games. Games are (soft) real-time applications, meaning you've got to hit a fixed time budget per frame, consistently.


When you're making a GUI-tool, you need the GUI part to remain at an "interactive" level of responsiveness (not real-time), while you do some heavy processing over a long period of time in the background. Threads are a very convenient way to achieve this -- if you put the GUI in one, and the heavy processing in another, then the OS will ensure that each of them obtains some amount of CPU time every so often (by default on Windows: one 15ms time slice at least once every 5 seconds).


Using this same approach in a real-time application is harmful. For example, say that we're on a single-core CPU, and when we load a file into RAM we've then got to run a LZMA decompression step on the loaded data, which takes a total of 1 second. You don't want this to affect the progress of the game's 'main thread' and impact the frame-rate.


Approach 1) We put the decompression code into a separate background thread, which sleeps unless it has work to do. When it does have work to do, we're relying on the OS's thread scheduler to choose which thread runs on the single CPU core. By default on Windows, the scheduler granularity is 15ms, so the decompression thread will require 67 time-slices to complete its 1-second task. If our main thread is attempting to run at a fixed real-time frame-rate of 60Hz (a limit of 16.6ms per frame), then during the time that the decompression thread is awake, this is now impossible (unless your 'main thread' only has 1.6ms of work to do per frame). From time to time (unpredictably), the main thread will be put to sleep for an entire 15ms time-slice (or maybe multiple time-slices).

That kind of unpredictability is simply not acceptable to a real-time application.


Approach 2) We manually time-slice the decompression code, so that after it's run for ~1ms (or some other chosen threshold), it stores its state and returns/yields -- a.k.a. cooperative multi-tasking. We run the decompression code on the "main thread" every frame, knowing that the biggest interruption that this task can cause is a very predictable 1ms per frame.
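A bare-bones sketch of such a manually time-sliced task. Abstract "work units" stand in for the ~1ms threshold; a real task would check an elapsed-time clock instead, and the names are illustrative:

```cpp
#include <algorithm>
#include <cstddef>

// Cooperative multi-tasking: each frame the main loop hands the task a
// fixed budget, the task does that much work, stores its state and yields.
class SlicedTask
{
public:
    explicit SlicedTask(size_t totalWork) : m_remaining(totalWork) {}

    // Run for at most 'budget' work units, then return control to the
    // caller. Returns true once the whole task has finished.
    bool RunSlice(size_t budget)
    {
        size_t step = std::min(budget, m_remaining);
        m_remaining -= step; // <-- do 'step' units of decompression here
        return m_remaining == 0;
    }

private:
    size_t m_remaining; // the stored state that survives between frames
};
```

The main loop just calls RunSlice once per frame; the worst-case interruption is exactly one budget's worth of work, every frame, predictably.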


As swiftcoder mentioned above, many "scripting" languages only provide these kinds of "cooperative multi-tasking threads" (often called Fibers in C++), instead of OS-level threads, and their entire purpose is to allow for concurrency of tasks.

On the other hand, OS-level threads should only be used in order to take advantage of hardware-level threads, which is only useful for gaining extra computational power. Using OS-threads for anything other than gaining access to extra hardware, in a real-time application, is an abuse of them. The exception to this is when interacting with legacy APIs that have long-blocking functions, which force you to put them into a thread.

n.b. file loading and user input aren't in this category -- your OS provides (non-blocking) asynchronous methods for these.


Post-load resource processing, and AI processing can both be time-sliced, but may also be multi-threaded if they're processor intensive.


MT is often a trade-off. MT will make your project much more complex. More complexity will make your project more error-prone and will slow down the whole project's progress. Your code-base is more fragile and "uglified". What's the benefit? More responsiveness, that's fine! A 10%-30% "speed up", maybe not worth it.

That entirely depends on the MT strategy that you choose. Many job-based strategies end up producing code that's simpler than typical C++ OOP code...

#5027737 Anyone here a self-taught graphics programmer?

Posted by Hodgman on 31 January 2013 - 10:29 PM

I have pretty much the same story as L. Spiro. I'd started learning C++ as a teenager, and played around with OpenGL tutorials from time to time. The easiest way that I could make cool games was by modding existing ones, so I learned most of my 3D math by accident, as a result of playing around with other people's game code. When it came to high-school physics class, I realised I already knew about concepts like vectors and forces etc, except sometimes with the wrong terminology.


I then did "IT" at university, but kept playing around with D3D and GL in my spare time, and tried to use it in university programming assignments whenever I could. I did manage to take one elective class which actually taught us basic fixed-function D3D, but I aced that class because I'd already taught myself the subject matter! Unfortunately I wasn't able to take the theoretical computer graphics classes at all.

I just bought books, read online articles, and worked on it by myself until things clicked and fell into place.

Repeat for many years until today.

^^ this... except I couldn't afford to buy books at all until I started working as a programmer ;)


In one of my early jobs, there was an opening in the engine department, for someone who knew shaders and graphics programming. I put my hand up based on my hobby work, and was more qualified than anyone else who went for it, so I got transferred into that department and got to start doing it professionally. At my next job I applied to be a graphics/special-effects specialist, and got the job (with the title of "junior effects programmer" to begin with :|) and I was paired up with a guy who'd been doing it for years, whom I learned a lot from, just by sitting next to him. I also learned a lot by being exposed to the low level APIs on the consoles, and their spec documents. After that, I got a job as the main graphics programmer at a different company in their engine team, and was thrown in the deep end along with the specifications for all the consoles and the responsibility of making a new renderer from scratch :D

#5027686 Fresnel equation

Posted by Hodgman on 31 January 2013 - 06:09 PM

Phong was made just with intuition and observation, so it's an 'empirical BRDF'.
Blinn took Phong's work and recreated similar results from a theoretical basis this time, instead of from guesswork and collected data. The framework he used was microfacet theory (which says that only the facets oriented towards the half-vector will produce a visible reflection), so it's a 'microfacet BRDF'. So yes, it's the fact that the BRDF only considers the percentage of the surface that is pointing in the H direction, that makes it a 'microfacet BRDF'.

#5027676 Global Event Manager System

Posted by Hodgman on 31 January 2013 - 05:53 PM

Is that your recommended method of managing events, or are you just showing an example of delegates?
Do you find it actually beneficial to use in games, or just desktop applications?

Well, I hate global/centralized 'event managers', after having used quite a few very over-complicated ones before, and yes, I much prefer 'plugging components together' like in the example, via delegates/slots/callbacks/functors/what-have-you.
Yes, this is useful for games, not just GUIs. Actually, many over-complicated 'entity/component systems' have some kind of complex event routing built into them, when the above example would solve the same problem more easily.

That said, these days I do personally like to reduce abstraction in the flow of control in my code, and find that events/callbacks/virtuals/etc all obfuscate the flow of control... So, I like to be more explicit about what the 'current operation' is. In the above example, I'd rather have the physics module generate an entire collection of ground-collision points, and then fire an event, which the sound module uses to spawn ground-collision audio effects for that entire collection.

#5027673 Questions on graphic design programs for games

Posted by Hodgman on 31 January 2013 - 05:37 PM

For someone to say that you have to use Photoshop to be an artist, yeah I'm knocking that aside.

No one said that.

#5027465 Global Event Manager System

Posted by Hodgman on 30 January 2013 - 10:58 PM

The point of delegates is that the physics and sound systems don't have to know about each other (when contrasting with interfaces, anyway).
The next layer up in the code base, which does know about physics and audio can then glue together those two independent modules.
e.g. In this C# code, when the physics body's HitTheGround method is called, the sound effect's Play method is called, even though physics and sound are completely independent of each other.
using System.Collections.Generic;

class Position { }

//physics module
class RigidBody
{
	public delegate void OnTouch(Position pos);

	public void AddOnTouch(OnTouch e) { m_onTouchEvents.Add(e); }

	public void HitTheGround()
	{
		foreach (var e in m_onTouchEvents)
			e(m_position);
	}

	private List<OnTouch> m_onTouchEvents = new List<OnTouch>();
	private Position m_position = new Position();
}

//audio module
class SoundEffect
{
	public void Play(Position where) { /* play the sound at 'where' */ }
}

//game (can glue the above together)
class RockEntity
{
	public RockEntity() { m_body.AddOnTouch(m_sound.Play); }

	private RigidBody m_body = new RigidBody();
	private SoundEffect m_sound = new SoundEffect();
}
Which language are you using? I can translate that into C or C++ if you like...

#5027458 Global Event Manager System

Posted by Hodgman on 30 January 2013 - 10:40 PM

So my question is, is there any better way?

Cheeky answer: Don't use the global event manager anti-pattern :P


Would it be possible to just link individual objects that need to communicate together directly, using delegates?

#5027354 [C#/C++]Multithreading

Posted by Hodgman on 30 January 2013 - 04:46 PM

It's extremely hard to take an existing, large code-base and try to shoe-horn in parallelism. You need to have a multi-core processing strategy in mind from the very start of the project in order to be effective.

#5027139 sizeof() not working ?!?

Posted by Hodgman on 30 January 2013 - 04:22 AM

Even better, use metadata to automatically generate the format and the serialise function (sadly we don't do that where I work).

Could you give an example of that? (what kind of meta data, where does it come from?)
At one job, we parsed any '.h' files whose parent directory fit a naming convention. These headers just contained C struct declarations, which the parser would convert into a table of meta-data, such as field names, types, offsets, etc. You could use specially formatted comments to specify default values, valid ranges, descriptions, desired UI elements (e.g. a color picker), etc.
From that meta-data, we could then automatically generate text-to-binary serialization functions, so that all game data could be stored in a common, simple text format, but also compiled into a runtime-efficient format without effort.
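A hand-written approximation of what that generated meta-data and the serialiser driven by it might look like. In the real scheme the field table would be emitted by the header-parsing tool rather than written by hand, and all the names here are illustrative:

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class FieldType { Int, Float };

// One row of the meta-data table the build tool would generate.
struct FieldDesc
{
    const char* name;
    FieldType   type;
    size_t      offset;
};

// An example game struct, as declared in a parsed header.
struct Player
{
    int   health;
    float speed;
};

static const std::vector<FieldDesc> kPlayerFields = {
    { "health", FieldType::Int,   offsetof(Player, health) },
    { "speed",  FieldType::Float, offsetof(Player, speed)  },
};

// Generic struct-to-text serialisation, driven purely by the meta-data --
// no per-struct code needs to be written by hand.
std::string SerializeToText(const void* object,
                            const std::vector<FieldDesc>& fields)
{
    std::string out;
    const char* base = static_cast<const char*>(object);
    for (const FieldDesc& f : fields)
    {
        out += f.name;
        out += " = ";
        switch (f.type)
        {
        case FieldType::Int:
            out += std::to_string(*reinterpret_cast<const int*>(base + f.offset));
            break;
        case FieldType::Float:
            out += std::to_string(*reinterpret_cast<const float*>(base + f.offset));
            break;
        }
        out += "\n";
    }
    return out;
}
```

The matching text-to-binary direction walks the same table in reverse, which is how the one set of meta-data buys both the editable text format and the runtime-efficient binary format.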

I've also heard of other companies doing similar things, but via a custom language, and they also produce a C '.h' file as output for the engine to use.