Also, what would happen when a job asks for a thread and none is available? Or if a higher-priority job is constantly using a thread, so that lower-priority jobs never get a chance to run?
Just curious. Merry Christmas!
Quote:
These locks can lock an object for an extended period of time which causes other threads to wait for the object to be free.
It also means that you have to be careful to hold locks for as short a time as possible. One thread can starve another when it holds the lock for long stretches while the other needs it only for a moment:
// Thread A: holds the lock for a long stretch
while(1)
{
    ..stuff..
    lock()
    ..stuff..
    ..more stuff..
    unlock()
    ..stuff..
}

// Thread B: holds the lock only for a moment
while(1)
{
    lock()
    ..single thing..
    unlock()
    Sleep(1)
}
If thread A doesn't get time during the short moment when thread B is sleeping, then thread A can starve for a long time.
Quote:
Also, what would happen when a job asks for a thread and none is available?
Actually, you'd go the other way around: all the threads look at the queue of jobs that are available and, through whatever priority system you have, take the next job. As long as you get your sync points right, you should never end up with jobs that never get time, since everything eventually requires something else to finish before it submits more jobs.
This isn't a random assortment of tasks, but a game. Design the system to take into account the usage. A background task like streaming in level data, or a networking task, or whatever asynchronous task might just need a thread of its own, outside the pool. The thread pool is good for scheduling lots of little assorted jobs that happen to lots of items in parallel. Like dealing out 200 entity updates over 4 threads, or 200 animation blends, or 2000 culling tests. You'd use it for anything where you have some trivially parallelizable data that you need split across multiple cores. You can make it work on larger jobs as well, but the overall idea is that the sources and sinks for jobs are well known in advance, just the number of jobs and the time each job takes can vary. But, each job has a defined start and end point, so it can't 'constantly' use up one thread.
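For what it's worth, here is a minimal sketch of that worker-pulls-from-queue idea, using C++11 threading primitives for brevity (the names JobQueue and workerLoop are mine; a real engine would layer priorities and sync points on top):

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

class JobQueue {
public:
    void push(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    // Blocks until a job is available; returns false once the queue is
    // shut down and drained, telling the worker to exit.
    bool pop(std::function<void()>& job) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
        if (jobs_.empty()) return false;
        job = std::move(jobs_.front());
        jobs_.pop();
        return true;
    }

    void shutdown() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
};

// Each worker thread runs this: take the next job, run it to completion,
// go back for another. A job has a definite end, so it can't hold a
// worker forever.
void workerLoop(JobQueue& queue) {
    std::function<void()> job;
    while (queue.pop(job))
        job();
}

Workers block in pop() when the queue is empty, so idle cores cost nothing; shutdown() wakes them all so they can exit cleanly.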
My, I'm sorry it took me so long to reply. Thanks for the input.
Putting the input routines in with my window procedure would not work, in my case, because I am not guaranteed to have one. I tried to stay as cross-platform as possible here. For a Windows-only solution, this would be optimal.
However, there are two problems with the design of thread pools that seem to have been overlooked. With this design, if I understand it correctly, you have one thread per processor, so there is no thread switching on a core. However, there are other programs running during the game session. Some players play music, run antivirus, even use an operating system while running the game (imagine that! ;). All these programs run on (at least) one thread, so there is still thread switching going on. Granted, there are fewer threads competing on the same core, but this cannot eliminate the problem entirely.
Perhaps the more pressing problem is that processor affinity is not guaranteed to be honored. Similarly to how a compiler does not have to make a C++ function declared with the "inline" keyword actually inline, the OS does not have to (and may very well not) place your threads on the processor you ask for. If it finds that one processor is being used heavily while another is not used at all, it may place two of your pool's threads on the same core. Depending on what jobs land on each of these, this could severely slow down the frame.
Quote:
However, there are two problems with the design of thread pools that seem to have been overlooked. With this design, if I understand it correctly, you have one thread per processor, so there is no thread switching on a core. However, there are other programs running during the game session. Some players play music, run antivirus, even use an operating system while running the game (imagine that! ;). All these programs run on (at least) one thread, so there is still thread switching going on. Granted, there are fewer threads competing on the same core, but this cannot eliminate the problem entirely.
That is a problem no matter how you look at it. The idea of a thread pool isn't to eliminate that issue; it is to provide the OS with enough CPU-bound threads to fill up all the compute time on all the cores. If a core supports 2 or 4 hardware threads, then you'd put 2 or 4 threads on that core (hyperthreading, for instance, increases the hardware thread count per core). You just have to hope that there is enough variation in your threads' work that the core can find instructions to interleave between them, and that you aren't memory-bandwidth bound.
In the end, you end up with enough threads doing enough things to keep all the cores busy, as opposed to a non-scalable setup where you have one thread per task (AI, physics, networking) and your game requires exactly that many cores to run optimally: any fewer and you waste time swapping threads, and any additional cores go unused. Now, this might not be a problem on fixed hardware. If you know you are on the Xbox 360, with its three-core Xenon CPU and 6 hardware threads, then you can build specifically for that. But if you want something that runs well on single-core, dual-core, and quad-core machines, and on the coming 8- or 16-core ones, then you need to start with a scalable design.
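As a sketch of sizing to the hardware rather than to a task count (poolSize is just an illustrative helper; std::thread::hardware_concurrency() is the standard C++ way to ask for the logical core count):

#include <thread>

// Ask the runtime how many hardware threads the machine offers and size
// the pool to match. hardware_concurrency() may return 0 when the count
// cannot be determined, hence the fallback.
unsigned poolSize() {
    unsigned hw = std::thread::hardware_concurrency();
    return hw ? hw : 1;
}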
Quote:
Perhaps the more pressing problem is that processor affinity is not guaranteed to be honored. Similarly to how a compiler does not have to make a C++ function declared with the "inline" keyword actually inline, the OS does not have to (and may very well not) place your threads on the processor you ask for. If it finds that one processor is being used heavily while another is not used at all, it may place two of your pool's threads on the same core. Depending on what jobs land on each of these, this could severely slow down the frame.
That is good and bad. Hopefully the OS is smart about the scheduling; sometimes it isn't. For instance, there is a "core maximizer" tool for Supreme Commander; all it does is sit in the background and shuffle the threads around a little, because sometimes after alt-tabbing the OS will decide to pair the wrong set of threads together.
But letting the OS swap the threads around at will is often better than tying the threads down to one particular core. Especially since the OS abstracts the cores away from you: you don't know what cores exist where, but the OS does, and hopefully makes decisions based on that.
For instance, the original quad-core Intel chips were two dual cores stuck together, while the quad core from AMD was four cores on one die. A tiny difference, but it matters a lot because the cache arrangement for each setup is different. The same thing happens when you have multiple processors. Hopefully the OS can arrange the threads based on the memory segments the threads will access, to minimize the processors having to talk to one another. It can't always, and you'd have to help it out; but then you have to know the arrangement of cores in the machine.
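On Windows, for instance, SetThreadIdealProcessor is exactly that kind of hint; the helper name below is mine:

#ifdef _WIN32
#include <windows.h>

// A hint, not a guarantee: the scheduler will prefer this logical
// processor for the current thread but may still run it elsewhere.
void preferProcessor(DWORD logicalProcessor) {
    // Returns the previous ideal processor, or (DWORD)-1 on failure.
    SetThreadIdealProcessor(GetCurrentThread(), logicalProcessor);
}
#endif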
Quote:Original post by phantom
Locks can be avoided if you set up your architecture in certain ways.
I don't claim this is the best method (I've not done huge tests on it), but I'm a fan of doing two passes over the objects.
The first pass updates an internal state, which can read from other objects' external states. The second is a 'sync' step where objects copy their internal state to an externally visible state for others to use on the next pass.
The downside of this is that it requires two passes over the components thus costs you memory bandwidth; however it does remove stalling of threads due to locks and other potential memory hazards along the way.
This cunning plan has been used by graphics renderers since time immemorial. Of course, the 'sync' step is flipping the buffers, so what used to be the 'internal' state is now the external state, and you regenerate the new internal state entirely each time based on buffered data elsewhere. But the principle is the same - you can't afford to have the monitor and the program competing over the same lump of video memory so you give the monitor an old copy to play with while you tinker with the new copy.
It's interesting that what has become common knowledge for graphics doesn't immediately occur to game programmers for the more general case!
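A minimal sketch of the two-pass/buffer-flip idea being described (Entity and its fields are purely illustrative):

#include <vector>

// During update(), every entity reads only the 'published' state of the
// others and writes only its own 'pending' state, so the pass can be
// split across the thread pool without any locks. sync() then publishes
// everything at once; it is the equivalent of flipping the buffers.
struct Entity {
    float published = 0.0f;  // what other entities may read this frame
    float pending   = 0.0f;  // what this entity computes for next frame

    void update(const std::vector<Entity>& all) {
        pending = published;
        for (const Entity& other : all)
            pending += other.published * 0.001f;  // read published only
    }

    void sync() { published = pending; }
};

void frame(std::vector<Entity>& entities) {
    for (Entity& e : entities) e.update(entities);  // pass 1: parallelizable
    for (Entity& e : entities) e.sync();            // pass 2: the 'sync' step
}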
Quake 3 did something similar to what's posted above. The front end would dump everything into a "front buffer" which ended up being triangle soup, shaders, and a very basic set of commands, while the backend thread processed the other buffer of commands and data. When that frame signalled completion, the command buffers and associated structures were swapped and the cycle began all over again. Doom 3 apparently took this one step further, but we'll need to wait for the public source to review that :).
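Presumably something along these lines; the types below are illustrative, a sketch of the general idea rather than the actual Quake 3 layout:

#include <utility>
#include <vector>

struct RenderCommand { /* triangle soup, shader id, etc. */ };

// The game thread appends to 'fill' while the render thread drains
// 'drain'. Once the render thread signals it has finished the frame,
// the two buffers are swapped and the cycle starts over.
struct CommandBuffers {
    std::vector<RenderCommand> fill;
    std::vector<RenderCommand> drain;

    void swapAtFrameBoundary() {
        std::swap(fill, drain);
        fill.clear();  // the new frame starts with an empty buffer
    }
};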
I need to dig up an article on how one company broke their engine up into tasks, jobs, etc., since it was an interesting read, but I can't remember the name of the game right now to save my life.
EDIT:
It was Lost Planet, I'll go hunting for the presentation. I think it was at the GDC.
Here it is, I found this to be a really interesting read :)
Excellent Overview of Presentation
Literal Translation with Slides
Literal Translation Part 1
Literal Translation Part 2
"..It's interesting that what has become common knowledge for graphics doesn't immediately occur to game programmers for the more general case!"
While that is true, I believe you have to consider the purpose of a GPU versus a CPU. Parallel programming on a GPU comes 'natively' because a GPU is a very specialized processor that performs only a limited set of functions on each 'bit' or pixel of data. Since each operation is almost guaranteed to be independent, parallel programming on a GPU becomes much easier.
In contrast, a CPU is far more 'generalized': it is not dedicated to a select set of functions, so the coder has to explicitly establish locking, unlocking, and scheduling of shared resources. This inherently makes parallel programming on a CPU much more difficult.
With regard to the question raised earlier about locking the scene graph: you only need to lock the nodes that are being modified, not the whole thing. So if 4 enemies are updating their visual information in the scene graph, they only need to lock their own nodes, not any other info.
Any time the rendering thread encounters a locked node while traversing the scene graph, rather than waiting for that node to be released it can keep an internal list of all the locked nodes it encountered and revisit them after traversing the rest of the tree (your mutex class would need to support try locks, i.e. attempt to lock and return a boolean flag indicating whether it was successful).
Design your locking policy (and the placement of the mutexes themselves) to reduce as much as possible the possibility of two threads trying to acquire the same mutex (and for those instances where they will, see if there's any way of having the blocked thread do other work until the mutex is freed up).
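A sketch of that traversal using C++11's std::mutex::try_lock (Node and renderNode are placeholders):

#include <mutex>
#include <vector>

struct Node {
    std::mutex lock;
    std::vector<Node*> children;
};

void renderNode(Node&) { /* draw this node's geometry (stubbed out) */ }

// First pass: skip busy nodes instead of blocking on them, and remember
// them for later. try_lock() returns false immediately if the mutex is
// already held.
void traverse(Node& node, std::vector<Node*>& deferred) {
    if (node.lock.try_lock()) {
        renderNode(node);
        node.lock.unlock();
    } else {
        deferred.push_back(&node);
    }
    for (Node* child : node.children)
        traverse(*child, deferred);
}

void renderScene(Node& root) {
    std::vector<Node*> deferred;
    traverse(root, deferred);
    // Second pass: by now the updaters have likely moved on, so just
    // block on whatever is left.
    for (Node* n : deferred) {
        n->lock.lock();
        renderNode(*n);
        n->lock.unlock();
    }
}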
Anytime the rendering thread encounters a locked node when traversing the scene graph, rather than waiting for that node to be released, it can keep an internal list of all nodes it enountered that were locked and revisit them after traversing the rest of the tree (your mutex class would need to support asynchronous/try locks i.e. try to lock it, and return a boolean flag on whether or not it was successful).
Design your locking policy (and the placement of the mutexes themselves) to reduce as much as possible the possibibility of two threads trying to aquire the same mutex (and for those instances where they will, see if there's anyway of having the blocked thread do other work until the mutex is freed up).
Quote:Original post by astralvoid
"..It's interesting that what has become common knowledge for graphics doesn't immediately occur to game programmers for the more general case!"
While it is true, I believe you have to consider the purpose of a GPU versus a CPU. Programming on a GPU in parallel is thought of 'natively' because a GPU is a very specialized processor that only performs a limited set of specific functions on each 'bit' or pixel of data.
But I am talking about a trivial operation (double buffering) that we were using in games long before GPUs did anything for us beyond just showing a frame buffer on the screen. It has nothing to do with GPU operations as such; it's just how to efficiently implement mutual exclusion between the monitor and the game, i.e. by having two sets of data.