To control threads?

Started by
9 comments, last by Antheus 15 years, 3 months ago
Let me start off by saying I am in no way an authority in the industry; I just graduated college and am feeling my way into engine development. I am simply looking to expand my knowledge and possibly help other curious folks.

As the hardware industry moves into its newest paradigm of separate cores and independent processors, so must the software industry adapt to the new technologies. When processors just kept getting faster, the easy way to keep games running smoothly was time-based programming instead of the frame-based approach people used to implement. Now the trend is not faster processors but more of them; soon, I believe, we will witness a race of cores between the processor manufacturers. The question is: "How do we design our programs to take advantage of the 8, 16, 32, 64, 128+ core processors of the near future?" The answer is threading, which allows for parallelism that takes advantage of more cores.

I am designing my third game engine from scratch and need to make a critical decision, and it hinges on the answer to this question: does running, let's say, twenty small threads on one processor core run faster, or better in any way, than evenly distributing the load among five threads on that core? I wrote a tech demo for a dynamically threaded engine to experiment and learn by doing. I know that context switches can be costly and that they happen every time the OS switches threads. Do you (the community) think it is beneficial to continue pursuing a thread manager that spreads the load evenly every so often? Or should I just program everything as a single thread as much as possible in order to skip the overhead of the managing system? If I am missing some important issue or am way off base please tell me. ^^;
The more threads per core, the lower the performance. Context switches are very expensive. Generally, the approach on systems like the PS3 or 360 is to aim for 1 thread per core, with potentially a couple running on the main core (a main thread and a worker thread).

Another challenge with threaded systems is lock/resource contention. Putting 2 objects on separate threads won't necessarily gain you anything performance-wise if they require read/write access to the same memory resource to update successfully. For instance, an AI entity will need to know where the player is, but if the player is updating on a separate thread you may have contention over accessing the player's position from the AI entity's thread, because the player has that memory locked for update.
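The player/AI contention described above can be sketched in a few lines. This is a minimal illustration, not anything from a real engine; the `Player` struct and `distance_to_player` helper are made up for the example. The point is that the AI thread may briefly block whenever the player thread holds the lock mid-update:

```cpp
#include <mutex>
#include <utility>

// Hypothetical player state shared between the gameplay thread and AI threads.
struct Player {
    float x = 0.0f, y = 0.0f;
    mutable std::mutex m;   // guards x and y

    void move(float dx, float dy) {
        std::lock_guard<std::mutex> lock(m);   // writer holds the lock
        x += dx;
        y += dy;
    }

    std::pair<float, float> position() const {
        std::lock_guard<std::mutex> lock(m);   // readers take the same lock
        return {x, y};
    }
};

// An AI update that needs the player's position. If the player thread is
// mid-update, position() blocks here -- that blocking is the contention.
float distance_sq_to_player(const Player& p, float ax, float ay) {
    auto [px, py] = p.position();
    float dx = px - ax, dy = py - ay;
    return dx * dx + dy * dy;   // squared distance, cheap to compare
}
```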

Yet another is that in games there are tons of serially dependent update sequences. For the AI to update, it may need to know the results of the physics update or the player input update. So if you have AI and physics in separate threads, you run the risk of updating the AI with last frame's physics results for some objects and this frame's physics results for others. So depending on the needs of various systems, you cannot necessarily parallelize those interdependent sequences.

A common solution to many-core development is a job-based system where you have n worker threads updating a load-balanced set of discrete jobs. This will maximize core usage, but because of the above challenges it presents a very difficult problem to solve if you want to apply it to games. To my knowledge no one has found a great solution to this for games, but current attempts generally involve double-buffered shared memory spaces (only the updater writes, anyone can read) and a tolerance for operating on out-of-date information. You will also need to implement a set of barriers so that threads across all relevant cores wait to proceed to step 2 until all threads have completed step 1 (basically managing serial dependencies while trying to do in parallel the things that can be done in parallel).
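The double-buffered shared memory idea above (only the updater writes, anyone can read) can be sketched roughly like this. The class name and interface are illustrative assumptions, not a known engine API; a real implementation would need care around memory ordering and frame synchronization:

```cpp
#include <array>
#include <atomic>

// A minimal double-buffered snapshot: the owning system writes into one
// buffer while everyone else reads the previously published one. Readers
// tolerate seeing last frame's data, as described above.
template <typename State>
class DoubleBuffered {
    std::array<State, 2> buf{};
    std::atomic<int> readIndex{0};   // which buffer readers should use
public:
    // Called only by the single updating thread.
    State& writeSlot() {
        return buf[1 - readIndex.load(std::memory_order_acquire)];
    }

    // Readers get a stable, possibly one-frame-old snapshot.
    const State& read() const {
        return buf[readIndex.load(std::memory_order_acquire)];
    }

    // Called once per frame, after the update completes, to publish it.
    void flip() {
        readIndex.store(1 - readIndex.load(std::memory_order_relaxed),
                        std::memory_order_release);
    }
};
```

Until `flip()` runs, readers keep seeing the old state, which is exactly the "tolerance of out-of-date information" trade-off.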

It's a giant wasps' nest of hassle. Enjoy the pursuit, it's a fun problem to try and crack.

-me
I think I will go ahead and create my thread manager system. At worst, I assume it would run as fast as a single-threaded system, right?

I mean, if every thread is waiting on the currently running thread, it can only slow down so much if created and managed correctly.

My manager would initialize based on how many cores the computer has and swap components to different threads based on thread performance, to keep it as efficient as possible at any given time.

You are right though, a truly advanced physics system would have to rely on AI and vice versa. The best performance boosts would be with rendering and things that only need to see values, like HUD systems.

I think we need a revolutionary memory design to take full advantage of multi-core systems. Something along the lines of allowing multiple cores to access and change the data at the same time or something... but anyway, thank you for the response.

-TonyG
I am not exactly sure what you are getting at. For simple job dispatching you can use a threadpool system that will automatically run the job on an unused or underused core. Otherwise, if you are creating long-running threads, you generally want to dedicate them to a core and not swap them around.
Maybe I am going about this wrong, but my manager takes into account three types of entities:

Component obj (contains stuff for the component, like rendering stuff)
Thread obj (contains a vector of component objs and a system thread)
Core

A component can run on any thread, with multiple components on any given thread.
Thread objects are created based on the number of cores at startup.

So let's say I have 2 cores, therefore 2 threads, and I also have 4 components that are completely independent of each other. The components would have a value based on how long it takes to run their updates.

The idea would be to evenly distribute the load, like:
thread 1 (comp1: 40%, comp2: 10%)
thread 2 (comp3: 30%, comp4: 20%)

My question was: would this be faster than just making 4 threads and letting the OS take care of it? That would increase context switching but reduce my thread management cost.
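For what it's worth, the distribution in the example above falls out of a simple greedy heuristic: sort components by update cost, then repeatedly assign the heaviest remaining one to the lightest thread (the classic longest-processing-time rule). This sketch is purely illustrative; the function name and cost-as-percentage representation are assumptions:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Greedy LPT load balancing: heaviest component goes to the currently
// lightest thread. Returns, per thread, the list of assigned costs.
std::vector<std::vector<double>> balance(std::vector<double> costs, int threads) {
    std::sort(costs.begin(), costs.end(), std::greater<double>());  // heaviest first
    std::vector<std::vector<double>> assignment(threads);
    std::vector<double> load(threads, 0.0);
    for (double c : costs) {
        int lightest = static_cast<int>(
            std::min_element(load.begin(), load.end()) - load.begin());
        assignment[lightest].push_back(c);
        load[lightest] += c;
    }
    return assignment;
}
```

Fed the costs from the post (40, 10, 30, 20) with 2 threads, this yields exactly the split shown: {40, 10} on one thread and {30, 20} on the other.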
If you are going to have more threads than cores, it might be in your best interest to let the OS handle it; however, best practice is to create only as many threads as there are cores. Your method only introduces extra overhead.

However, if your threads are following the pattern work->wait->work->wait->work, you might want to reconsider your design and use background workers or some other threadpool implementation. Otherwise, you are simply going to waste time, as you are not guaranteed to gain any performance by offloading the thread just to collide with the work portion of another thread.

Ideally, if you are going to create a full thread that runs indefinitely, you want it to be something that is going to keep the core busy. As I said, if you are just queuing up little jobs, you should look into a threadpool/background worker implementation.
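A bare-bones version of the threadpool being recommended here might look like the following. This is a sketch under simplifying assumptions (no exception handling, no per-job results, fire-and-forget jobs only), not production code:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// One long-lived worker per core pulls small jobs off a shared queue,
// so job submission never pays the cost of creating a new thread.
class ThreadPool {
    std::vector<std::thread> workers;
    std::queue<std::function<void()>> jobs;
    std::mutex m;
    std::condition_variable cv;
    bool stopping = false;
public:
    explicit ThreadPool(unsigned n = std::thread::hardware_concurrency()) {
        for (unsigned i = 0; i < n; ++i)
            workers.emplace_back([this] {
                for (;;) {
                    std::function<void()> job;
                    {
                        std::unique_lock<std::mutex> lock(m);
                        cv.wait(lock, [this] { return stopping || !jobs.empty(); });
                        if (stopping && jobs.empty()) return;  // drain, then exit
                        job = std::move(jobs.front());
                        jobs.pop();
                    }
                    job();   // run outside the lock so other workers can dequeue
                }
            });
    }
    void submit(std::function<void()> job) {
        { std::lock_guard<std::mutex> lock(m); jobs.push(std::move(job)); }
        cv.notify_one();
    }
    ~ThreadPool() {
        { std::lock_guard<std::mutex> lock(m); stopping = true; }
        cv.notify_all();
        for (auto& w : workers) w.join();
    }
};
```

The destructor drains the queue before joining, so all submitted jobs finish; the components-as-tasks redesign discussed below maps each component update to one `submit` call.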

Just looked up thread pool. That seems to be what I'm doing, with the exception that I wasn't designing for "tasks" as much as full components. That might work a lot better but will require a redesign of my components. Thanks for the replies.
Quote:Original post by agaudreau
Just looked up thread pool. That seems to be what I'm doing, with the exception that I wasn't designing for "tasks" as much as full components. That might work a lot better but will require a redesign of my components. Thanks for the replies.


If you can break the components into tasks (and this can often be done, although it might not seem like it at first), that will allow you to take advantage of the OS's scheduling to reduce collisions and context switching. I am not sure about other implementations, but the .NET threadpool will keep contexts open so a task can slip right in and do its work, without the overhead of creating a new thread context every time you need work done.
Quote:Original post by agaudreau
You are right though, a truly advanced physics system would have to rely on AI and vice versa. The best performance boosts would be with rendering and things that only need to see values, like HUD systems.


Actually, rendering is the one place where you get no benefit from multiple threads. The GPU is already maximally parallelized, and only a single thread can access the device at a time.

Quote:Original post by agaudreau
I think we need a revolutionary memory design to take full advantage of multi-core systems. Something along the lines of allowing multiple cores to access and change the data at the same time or something... but anyway, thank you for the response.


No, we already have that. That's how shared-memory multiprocessing works. All threads can concurrently access the memory, and that's the problem algorithmically. If two threads try to write data to the same class or set of variables, they will both do it and you'll end up with garbage (some of thread A's bits, some of thread B's). What you aim for with multi-threading is to allow simultaneous reading of memory but prevent simultaneous writing, and prevent writing while anyone is reading. And you need to do that while cutting down on lock contention and compare-and-swap operations, which get expensive.

But otherwise, what I'd suggest to you after following this thread is for you to take a month or more to research already existing solutions to multi-threaded problems. This isn't even a remotely new field. People have been writing distributed apps for decades, just not games. So study up on the body of algorithms that exist and figure out ways to apply them to your game.

-me
Yeah, I didn't mean have more than one thread rendering. I was saying the overall system gains from it, because one thread can be rendering while another updates sounds or something.

I realize I am not the first to look at this, and I also understand that it isn't as easy as it looks at first. I was just wondering about my manager idea.

And also, I meant a new memory system that would bypass the issues when writing and reading at the same time. Just more of a musing than anything serious.

