Multithreaded engines

Hi guys,

I have been studying C++ lately, especially the standard library facilities for concurrency and threads. I came across atomics and lock-free programming in general. Given that game engines are some of the most complex software around and demand high performance, are there any game engines on the market that use multithreading, or lock-free algorithms? (Lock-free algorithms are pretty low-level, even compared to traditional locks, mutexes, and semaphores.)

Also, we have Vulkan, which absolutely forces us to manage memory on the GPU ourselves. It seems we are going deeper and deeper into the hardware.

Hodges input is pretty much golden. And here is something from personal experience with threading.

There are always ways to thread any system in your engine, but that does not necessarily mean you should. There are occasions when threading is less performant than doing it single-threaded because of the cost of synchronization. That could very well just be a matter of doing it right... which is hardly ever a realistic expectation. You can't do everything right.


You can also find ways to hide the nature of threading from the gameplay programmers. I tried doing something similar to what Killzone did, which was to turn the update of each Actor into a work job and schedule all dependent jobs accordingly. The problem was... this got complicated real fast. So I quickly threw the sucker out.

In the end... there is one piece of advice that is always solid: plan for multithreading ahead of time, rather than as an afterthought when you need it. Implementing it late in development, when your systems are already built, can easily be a pain, because you need to reduce coupling, which means Frankensteining a good portion of your code and suffering hours of edits.

and it usually makes the code considerably easier to maintain, if for no other reason than that side effects have to be removed. But you have to be explicit, and as such you need to break your logic into a series of pieces much more like a GPU set of shaders instead of linear code. It is very similar to the other forum thread about data-oriented versus object-oriented design.

Yeah this is pretty much the key.
In "typical OOP" systems you often have no idea what the flow of control is (e.g. loop over a collection of interfaces, call a virtual, which calls a virtual, which calls a virtual), and if you don't know what the flow of control is, then you can't know what the data access patterns are, which means you can't safely schedule anything. Graduate programmers try to fix this mess by introducing locks to the code to make it "thread safe", which is not seeing the forest for the trees. An alternative solution is to rearchitect the flow of control so that things don't have to be "thread safe" -- if you know that only one thread is writing to an object at a time, no locks are required.

Instead of typical OOP bullshit, you should try to write systems that are fairly contained when it comes to both flow and data access, and which are very explicit in what all of their inputs and outputs are (and thus explicit in what their side effects are). It then becomes extremely simple to figure out how you can run these systems on a parallel machine.
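As a rough sketch of what "explicit inputs and outputs, one writer per object" can look like in C++: the names below (`integrate_range`, `integrate_parallel`) are made up for illustration and are not from any particular engine. Each worker thread reads the shared velocities and writes only its own slice of the positions, so no locks or atomics are needed; the `join()` calls are the only synchronization.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical system with explicit data access:
//   input : velocities (read-only), dt
//   output: positions[begin, end) (written by exactly one thread)
void integrate_range(std::vector<float>& positions,
                     const std::vector<float>& velocities,
                     float dt, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        positions[i] += velocities[i] * dt;
}

// Splits the work into disjoint slices, one per worker thread.
void integrate_parallel(std::vector<float>& positions,
                        const std::vector<float>& velocities,
                        float dt, unsigned worker_count) {
    assert(positions.size() == velocities.size());
    assert(worker_count > 0);
    const std::size_t n = positions.size();
    const std::size_t chunk = (n + worker_count - 1) / worker_count;

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < worker_count; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&positions, &velocities, dt, begin, end] {
            integrate_range(positions, velocities, dt, begin, end);
        });
    }
    // Joining makes every slice's writes visible to the caller.
    for (auto& t : workers) t.join();
}
```

A real engine would hand these slices to a persistent job system rather than spawning threads per call; the point here is only that disjoint write ranges remove the need for "thread safe" objects.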

 
To illustrate an extreme - on the PS3's SPU cores, you can't directly access any RAM. Instead, you have to basically memcpy a block of RAM to the SPU, do some calculations, and then memcpy the results back to RAM.
We supported this architecture by forcing our jobs to pre-declare a list of pointer+size pairs (e.g. struct Range { void* where; size_t size; bool writable; }; vector<Range> inputsAndOutputs;). The job system would then copy those regions of RAM to the SPU before the job ran, and after the job had finished it copied the writable ranges back to RAM. If you tried to access a RAM range that you hadn't declared first, your code wouldn't work.
At first, this seemed like a massive pain in the ass. Why would anyone want this!?
But... it actually forces a great deal of discipline onto the team, where people start thinking about what the minimal data requirements for their update code are, how the data can be arranged to support parallelism, and how data-dependency chains can be simplified.

It also gives you some neat debugging tools on the PC -- you can emulate an SPU on the PC by doing those unnecessary memcpys into a temporary working region before running a job, and memcpying the results back into place. If the job is written properly, this shouldn't affect the results (if it does, someone's got an accidental/undeclared data dependency).
I think that game programmers the world over should thank the PS3 for forcing us to become better software engineers :)
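Here is a minimal sketch of that PC-side emulation, assuming a job is just a function pointer plus its declared ranges. The `Range` struct is the one quoted above; `Job`, `JobFn`, `run_emulated`, and `sum_job` are hypothetical names for illustration, not the actual job system described in the post. The job only ever sees private copies, and only ranges declared writable are copied back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct Range { void* where; std::size_t size; bool writable; };

// Hypothetical job signature: the job receives the same ranges it declared,
// but with `where` pointing at a private scratch copy instead of main RAM.
using JobFn = void (*)(const std::vector<Range>& local);

struct Job {
    std::vector<Range> inputsAndOutputs;
    JobFn run;
};

// Emulates "copy in, run, copy back" on the PC.
void run_emulated(const Job& job) {
    std::vector<std::vector<unsigned char>> scratch;
    std::vector<Range> local;
    scratch.reserve(job.inputsAndOutputs.size());
    for (const Range& r : job.inputsAndOutputs) {
        assert(r.where != nullptr && r.size > 0);
        scratch.emplace_back(r.size);                         // private working copy
        std::memcpy(scratch.back().data(), r.where, r.size);  // "DMA in"
        local.push_back(Range{scratch.back().data(), r.size, r.writable});
    }

    job.run(local);

    for (std::size_t i = 0; i < job.inputsAndOutputs.size(); ++i) {
        const Range& r = job.inputsAndOutputs[i];
        if (r.writable)                                       // "DMA out"
            std::memcpy(r.where, scratch[i].data(), r.size);
    }
}

// Example job: local[0] is a read-only float array, local[1] one writable float.
void sum_job(const std::vector<Range>& local) {
    const float* in = static_cast<const float*>(local[0].where);
    float* out = static_cast<float*>(local[1].where);
    const std::size_t count = local[0].size / sizeof(float);
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) total += in[i];
    *out = total;
}
```

With this in place, a job that writes to a range it only declared as read-only silently loses that write, and a job that reaches for undeclared memory behaves differently from the emulated run, which is exactly the kind of accidental dependency this trick is meant to surface.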

 

And as All8 said, this should actually be very familiar to any graphics programmer. On the GPU, we've usually been restricted from accessing RAM whenever we want to -- instead, we've been forced to explicitly declare all of the input/output resources that are required. Even though we usually feed the GPU a serial stream of draw/compute commands, it manages to be one of the most parallel/multi-threaded parts of any game ;)

 

1) Make sure that the task is complex enough to be worth the effort.
2) Make sure that data sharing in the task is low enough that synchronization won't cancel out any performance gains.
3) Atomics are very useful sometimes. Use them to avoid having a syscall on a mutex when you know the protected code will execute faster than the syscall. Use them to implement complex locking structures where system locks aren't suitable. Don't use them to avoid acquiring a mutex to protect data for a complex task. And question why you're using a mutex for a complex task, anyway.
4) Threads are awesome, locks are not. Take full advantage of the former, avoid having to use the latter.
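To make point 3 a bit more concrete, here is a small sketch with made-up names (`count_with_atomic`, `Stats`, `accumulate_with_mutex`). The first function protects a single increment, where one atomic read-modify-write is enough. The second updates several fields that must stay consistent with each other, which is where a mutex is the simpler and safer tool. Whether locking a mutex actually costs a syscall depends on the platform and on contention, so treat the performance argument as something to measure rather than assume.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Tiny critical section (one increment): an atomic is sufficient.
std::uint64_t count_with_atomic(unsigned thread_count, unsigned per_thread) {
    assert(thread_count > 0);
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (unsigned i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();  // join orders all increments before the load
    return counter.load();
}

// Several fields that must change together: use a mutex, not a pile of atomics.
struct Stats {
    std::uint64_t count = 0;
    double sum = 0.0;
    double max = 0.0;
};

Stats accumulate_with_mutex(unsigned thread_count, unsigned per_thread) {
    assert(thread_count > 0);
    Stats stats;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < thread_count; ++t) {
        workers.emplace_back([&stats, &guard, per_thread, t] {
            const double sample = static_cast<double>(t + 1);
            for (unsigned i = 0; i < per_thread; ++i) {
                std::lock_guard<std::mutex> lock(guard);
                stats.count += 1;
                stats.sum += sample;
                if (sample > stats.max) stats.max = sample;
            }
        });
    }
    for (auto& w : workers) w.join();
    return stats;
}
```

In the spirit of point 4, the second function could drop the lock entirely by letting each thread fill its own `Stats` and merging them after the joins.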

Edited by nfries88

Nice thread. I'll be writing a small multicore tech demo named Sussex over the following weeks, so this post arrived at the perfect time for me :).
 
