Dynamo_Maestro

[C#/C++] Multithreading


For those of you who haven't heard, EVE Online had a massive 2,800+ player fight the other day, which resulted in a less-than-perfect battle. I won't bother going into details about the fight or the game; however, it generated a lot of complaints, the main one being "the game only uses one core, why not all?".

 

The game is built using Stackless Python and C++.

 

Now a spokesperson for CCP responded with "...there's no simple way to make something multithreaded..." among other things.

 

OK, so my question (bearing in mind that I do ALL my multithreading work in C#) is: how accurate is this statement? I am not trying to start a flame war; I just find this comment 'unusual' and suspect it is invalid, but given my limited experience with C++ it would be wrong of me to assume that. Help me understand what he means.

 

The link is here: https://forums.eveonline.com/default.aspx?g=posts&m=2541374#post2541374

 

Thanks in advance

 

PS: The reason I haven't asked CCP myself is simply that they are terrible at replying and put very little effort into their responses.


the main one being "the game only uses one core, why not all?"

 

I was under the impression that the main issue was server load, not client?

 

"...there's no simple way to make something multithreaded..."  ...  how accurate is this statement?

 

I would say it's pretty accurate. You can't really just flick a switch and turn on multi-threading. While some optimising compilers can parallelise some loops if they can determine there are no side effects, in general you have to actually write multi-threaded code. 
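To make "you have to actually write multi-threaded code" concrete, here is a minimal C++17 sketch (function names are hypothetical) contrasting a plain serial loop with a hand-split version that divides the range into contiguous chunks, one per thread. Nothing here comes from EVE's code; it only illustrates the kind of restructuring the post describes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Serial version: one core walks the whole range.
std::int64_t serial_sum(const std::vector<int>& data) {
    return std::accumulate(data.begin(), data.end(), std::int64_t{0});
}

// Hand-written parallel version: split the range into `num_threads` contiguous
// chunks, let each thread sum its own chunk into its own slot, then combine.
std::int64_t parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::int64_t> partial(num_threads, 0);  // one result slot per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + static_cast<std::ptrdiff_t>(begin),
                                         data.begin() + static_cast<std::ptrdiff_t>(end),
                                         std::int64_t{0});
        });
    }
    for (auto& w : workers) w.join();  // every chunk must finish before combining
    return std::accumulate(partial.begin(), partial.end(), std::int64_t{0});
}
```

For example, `parallel_sum(v, 4)` on the integers 1..1000 should return the same 500500 as `serial_sum(v)`. Even in this trivial case the code has to decide how to split the work, where each thread's result lives, and when it is safe to combine the results.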

 

That's not too bad if you're starting from scratch. While multi-threaded code still has its gotchas, more widespread use over the last few years has brought about some patterns and principles that ease the burden.

 

But refactoring an existing code base to be multi-threaded? That's very rarely easy. 

Writing multithreaded code is easy; writing multithreaded game code that actually performs better than its single-threaded counterpart is a bit harder; and modifying a huge, 10-year-old serial codebase to execute well in parallel ... well ... that's borderline insanity.

Efficient parallel code is very different from efficient serial code. Since EVE is a fairly old game there might be some low-hanging fruit to pick, but to get the big performance increases it might be cheaper to just start over from scratch with a new client engine.

I don't know what kind of multithreading you did in C#, but in my experience it's not easier than in C++. So if you say that multithreading in C# is easy for all your needs, then either a) you're a freaking genius or b) your problems were perfectly suited to multithreading.

Our brains are designed to think sequentially. Once multithreading comes into play, it's hard even to imagine what might happen and to think of all possible scenarios. Andrei Alexandrescu, well known in the C++ community, pretty much nailed it: "Multithreading is just one damn thing after, before, or simultaneous with another". And that's all we know.

When the code base is not designed for parallelism from the start, adapting it is a huge step.

The funny thing is that most of the people complaining about only one core being used would shut up if you just created <number of cores> - 1 threads in your program that do nothing but spin in an infinite loop, so they'd see 100% CPU usage.

What I am trying to say is that the statement "...there's no simple way to make something multithreaded..." is very accurate. Usually, the first two or three attempts to parallelize a previously sequential algorithm/architecture will lead to full CPU usage but slower execution.

Edited by brx
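A common way a first attempt ends up with full CPU usage but slower execution is heavy contention on shared state. The C++17 sketch below (hypothetical names; counting even numbers is just a stand-in workload) shows a naive version that takes a shared mutex on every match next to a restructured version that counts privately per thread. Both return the same answer; on most machines the first is likely to be slower, sometimes slower than a plain serial loop, though exact timings will vary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Naive "parallelisation": every thread updates one shared total under a mutex.
// All threads stay busy, but they serialise on the lock every time they find a match.
std::int64_t count_even_naive(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::int64_t total = 0;
    std::mutex m;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Interleaved indices: thread t handles t, t + n, t + 2n, ...
            for (std::size_t i = t; i < data.size(); i += num_threads) {
                if (data[i] % 2 == 0) {
                    std::lock_guard<std::mutex> lock(m);  // taken once per match
                    ++total;
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}

// Restructured version: each thread counts privately and touches shared
// state only once, at the end.
std::int64_t count_even_local(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::int64_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            std::int64_t local = 0;
            for (std::size_t i = t; i < data.size(); i += num_threads) {
                if (data[i] % 2 == 0) ++local;
            }
            partial[t] = local;  // single write per thread
        });
    }
    for (auto& w : workers) w.join();
    std::int64_t total = 0;
    for (auto p : partial) total += p;
    return total;
}
```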


I don't know what kind of multithreading you did in C#, but in my experience it's not easier than in C++. So if you say that multithreading in C# is easy for all your needs, then either a) you're a freaking genius or b) your problems were perfectly suited to multithreading.

 

Just to clear things up a bit: what I meant by my comment was that I am only familiar with multithreading in C#, and since his comment referred to C++ I was wondering what he meant. Oh, and I should mention I wasn't a poster in that thread and only became aware of it when it was moved. I don't actually 'play', but I do make use of their API, so it made very little difference to me; this was simply curiosity more than anything :)

 

Anyway, thanks everyone for answering. I sometimes word things badly, but this truly was an "if in doubt, ask" moment.


Also worth noting that a bunch of Eve's server-side code is written in Stackless Python, and Python in general is a nightmare to multi-thread.

 

(there is a little doohickey called the Global Interpreter Lock, which throws a great big wrench in the works)


Yeah, that thing is a pain... Our old build system was based on Python, which basically meant all of the build 'setup' was single-threaded. That setup involved working out a dependency graph, which was quick with small asset counts, but as the asset count grew it introduced a comedic wait before building even started. And while the external tools ran outside Python, because of how the system was designed you had to spin up hundreds of threads to launch them and wait for them to finish (the GIL is released while waiting on an external process).
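A rough way to picture why the build system needed so many threads: a toy C++ model (not how CPython actually implements the GIL) in which every thread must hold one global mutex to do "interpreter" work, but releases it while blocked on something external, modelled here as a sleep. The function names and the sleep stand-in are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Toy stand-in for a global interpreter lock: one mutex that every thread must
// hold while doing "interpreter" work. (Illustrative only, not CPython's design.)
std::mutex toy_gil;

// One job: some work under the global lock, then a blocking wait on something
// external (modelled as a sleep) with the lock released, then a final locked step.
void run_job(std::atomic<int>& finished, std::chrono::milliseconds external_wait) {
    {
        std::lock_guard<std::mutex> hold(toy_gil);
        // CPU-bound work placed here would be serialised across all threads.
    }
    std::this_thread::sleep_for(external_wait);  // lock not held: waits can overlap
    {
        std::lock_guard<std::mutex> hold(toy_gil);
        finished.fetch_add(1);
    }
}

// Launch one thread per job, as the old build system had to, and wait for all.
int run_jobs(int job_count, std::chrono::milliseconds external_wait) {
    assert(job_count >= 0);
    std::atomic<int> finished{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < job_count; ++i)
        threads.emplace_back(run_job, std::ref(finished), external_wait);
    for (auto& t : threads) t.join();
    return finished.load();
}
```

With, say, 8 jobs each waiting 10 ms, the waits can overlap because nobody holds the lock while sleeping, whereas any CPU-bound work placed in the locked sections runs one thread at a time no matter how many threads exist.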

 

Fortunately this became a big enough problem that a C#/.Net re-write was allowed \o/


Also worth noting that a bunch of Eve's server-side code is written in Stackless Python, and Python in general is a nightmare to multi-thread.

 

(there is a little doohickey called the Global Interpreter Lock, which throws a great big wrench in the works)

Good to know. 

Although I've never used stackless, I was under the (obviously mistaken) impression that easier multi-threading was one of the benefits.

 

Guess I was wrong!


Although I've never used stackless, I was under the (obviously mistaken) impression that easier multi-threading was one of the benefits.

Cooperative multi-tasking, yes. Threading no.

 

Cooperative multitasking-based languages like Stackless Python, Erlang, Scala, and Google's Go, support a very different model of concurrency to the C/C++/C#/Java threading model - it's worth reading up on if you are interested.
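The core idea of cooperative multitasking, tasks that voluntarily yield so that one thread can interleave many of them, can be shown in plain C++ with no threads at all. This is a minimal sketch with hypothetical names; it is not how Stackless Python, Erlang, or Go actually schedule their tasks.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A cooperative task does a small slice of work per call and returns true
// while it still has more to do. It decides when to yield; nothing pre-empts it.
using Task = std::function<bool()>;

// Round-robin scheduler on a single thread: run one slice of each task in turn
// until every task reports that it is finished.
void run_cooperatively(std::deque<Task> tasks) {
    while (!tasks.empty()) {
        Task task = std::move(tasks.front());
        tasks.pop_front();
        if (task()) tasks.push_back(std::move(task));  // not done yet: requeue
    }
}

// Builds a task that appends `name` plus a step number to `log`, one step per slice.
Task make_counter_task(std::string name, int steps, std::vector<std::string>& log) {
    assert(steps >= 0);
    int done = 0;
    return [name, steps, done, &log]() mutable {
        if (done < steps) {
            log.push_back(name + std::to_string(done));
            ++done;
        }
        return done < steps;
    };
}
```

Running a task "A" with 2 steps and "B" with 3 steps through `run_cooperatively` interleaves them as A0, B0, A1, B1, B2 on a single thread. The interleaving is fully determined by where each task yields, which is also why a task that never yields stalls everything else.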


Multithreading is often misunderstood, even among devs. Multithreading is primarily used for parallelism and not to speed things up. For example, in games multithreading is ideal for keeping your game responsive while it loads resources (for the next area), while the user is providing input, or while the AI is calculating (re)actions.

 

Yes, you can achieve speed-ups with MT, and MT is often used for speed-ups, for example for rendering in suites like 3ds or Maya. But your problem must be suited to running in parallel, and in most cases the speed-up is far from linear. With a perfectly linear speed-up you could potentially gain 300% performance on a quad-core, which seems huge. But a linear speed-up is unrealistic: you have to coordinate (mutex, MVar, synchronize, STM) the different processes or threads at their meeting points, and that results in a slowdown. It's utopian to expect a whole game to gain a 300% speed-up; even +100% is far from reality. In most cases you will solve specific sub-problems with MT or, most commonly, decouple sub-systems from each other so that they run in parallel on their own processing units.
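One standard way to quantify why the ideal +300% is out of reach is Amdahl's law (not mentioned in the post). If a fraction p of the run time can be spread across N cores and the rest stays serial, the best-case speed-up S is given below. It ignores the synchronisation costs just mentioned, so real results are usually lower still.

```latex
S(N) = \frac{1}{(1 - p) + p/N}

N = 4:\quad
S\big|_{p=1} = 4 \;(+300\%),\qquad
S\big|_{p=0.9} = \frac{1}{0.1 + 0.225} \approx 3.08 \;(\approx +208\%),\qquad
S\big|_{p=0.5} = \frac{1}{0.5 + 0.125} = 1.6 \;(+60\%)
```

Even with 90% of a frame perfectly parallel, a quad-core tops out at roughly 3x, and at 50% the ceiling is 1.6x regardless of how the parallel half is written.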

 

MT is often a trade-off. MT will make your project much more complex. More complexity makes your project more error-prone and slows down overall progress. Your code base becomes more fragile and "uglified". What's the benefit? More responsiveness: that's fine! A 10%-30% "speed-up": maybe not worth it.

 

I strongly suspect the EVE Online client makes use of parallelism, but not in the way gamers expect. The gamer looks at the task manager and complains about single-CPU usage. But in which situation? Maybe that situation is not really parallelizable. For example, pumping data to the graphics card is not easily parallelizable, and sometimes not possible at all. Let me speculate: EVE Online parallelizes the client view, the network and resource loading. In a huge fleet fight, once everything is loaded on the client side, the bottleneck will be rendering. And if the CPU part of rendering isn't easily parallelizable, CPU usage is reduced to the single rendering core.


Approach 1) We put the decompression code into a separate background thread, which sleeps unless it has work to do. When it does have work to do, we're relying on the OS's thread scheduler to choose which thread is running on the single CPU core. By default on Windows, the scheduler granularity is 15ms, so the decompression thread will require 67 time-slices to complete its 1-second task. If our main thread is attempting to run at a fixed real-time frame-rate of 60Hz, then during the time that the decompression thread is awake, this is now impossible. From time to time (unpredictably), the main thread will be put to sleep for an entire 15ms time-slice (or maybe multiple time-slices).
That kind of unpredictability is simply not acceptable to a real-time application.
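Approach 1 roughly corresponds to the shape below: a C++17 sketch (class and method names hypothetical) of a background worker that sleeps on a condition variable until a job is submitted. Note that it does nothing to control when the OS runs the worker relative to the main thread, which is exactly the unpredictability described above on a single core.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Background worker that sleeps on a condition variable until work arrives.
// The OS scheduler decides when it actually runs relative to the main thread.
class BackgroundWorker {
public:
    BackgroundWorker() : thread_([this] { loop(); }) {}
    ~BackgroundWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        thread_.join();  // the loop drains remaining jobs, then exits
    }
    void submit(std::function<void()> job) {
        assert(static_cast<bool>(job));
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();  // run outside the lock so submit() never waits on a job
        }
    }
    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the members above exist before it starts
};
```

A decompression job would be passed to `submit()` as a callable; the destructor lets any queued jobs finish and then joins the thread.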
 
Approach 2) We manually time-slice the decompression code, so that after it's run for ~1ms (or some other chosen threshold), it stores its state and returns/yields -- a.k.a. cooperative multi-tasking. We run the decompression code on the "main thread" every frame, knowing that the biggest interruption that this task can cause is a very predictable 1ms per frame.
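Approach 2 can be sketched as a job object that keeps its progress between calls and gives control back once a time budget is spent. In this C++17 sketch a running checksum stands in for decompression (an assumption; real decompression state would be richer), and the clock is checked after each 4 KB chunk (an arbitrary choice) rather than after every byte.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// A long-running job that keeps its own state between calls, so it can be
// advanced a little at a time from the main loop (cooperative time-slicing).
class TimeSlicedJob {
public:
    explicit TimeSlicedJob(std::vector<unsigned char> input) : input_(std::move(input)) {}

    // Work until `budget` has elapsed or the job is complete.
    // Returns true once all input has been processed.
    bool step(std::chrono::microseconds budget) {
        using clock = std::chrono::steady_clock;
        const auto deadline = clock::now() + budget;
        while (pos_ < input_.size()) {
            // Process one chunk, then check the clock; checking after every
            // byte would cost more than the work itself.
            const std::size_t end = std::min(input_.size(), pos_ + kChunk);
            for (; pos_ < end; ++pos_) checksum_ = checksum_ * 31u + input_[pos_];
            if (clock::now() >= deadline) break;  // yield back to the frame
        }
        return done();
    }
    bool done() const { return pos_ == input_.size(); }
    unsigned checksum() const { assert(done()); return checksum_; }

private:
    static constexpr std::size_t kChunk = 4096;
    std::vector<unsigned char> input_;
    std::size_t pos_ = 0;    // saved progress between calls
    unsigned checksum_ = 0;  // saved partial result between calls
};
```

A frame loop would call something like `job.step(std::chrono::microseconds(1000))` once per frame until it returns true. The overrun past the budget is bounded by the time needed to process one chunk, which is what keeps the per-frame interruption predictable.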

 

I guess I expressed myself badly.

 

I've tried to outline this dilemma and misunderstanding. My statement was meant to be: you'd better not use any MT approach to speed up your application, regardless of the core count. You use MT to run things at the same time (for games, within the same frame). That's independent of which high- or low-level approach you choose. I completely agree with you: approach 1 is the worst case for a single core, and approach 2 is more predictable. But these approaches differ "only" in the level of detail (which is not unimportant and will indeed have a deep impact). You showed that it's sometimes better for the application to manage its (time) resources on its own. But this adds complexity to the project, which shouldn't be underestimated (for example, you will lose determinism).

 

And again, even (or especially) for games, you don't choose an MT approach to make the game perform better. If a game dev thinks "uhm, my performance is too bad, let's switch MT on, I hope it will get better", that's the wrong motivation for MT. The best motivation for using any low- or high-level MT approach is to let things happen in parallel, for example seamless environment streaming. In fact, if you choose approach 2 (aka high-level MT), you will lose performance if you measure it in FPS, which is not a good performance metric, but that's another topic.


My statement was meant to be: you'd better not use any MT approach to speed up your application, regardless of the core count.

And my response was the opposite -- the only reason to use multiple threads is to gain access to extra cores, in order to speed up the application.

Concurrency (as in, interleaving two different tasks) is irrelevant -- use coroutines or fibres or manual time-slicing for that kind of concurrency. Use threads to run code on more physical cores. Ideally, your thread count matches your CPU core count, no matter how many 'concurrent' systems you have.

 

Ideally, a game running on a single-core CPU would only have 1 thread, and a game running on a quad core would have exactly 4 threads. The game should be able to split its workload amongst the available pool of threads automatically, and when running on the quad-core, it should be almost 4x faster than when running on a single-core. That's the ideal result, and it's not impossible.
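A minimal C++17 sketch of "one thread per core, with the workload split among them automatically" (names hypothetical): `std::thread::hardware_concurrency()` supplies the thread count (it counts hardware threads, so SMT siblings are included, and it may return 0 if the count is unknown), and workers pull item indices from a shared atomic counter so that faster workers naturally take more items. The calling thread joins in as one of the workers. A real engine would keep a persistent pool rather than spawning threads on every call.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// One worker per hardware thread; fall back to 1 if the count is unknown (0).
unsigned worker_count() {
    const unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : n;
}

// Run job(i) for every i in [0, count). Workers pull indices from a shared
// atomic counter, so the split adapts to however fast each worker happens to be.
void parallel_for(std::size_t count, const std::function<void(std::size_t)>& job,
                  unsigned workers = worker_count()) {
    assert(workers > 0);
    std::atomic<std::size_t> next{0};
    auto drain = [&] {
        for (std::size_t i = next.fetch_add(1); i < count; i = next.fetch_add(1)) job(i);
    };
    std::vector<std::thread> helpers;
    for (unsigned t = 1; t < workers; ++t) helpers.emplace_back(drain);
    drain();  // the calling thread is worker #0
    for (auto& h : helpers) h.join();
}
```

The same `parallel_for` call works unchanged whether `worker_count()` returns 1 or 16, which is the scaling property described above.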

But this adds complexity to the project, which shouldn't be underestimated (for example, you will lose determinism).

There's no reason that multi-threaded programs have to give up determinism! Multi-threading strategies that introduce indeterminate behaviour are, IMHO, bad strategies in general (they may have niche applications).

 

One of the first models of computation that you're taught as a student is input->process->output. You've got some blob of input data, you feed it into some kind of process, and you get some blob of output data. You can then chain sequences of these blocks together in order to create an entire program. At the heart of everything that we do, this model is still relevant.

If you take all the chained IPO blocks that make up one frame of processing in your game, you've got a DAG of processes that need to be run, with dependencies between them (if the input to process #2 is the output of process #1, then process #1 must be complete before process #2 runs). You can perform a topological sort on this graph to get a linear order of processes, and every process that ends up sorted into the same 'level' can be run in parallel (across multiple cores) without further synchronisation. This is how many functional languages can take any old program and "automatically multi-thread" it, while maintaining perfectly deterministic behaviour.
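The levelled topological sort can be sketched in C++17 with Kahn's algorithm (a standard choice; the post does not name one). `FrameGraph`, `levels_of` and `run_frame` are hypothetical names. Processes in the same level run concurrently, and joining them is the only synchronisation before the next level starts. One thread per process keeps the sketch short; a real scheduler would hand each level to a fixed pool sized to the core count.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Each process is a node; edges[a] lists the processes that consume a's output.
struct FrameGraph {
    std::vector<std::function<void()>> processes;
    std::vector<std::vector<std::size_t>> edges;
};

// Topologically sort the graph into levels (Kahn's algorithm, one level at a
// time). Every process in a level depends only on processes in earlier levels.
std::vector<std::vector<std::size_t>> levels_of(const FrameGraph& g) {
    const std::size_t n = g.processes.size();
    assert(g.edges.size() == n);
    std::vector<std::size_t> indegree(n, 0);
    for (const auto& outs : g.edges)
        for (std::size_t to : outs) ++indegree[to];

    std::vector<std::vector<std::size_t>> levels;
    std::vector<std::size_t> current;
    for (std::size_t i = 0; i < n; ++i)
        if (indegree[i] == 0) current.push_back(i);
    std::size_t visited = 0;
    while (!current.empty()) {
        std::vector<std::size_t> next;
        for (std::size_t node : current)
            for (std::size_t to : g.edges[node])
                if (--indegree[to] == 0) next.push_back(to);
        visited += current.size();
        levels.push_back(std::move(current));
        current = std::move(next);
    }
    assert(visited == n && "graph has a cycle");
    return levels;
}

// Run one frame: processes within a level run in parallel; joining them is the
// only sync point before the next level.
void run_frame(const FrameGraph& g) {
    for (const auto& level : levels_of(g)) {
        std::vector<std::thread> threads;
        for (std::size_t node : level) threads.emplace_back(g.processes[node]);
        for (auto& t : threads) t.join();
    }
}
```

For a diamond-shaped graph (0 feeds 1 and 2, which both feed 3) this yields three levels, {0}, {1, 2} and {3}, and processes 1 and 2 run in parallel. Because each process only reads outputs from earlier levels, the final result is the same on every run.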

 

And again, even (or especially) for games, you don't choose an MT approach to make the game perform better. If a game dev thinks "uhm, my performance is too bad, let's switch MT on, I hope it will get better", that's the wrong motivation for MT.

The only reason to launch extra OS threads is because you want to make use of extra CPU cores (or you're forced to by legacy APIs), and the only reason to make use of extra CPU cores is because you need/want more processing power. As above, if you just want simple concurrency -- like background loading, streaming of environments -- you do not need extra threads.

Multi-threading is not something you can 'switch on' later in the project; it has to be designed into the project from the beginning (when using imperative/procedural/OOP languages, anyway). Typical C++ OOP code, when decomposed into an IPO graph, looks like spaghetti code -- every process has too many side effects, and there's too much mutable state, so every process has multiple outputs all over the place. The DAG that's produced is a complex spider-web that ends up as a serial sequence of processes with few opportunities to take advantage of multiple cores. Trying to parallelize that kind of code is a nightmare. If you really want that 300% speed boost that you mentioned (which is attainable in games, despite what many say), you need to be writing code that's well designed for a smart multi-threading strategy from the very start of your project.

Edited by Hodgman


If you really want that 300% speed boost that you mentioned (which is attainable in games, despite what many say), you need to be writing code that's well designed for a smart multi-threading strategy from the very start of your project.

 

Amusingly this code tends to end up looking more functional than anything else; a few years ago I read a book on Haskell and while the syntax hasn't stuck (because I don't use it) the way of writing code did and it made me better at writing threaded code.

 

The multi-threaded parts of our engine at work are very much functional in that a bunch of state goes in, is used, and a single output is produced in a buffer; this means we can scale up as far as we want or indeed scale down to a single thread for debugging.
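A minimal C++17 sketch of that "state in, single output buffer" style (the `Particle` type and function names are hypothetical): the update reads only from the previous state and writes only to its own slice of a new buffer, so the thread count becomes a parameter that can be dropped to 1 for debugging without changing the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Particle { float x, v; };

// Pure update: reads only from `in`, writes only to out[begin, end).
// No shared mutable state, so any split of the range gives the same result.
void integrate_range(const std::vector<Particle>& in, std::vector<Particle>& out,
                     std::size_t begin, std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i) out[i] = {in[i].x + in[i].v * dt, in[i].v};
}

// Produce the next state in a separate buffer. Setting `threads` to 1 runs
// exactly the same code path single-threaded, which is handy for debugging.
std::vector<Particle> step_all(const std::vector<Particle>& in, float dt, unsigned threads) {
    assert(threads > 0);
    std::vector<Particle> out(in.size());
    const std::size_t chunk = (in.size() + threads - 1) / threads;
    std::vector<std::thread> pool;
    for (unsigned t = 1; t < threads; ++t) {
        const std::size_t b = std::min(in.size(), t * chunk);
        const std::size_t e = std::min(in.size(), b + chunk);
        pool.emplace_back(integrate_range, std::cref(in), std::ref(out), b, e, dt);
    }
    integrate_range(in, out, 0, std::min(in.size(), chunk), dt);  // caller does chunk 0
    for (auto& th : pool) th.join();
    return out;
}
```

`step_all(s, dt, 1)` and `step_all(s, dt, 4)` produce identical output, because every element is computed by exactly the same operations regardless of which thread computes it.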

 

(As we have a 'chain' of jobs we do give up some determinism with this system, mostly by allowing different processing segments of the graph to run at different speeds; however, these tend to be short chains of work with sync points introduced to ensure logical blocks of work are completed before moving on.)


But to the OP's original post:

 

1.  Multithreading is HARD to use correctly and get any benefit out of, except where a program has multiple, largely independent problems to solve.  Some examples where multithreading can be used easily on multicore machines to get extra performance:  

 

You CAN run your networking or resource loading or almost anything IO-bound on a different core than your main logic, so your main logic keeps "running" until the IO is finished, and then your background thread notifies your main thread of the completed IO.  You CAN'T get much benefit trying to split networking itself across 4 different cores ... they are all bound by the same resource.
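One way to do this in C++ is `std::async` with `std::launch::async`, which runs the load on another thread and hands back a `std::future`. In this sketch (function names and the sleep-based stand-in for I/O are hypothetical) the main loop polls the future with a zero timeout each frame instead of blocking, which is one simple form of the "notify the main thread" step; a message queue or callback would also work.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Stand-in for a slow, I/O-bound load; a real loader would read a file or socket.
std::string load_level_data(std::chrono::milliseconds latency) {
    std::this_thread::sleep_for(latency);
    return "level data";
}

// The main loop keeps "running frames" while the load happens on another
// thread, checking the future each frame rather than blocking on it.
int frames_until_loaded(std::chrono::milliseconds latency, std::string& result) {
    std::future<std::string> pending =
        std::async(std::launch::async, load_level_data, latency);
    int frames = 0;
    while (pending.wait_for(std::chrono::milliseconds(0)) != std::future_status::ready) {
        ++frames;  // game update/render would happen here
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    result = pending.get();
    assert(!result.empty());
    return frames;
}
```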

 

You CAN run 3 different AI algorithms on 3 different cores, IF they read from a read-only set of data, that is small enough that sharing it out to the separate cores is significantly faster than the algorithm itself.  You CAN'T get much benefit trying to split 3 AIs to 3 cores if they are trying to WRITE to shared memory.
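A C++17 sketch of the read-only case (all names and the toy snapshot are hypothetical): three independent routines take the same `const` snapshot and each returns its own result, so no locks are needed. Two run through `std::async` and the third on the calling thread.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Read-only snapshot shared by all AI routines.
struct WorldSnapshot { std::vector<int> enemy_health; };

// Three independent "AI" evaluations. Each only reads the snapshot and returns
// its own result, so they can run concurrently without synchronisation.
int weakest_target(const WorldSnapshot& w) {
    return static_cast<int>(std::min_element(w.enemy_health.begin(), w.enemy_health.end()) -
                            w.enemy_health.begin());
}
int total_threat(const WorldSnapshot& w) {
    return std::accumulate(w.enemy_health.begin(), w.enemy_health.end(), 0);
}
int healthy_count(const WorldSnapshot& w) {
    return static_cast<int>(std::count_if(w.enemy_health.begin(), w.enemy_health.end(),
                                          [](int h) { return h > 50; }));
}

struct AiResults { int target, threat, healthy; };

AiResults evaluate_in_parallel(const WorldSnapshot& w) {
    assert(!w.enemy_health.empty());
    auto a = std::async(std::launch::async, weakest_target, std::cref(w));
    auto b = std::async(std::launch::async, total_threat, std::cref(w));
    const int healthy = healthy_count(w);  // third routine on the calling thread
    return {a.get(), b.get(), healthy};
}
```

If any routine needed to write to the snapshot, this would no longer be safe without synchronisation, which is where the benefit described above disappears.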

 

You CAN have 4 different cores each generate 1/4 of a procedurally generated map and then stitch them together, ONLY IF the map generation algorithm doesn't have to know about each decision made in order to make the next one.  In general, to do something like this you must design for it.
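A C++17 sketch of a map split into four independently generated quadrants (names hypothetical). The key design choice is that each quadrant's random stream is seeded only from the world seed and the quadrant's coordinates, so generation order and thread assignment cannot change the result. Raw `std::mt19937` output is used instead of a distribution because the engine's output is fully specified by the standard, while distribution algorithms vary between library implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// A quadrant's content depends only on the world seed and its own coordinates,
// never on decisions made in another quadrant.
std::vector<std::uint8_t> generate_tile(std::uint32_t world_seed, int tx, int ty, int size) {
    std::seed_seq seq{world_seed, static_cast<std::uint32_t>(tx), static_cast<std::uint32_t>(ty)};
    std::mt19937 rng(seq);
    std::vector<std::uint8_t> tile(static_cast<std::size_t>(size) * static_cast<std::size_t>(size));
    for (auto& cell : tile) cell = static_cast<std::uint8_t>(rng() >> 24);  // top 8 bits
    return tile;
}

// Generate a 2x2 grid of quadrants (on 4 threads, or serially) and stitch them.
std::vector<std::uint8_t> generate_map(std::uint32_t world_seed, int quad_size, bool threaded) {
    assert(quad_size > 0);
    const int full = quad_size * 2;
    std::vector<std::vector<std::uint8_t>> quads(4);
    auto make = [&](int q) { quads[q] = generate_tile(world_seed, q % 2, q / 2, quad_size); };
    if (threaded) {
        std::vector<std::thread> workers;
        for (int q = 0; q < 4; ++q) workers.emplace_back(make, q);
        for (auto& w : workers) w.join();
    } else {
        for (int q = 0; q < 4; ++q) make(q);
    }
    std::vector<std::uint8_t> map(static_cast<std::size_t>(full) * static_cast<std::size_t>(full));
    for (int q = 0; q < 4; ++q)
        for (int y = 0; y < quad_size; ++y)
            for (int x = 0; x < quad_size; ++x)
                map[static_cast<std::size_t>((q / 2) * quad_size + y) * full + (q % 2) * quad_size + x] =
                    quads[q][static_cast<std::size_t>(y) * quad_size + x];
    return map;
}
```

Threaded and single-threaded runs give identical maps. This sketch does not make the seams between quadrants continuous; doing that typically means basing each cell on a function of its global coordinates rather than on a per-quadrant random stream.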

 

You can receive all user input on one thread, generate AI input on another, and send both to a third thread that does the "work" of processing your game.  However, the AI thread isn't really independent of the "game logic" thread, because it must be blocked during the whole phase in which the game logic thread is modifying game state, unless you use a "double game data" technique similar to graphics double buffering, which is almost unheard of.  But there are still benefits to having the two threads even though the AI is a blocked slave half the time.  The benefit is ... it can run in parallel with the other game logic slaves.  So 50% (or some other fraction) of the time, only one core is doing the heavy lifting ... and the rest of the time, each core is busy doing a separate part of the game that is driven by the game logic (for instance, one thread drawing, one running AI, one sending network info, etc.).
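The "double game data" idea can be sketched in C++17 much like double buffering (names and the toy `GameState` are hypothetical): the AI reads last frame's state while the logic writes the next one into a second copy, and the two copies are swapped at the frame's sync point, so the AI never waits on the logic thread during the frame. The cost is that the AI always acts on state that is one frame old, plus the memory for a second copy.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

struct GameState { std::vector<int> unit_x; };

// Logic writes `next_` while AI reads `current_`, which is never modified
// during the frame, so the AI needs no lock and is not blocked.
class DoubleBufferedGame {
public:
    explicit DoubleBufferedGame(GameState initial)
        : current_(initial), next_(std::move(initial)) {}

    // One frame: AI (helper thread) and logic (this thread) overlap.
    // Returns the AI's decision, computed from the previous frame's state.
    int run_frame() {
        assert(current_.unit_x.size() == next_.unit_x.size());
        int ai_decision = 0;
        std::thread ai([this, &ai_decision] {
            // AI reads only current_: pick the index of the unit with the largest x.
            int best = 0;
            for (std::size_t i = 1; i < current_.unit_x.size(); ++i)
                if (current_.unit_x[i] > current_.unit_x[best]) best = static_cast<int>(i);
            ai_decision = best;
        });
        // Game logic writes only next_: move every unit one step.
        for (std::size_t i = 0; i < next_.unit_x.size(); ++i)
            next_.unit_x[i] = current_.unit_x[i] + 1;
        ai.join();                   // frame sync point
        std::swap(current_, next_);  // publish the new state
        return ai_decision;
    }
    const GameState& state() const { return current_; }

private:
    GameState current_, next_;
};
```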

