[C#/C++]Multithreading


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

17 replies to this topic

#1 Memories are Better   Prime Members   -  Reputation: 769


Posted 30 January 2013 - 03:50 PM

For those of you who haven't heard, EVE Online saw a massive 2800+ player fight the other day, and the battle ran less than perfectly. I won't bother going into details about the fight or the game, but it generated a lot of complaints, the main one being "the game only uses one core, why not all?".

 

The game is built using Stackless Python and C++.

 

Now a spokesperson for CCP responded with "...there's no simple way to make something multithreaded..." among other things.

 

Ok, so my question is (bearing in mind I do ALL my multithreading work in C#): how accurate is this statement? I am not trying to cause a flame war or anything. I just find this comment 'unusual' and feel it is invalid, but given my limited experience in C++ it would be wrong of me to assume so. Help me understand what he means.

 

The link is here: https://forums.eveonline.com/default.aspx?g=posts&m=2541374#post2541374

 

Thanks in advance

 

PS: The reason I haven't asked CCP myself is simply that they are terrible at replying and put very little effort into responding.




#2 Hodgman   Moderators   -  Reputation: 30384


Posted 30 January 2013 - 04:46 PM

It's extremely hard to take an existing, large, code-base and try and shoe-horn in parallelism. You need to have a multi-core processing strategy in mind from the very start of the project in order to be effective.

#3 phantom   Moderators   -  Reputation: 7268


Posted 30 January 2013 - 04:48 PM

He is correct; multi-threading is not simple. Or, to put it better: "multi-threading so things don't run the chance of crashing and other problems while still maintaining performance is not simple".

This is the normal case of gamers, having heard about something, demanding it without really thinking about what's involved in a move like this when you are trying to build on an ever expanding 10 year old game which was originally designed and built back when multi-core systems were not the norm.

Could they do it?
Yes, given time all things are possible... but it'll take a lot of time and a lot of pain (and probably a few dodgy patches along the way) to do so.
Based on the comments in the thread about in-game lag, things have certainly improved server side over the years (I seem to recall the early lag they talked about from when I played around launch; jump into a system with a fair few people around the gate and things started to stall a bit...)

Edited by phantom, 30 January 2013 - 04:49 PM.


#4 ChaosEngine   Crossbones+   -  Reputation: 2356


Posted 30 January 2013 - 04:49 PM

the main one being "the game only uses one core, why not all?"

 

I was under the impression that the main issue was server load, not client?

 

"...there's no simple way to make something multithreaded..."  ...  how accurate is this statement?

 

I would say it's pretty accurate. You can't really just flick a switch and turn on multi-threading. While some optimising compilers can parallelise some loops if they can determine there are no side effects, in general you have to actually write multi-threaded code. 

 

That's not too bad if you're starting from scratch. While multi-threaded code still has its gotchas, more widespread use over the last few years has brought about some patterns and principles that ease the burden.

 

But refactoring an existing code base to be multi-threaded? That's very rarely easy. 


if you think programming is like sex, you probably haven't done much of either.-------------- - capn_midnight

#5 SimonForsman   Crossbones+   -  Reputation: 6109


Posted 30 January 2013 - 04:57 PM

Writing multithreaded code is easy. Writing multithreaded game code that actually performs better than its single-threaded counterpart is a bit harder, and modifying a huge 10 year old serial codebase to execute well in parallel ... well ... that's borderline insanity.

Efficient parallel code is extremely different from efficient serial code. Since EVE is a fairly old game there might be some low-hanging fruit to pick, but to get the big performance increases it might be cheaper to just start over from scratch with a new client engine.
I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

#6 brx   Members   -  Reputation: 720


Posted 30 January 2013 - 05:02 PM

I don't know what kind of multithreading you did in C#, but from my experience it's not easier than in C++. So if you say that multithreading in C# is easy for all your needs, then either a) you're a freaking genius or b) your problems were perfectly suited to multithreading.

Our brains are just designed to think sequentially. Once multithreading comes into play, it's hard to even imagine what might happen and to think of all possible scenarios. Andrei Alexandrescu from the C++ consortium pretty much nailed it: "Multithreading is just one damn thing after, before, or simultaneous with another". And that's all we know.

When the code base is not designed for parallelism from the start, adapting it is a big step.

The funny thing is, that most of those people complaining about only one core being used would shut up, if you'd just create <number of cores>-1 threads in your program doing nothing but an infinite loop so they see 100% of CPU usage.

What I am trying to say is that saying "...there's no simple way to make something multithreaded..." is very accurate. Usually, the first 2 or 3 attempts to parallelize a previously sequential algorithm/architecture will lead to full CPU usage but slower execution.
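As a rough illustration of that last point, here is a minimal C++ sketch. The function names and the even-number counting workload are made up for illustration and not taken from any real codebase. Both versions keep every core busy, but the first performs a shared atomic write per element, which typically makes it slower than a plain serial loop, while the second accumulates locally and writes once per thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Naive first attempt: every thread updates one shared atomic counter.
// All cores look busy, but they mostly fight over the same cache line.
std::uint64_t count_even_contended(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    std::atomic<std::uint64_t> total{0};
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            for (std::size_t i = begin; i < end; ++i)
                if (data[i] % 2 == 0) total.fetch_add(1);  // shared write per element
        });
    }
    for (auto& t : pool) t.join();
    return total.load();
}

// Same split, but each thread counts into a local variable and
// publishes its result exactly once, into its own slot.
std::uint64_t count_even_local(const std::vector<int>& data, unsigned workers) {
    assert(workers > 0);
    std::vector<std::uint64_t> partial(workers, 0);
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            std::uint64_t local = 0;
            for (std::size_t i = begin; i < end; ++i)
                if (data[i] % 2 == 0) ++local;
            partial[w] = local;  // one write per thread
        });
    }
    for (auto& t : pool) t.join();
    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    return total;
}
```

Both functions return the same count; timing them against a serial loop on your own machine is the easiest way to see the difference.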

Edited by brx, 30 January 2013 - 05:03 PM.


#7 Memories are Better   Prime Members   -  Reputation: 769


Posted 30 January 2013 - 05:30 PM

I don't know what kind of multithreading you did in C#, but from my experience it's not easier than in C++. So if you say, that multithreading in C# for all your needs is easy than either a) you're a freaking genius or b) your problems were perfectly fitted for multithreading.

 

Just to clear things up a bit, what I meant by my comment was that I am only familiar with multithreading in C#, and since his comment was referring to C++ I was wondering what he meant. Oh, and I should mention I wasn't a poster in that thread, and only became aware of it when it was moved. I don't actually 'play' but I do make use of their API, so it made very little difference to me; this was simply curiosity more than anything :)

 

Anyway, thanks everyone for answering. I sometimes word things wrongly, but this truly was an "if in doubt, ask" moment.



#8 brx   Members   -  Reputation: 720


Posted 30 January 2013 - 05:37 PM

Sorry if my comment sounded harsh or belittling or anything, it was not meant that way.

#9 swiftcoder   Senior Moderators   -  Reputation: 9993


Posted 30 January 2013 - 06:07 PM

Also worth noting that a bunch of Eve's server-side code is written in Stackless Python, and Python in general is a nightmare to multi-thread.

 

(there is a little doohickey called the Global Interpreter Lock, which throws a great big wrench in the works)


Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#10 phantom   Moderators   -  Reputation: 7268


Posted 30 January 2013 - 06:39 PM

Yeah, that thing is a pain... our old build system was based on Python, which basically meant all build 'setup' was single-threaded. That setup involved working out a dependency graph: quick on small asset counts, but as the assets increased a comedy wait time was introduced before it started building. And while the external tools were run outside Python, because of how it was designed you had to spin up 100s of threads in order to launch them and wait for them to finish... (The GIL is released when waiting on an external process.)

 

Fortunately this became a big enough problem that a C#/.Net re-write was allowed \o/



#11 ChaosEngine   Crossbones+   -  Reputation: 2356


Posted 31 January 2013 - 08:49 PM

Also worth noting that a bunch of Eve's server-side code is written in Stackless Python, and Python in general is a nightmare to multi-thread.

 

(there is a little doohickey called the Global Interpreter Lock, which throws a great big wrench in the works)

Good to know. 

Although I've never used stackless, I was under the (obviously mistaken) impression that easier multi-threading was one of the benefits.

 

Guess I was wrong!



#12 swiftcoder   Senior Moderators   -  Reputation: 9993


Posted 01 February 2013 - 09:10 AM

Although I've never used stackless, I was under the (obviously mistaken) impression that easier multi-threading was one of the benefits.

Cooperative multi-tasking, yes. Threading no.

 

Cooperative multitasking-based languages like Stackless Python, Erlang, Scala, and Google's Go, support a very different model of concurrency to the C/C++/C#/Java threading model - it's worth reading up on if you are interested.




#13 MaxDaten   Members   -  Reputation: 108


Posted 01 February 2013 - 05:10 PM

Multithreading is often misunderstood, even among devs. Multithreading is primarily used for parallelism and not to speed things up. For example, in games multithreading is ideal for keeping your game responsive while the game is loading some resources (for the next area), the user is providing input, or the AI is calculating (re)actions.

 

Yes, you can achieve speed-ups with MT, and MT is often used for speed-ups, for example the rendering in suites like 3ds or Maya. But your problem must be suited to running in parallel, and in most cases the speed-up is far from linear. With a perfect linear speed-up you would potentially gain 300% performance on a quad-core, which seems huge. But a linear speed-up is unrealistic. You have to coordinate (Mutex, MVar, synchronize, STM) the different processes or threads at their meeting points, and that results in a slow-down. It's utopian to expect a whole game to gain a 300% speed-up; even +100% is far from reality. In most cases you will solve specific sub-problems with MT or, and this is the most common way, you decouple sub-systems from each other so that each runs in parallel on its own processing unit.

 

MT is often a trade-off. MT will make your project much more complex. More complexity will make your project more error-prone and will slow down the whole project's progress. Your code base is more fragile and "uglified". What's the benefit? More responsiveness, that's fine! A 10%-30% "speed-up", maybe not worth it.

 

I strongly suspect the EVE Online client makes use of parallelism, but not in the way gamers expect. The gamer takes a look at the task manager and complains about the single-core CPU usage. But in which situation? Maybe this situation is not really parallelizable. For example, pumping data to the graphics card does not parallelize well, and sometimes not at all. Let me speculate: EVE Online parallelizes the client view, the network and resource loading. In a huge fleet fight, when everything is loaded on the client side, the bottleneck will be the rendering. And when the CPU part of the rendering doesn't parallelize well, the CPU usage is reduced to the single rendering core.



#14 Hodgman   Moderators   -  Reputation: 30384


Posted 01 February 2013 - 06:58 PM

Multithreading is often misunderstood, even under devs. Multithreading is primary used for parallelism and not to speed things up. For example, in games multithreading is ideal to keep your game responsive while the game is loading some resources (for the next area), the user is doing some inputs, or the AI is calculating (re)actions.
 
Yes you can achieve speed ups with mt, and mt is often used for speed ups, for example the rendering in suites like 3ds or Maya. But your problem must be suited to be run in a parallel way. And in most cases the speed up is far away from a linear speed up. With a perfect linear speed up you will gain potentially 300% performance with a quad-core, this seems huge. But a linear speed up is unrealistic. You have to organize (Mutex, MVar, synchronize, STM) the different processes or threads on their meeting-points, and that results into a slow down. It's utopian that a whole game problem will gain a 300% speed up, even +100% is far away from reality. In most cases you will solve specific sub-problems with mt or, and that is the most common way, you decoupling sub-systems from each other to be run parallel on their own processing unit.

I couldn't disagree with this more. This may be true for typical GUI-tools, but not games. Games are (soft-) realtime applications meaning you've got to hit a fixed time budget per frame, consistently.

 

When you're making a GUI-tool, you need the GUI part to remain at an "interactive" level of responsiveness (not real-time), while you do some heavy processing over a long period of time in the background. Threads are a very convenient way to achieve this -- if you put the GUI in one, and the heavy processing in another, then the OS will ensure that each of them obtains some amount of CPU time every so often (by default on Windows: one 15ms time slice at least once every 5 seconds).

 

Using this same approach in a real-time application is harmful. For example, say that we're on a single-core CPU, and when we load a file into RAM we've then got to run a LZMA decompression step on the loaded data, which takes a total of 1 second. You don't want this to affect the progress of the game's 'main thread' and impact the frame-rate.

 

Approach 1) We put the decompression code into a separate background thread, which sleeps unless it has work to do. When it does have work to do, we're relying on the OS's thread scheduler to choose which thread is running on the single CPU core. By default on Windows, the scheduler granularity is 15ms, so the decompression thread will require 67 time-slices to complete its 1 second task. If our main thread is attempting to run at a fixed real-time frame-rate of 60Hz (a limit of 16.6ms per frame), then during the time that the decompression thread is awake, this is now impossible (unless your 'main thread' only has 1.6ms of work to do per frame). From time to time (unpredictably), the main thread will be put to sleep for an entire 15ms time-slice (or maybe multiple time-slices).

That kind of unpredictability is simply not acceptable to a real-time application.

 

Approach 2) We manually time-slice the decompression code, so that after it's run for ~1ms (or some other chosen threshold), it stores its state and returns/yields -- a.k.a. cooperative multi-tasking. We run the decompression code on the "main thread" every frame, knowing that the biggest interruption that this task can have is a very predictable 1ms per frame.
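A minimal sketch of Approach 2 in C++, under some loudly stated assumptions: `SlicedChecksum` and `run_frames` are hypothetical names, a simple byte checksum stands in for the LZMA decompression, and the budget is counted in bytes rather than milliseconds so the behaviour is easy to reason about. A real version would more likely check a clock against its ~1ms threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a long-running job such as decompression:
// it checksums a buffer, but only a bounded amount per call.
class SlicedChecksum {
public:
    explicit SlicedChecksum(const std::vector<std::uint8_t>& input) : input_(input) {}

    // Process at most `budget` bytes, remember where we stopped, and return.
    // Returns true once the whole input has been consumed.
    bool step(std::size_t budget) {
        assert(budget > 0);
        const std::size_t end = std::min(input_.size(), pos_ + budget);
        for (; pos_ < end; ++pos_) sum_ += input_[pos_];
        return pos_ == input_.size();
    }

    std::uint32_t result() const { return sum_; }

private:
    const std::vector<std::uint8_t>& input_;  // caller keeps the buffer alive
    std::size_t pos_ = 0;                     // saved state between slices
    std::uint32_t sum_ = 0;
};

// Stand-in for the main loop: one bounded slice of the task per frame.
// Returns how many frames the task needed.
int run_frames(SlicedChecksum& task, std::size_t budget_per_frame) {
    int frames = 0;
    bool done = false;
    while (!done) {
        // ... update and render the game here ...
        done = task.step(budget_per_frame);
        ++frames;
    }
    return frames;
}
```

The point of the design is that the cost per frame is bounded by the budget you pick, not by whatever the OS scheduler decides to do.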

 

As swiftcoder mentioned above, many "scripting" languages only provide these kinds of "cooperative multi-tasking threads" (often called Fibers in C++), instead of OS-level threads, and their entire purpose is to allow for concurrency of tasks.

On the other hand, OS-level threads should only be used in order to take advantage of hardware-level threads, which is only useful for gaining extra computational power. Using OS-threads for anything other than gaining access to extra hardware, in a real-time application, is an abuse of them. The exception to this is when interacting with legacy APIs that have long-blocking functions, which force you to put them into a thread.

n.b. file loading and user input aren't in this category -- your OS provides (non-blocking) asynchronous methods for these.

 

Post-load resource processing, and AI processing can both be time-sliced, but may also be multi-threaded if they're processor intensive.

 

MT is often a trade-off. MT will make your project much more complex. More complexity will make your project more error-prone and will slow down the whole project progress. Your code-base is more fragile and "uglified". Whats the benefit? More responsiveness, that's fine!. 10%-30% "speed up", maybe not worth it.

That entirely depends on the MT strategy that you choose. Many job-based strategies end up producing code that's simpler than typical C++ OOP code...


Edited by Hodgman, 01 February 2013 - 09:26 PM.


#15 MaxDaten   Members   -  Reputation: 108


Posted 01 February 2013 - 07:53 PM

Approach 1) We put the decompression code into a separate background thread, which sleeps unless it has work to do. When it does have work to do, we're relying on the OS's thread scheduler to choose which thread is running on the single CPU core. By default on windows, the scheduler granularity is 15ms, so the decompression thread will require 67 time-slices to complete it's 1 second task. If our main thread is attempting to run at fixed real-time frame-rate of 60Hz, then during the time that the decompression thread is awake, this is now impossible. From time to time (unpredictable), the main thread will be put to sleep for an entire 15ms time-slice (or maybe multiple time-slices).
That kind of unpredictability is simply not acceptable to a real-time application.
 
Approach 2) We manually time-slice the decompression code, so that after it's run for ~1ms (or some other chosen threshold), it stores it's state and returns/yields -- a.k.a. cooperative multi-tasking. We run the decompression code on the "main thread" every frame, knowing that the biggest interruption that this task can have is a very predictable 1ms per frame.

 

I guess I expressed myself badly.

 

I was trying to outline this dilemma and misunderstanding. My statement was meant to be: you'd better not use any MT approach to speed up your application, regardless of the core count. You use MT to run things at the same time (for games, in the same frame). That holds independently of which high- or low-level approach you choose. I completely agree with you: approach 1 is the worst case for a single core and approach 2 is more predictable, yes. But these approaches differ "only" in the level of detail (which is not unimportant and will have a deep impact, indeed). You showed that it's sometimes better for the application to manage its (time) resources on its own. But this adds complexity to the project and shouldn't be underestimated (for example: you will lose determinism).

 

And again, even (or especially) for games, you don't choose an MT approach to make the game perform better. If a game dev thinks "uhm, my performance is too bad, let's switch MT on, I hope it will get better", that's the wrong motivation for MT. The best motivation to use any low- or high-level MT approach is to let things happen in parallel. For example: seamless environment streaming. In fact, if you choose approach 2 (aka high-level MT), you will lose performance if you measure your performance in FPS, which is not a good performance metric and another topic.



#16 Hodgman   Moderators   -  Reputation: 30384


Posted 01 February 2013 - 09:45 PM

My statement was meant to be: you better don't use any mt approach to speed up your application, regardless the core count.

And my response was the opposite -- the only reason to use multiple threads is to gain access to extra cores, in order to speed up the application.

Concurrency (as in, interleaving two different tasks) is irrelevant -- use coroutines or fibres or manual time-slicing for that kind of concurrency. Use threads to run code on more physical cores. Ideally, your thread count matches your CPU core count, no matter how many 'concurrent' systems you have.

 

Ideally, a game running on a single-core CPU would only have 1 thread, and a game running on a quad core would have exactly 4 threads. The game should be able to split its workload amongst the available pool of threads automatically, and when running on the quad-core, it should be almost 4x faster than when running on a single-core. That's the ideal result, and it's not impossible.

But this added complexity to the project and shouldn't be underestimated (for example: you will loose deterministic).

There's no reason that multi-threaded programs have to give up determinism! Multi-threading strategies that introduce indeterminate behaviour are IMHO, bad strategies, in general (they may have niche applications).

 

One of the first models of computer that you're taught as a student is  input->process->output. You've got some blob of input data, you feed it into some kind of process, and you get some blob of output data. You can then chain sequences of these blocks together in order to create an entire program. At the heart of everything that we do, this model is still relevant.

If you take all the chained IPO blocks that make up one frame of processing in your game, you've got a DAG of processes that need to be run, with dependencies between them (if the input to process #2 is the output of process #1, then process #1 must be complete before running process #2). You can perform a topological sort on this graph to get a linear order of processes, and every process that ends up being sorted to the same 'level' can be run in parallel (across multiple cores) without further synchronisation. This is how many functional languages can take any old program and "automatically multi-thread" it, while maintaining perfectly deterministic behaviour.
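The level-by-level scheduling described above can be sketched in C++ roughly as follows. This is only an illustration: `build_levels` and `run_levels` are hypothetical helpers, the graph is assumed to be acyclic, and `std::async` is used for brevity where a real engine would more likely feed a fixed pool of worker threads.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

using Levels = std::vector<std::vector<std::size_t>>;

// deps[i] lists the processes whose output process i consumes.
// Groups the processes into levels so that everything in one level
// depends only on processes in earlier levels (a level-wise topological sort).
Levels build_levels(const std::vector<std::vector<std::size_t>>& deps) {
    const std::size_t n = deps.size();
    std::vector<std::size_t> remaining(n);            // unmet dependencies
    std::vector<std::vector<std::size_t>> dependents(n);
    for (std::size_t i = 0; i < n; ++i) {
        remaining[i] = deps[i].size();
        for (std::size_t d : deps[i]) dependents[d].push_back(i);
    }

    std::vector<std::size_t> current;
    for (std::size_t i = 0; i < n; ++i)
        if (remaining[i] == 0) current.push_back(i);

    Levels levels;
    std::size_t placed = 0;
    while (!current.empty()) {
        placed += current.size();
        std::vector<std::size_t> next;
        for (std::size_t p : current)
            for (std::size_t q : dependents[p])
                if (--remaining[q] == 0) next.push_back(q);
        levels.push_back(std::move(current));
        current = std::move(next);
    }
    assert(placed == n);  // a cycle would leave some processes unplaced
    return levels;
}

// Runs each level's processes in parallel, waiting for the whole level
// to finish before starting the next one.
void run_levels(const Levels& levels,
                const std::vector<std::function<void()>>& processes) {
    for (const auto& level : levels) {
        std::vector<std::future<void>> running;
        for (std::size_t p : level)
            running.push_back(std::async(std::launch::async, processes[p]));
        for (auto& f : running) f.get();  // barrier between levels
    }
}
```

Because each process only reads outputs from earlier levels and the barrier separates levels, the result does not depend on how the threads happen to be scheduled, provided processes within a level do not write to the same data.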

 

And again, even (or especially) for games, you choose an mt approach not to make the game performance better. If a game dev thinks "uhm, my performance is to bad, let's switch mt on, I hope it will get better", it's the wrong motivation for mt.

The only reason to launch extra OS threads is because you want to make use of extra CPU cores (or you're forced to by legacy APIs), and the only reason to make use of extra CPU cores is because you need/want more processing power. As above, if you just want simple concurrency -- like background loading, streaming of environments -- you do not need extra threads.

Multi-threading is not something you can 'switch on' later in the project; it has to be designed into the project from the beginning (when using imperative/procedural/OOP languages, anyway). Typical C++ OOP code, when decomposed into an IPO graph, looks like spaghetti code -- every process has too many side effects, and there's too much mutable state, so every process has multiple outputs all over the place. The DAG that's produced is a complex spider-web that ends up as a serial sequence of processes with few opportunities to take advantage of multiple cores. Trying to parallelize that kind of code is a nightmare. If you really want that 300% speed boost that you mentioned (which is attainable in games, despite what many say), you need to be writing code that's well designed for a smart multi-threading strategy from the very start of your project.


Edited by Hodgman, 01 February 2013 - 09:54 PM.


#17 phantom   Moderators   -  Reputation: 7268


Posted 02 February 2013 - 05:01 AM

If you really want that 300% speed boost that you mentioned (which is attainable in games, despite what many say), you need to be writing code that's well designed for a smart multi-threading strategy from the very start of your project.

 

Amusingly this code tends to end up looking more functional than anything else; a few years ago I read a book on Haskell and while the syntax hasn't stuck (because I don't use it) the way of writing code did and it made me better at writing threaded code.

 

The multi-threaded parts of our engine at work are very much functional in that a bunch of state goes in, is used, and a single output is produced in a buffer; this means we can scale up as far as we want or indeed scale down to a single thread for debugging.
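To make the "state goes in, a single output is produced in a buffer" shape concrete, here is a small C++ sketch. It is not the engine code described above: `Particle`, `integrate` and `integrate_all` are invented for illustration. The job is a pure function, each worker writes only to its own slice of a separate output buffer, and passing `workers == 1` runs everything on the calling thread for debugging.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Particle {
    float position;
    float velocity;
};

// Pure job: reads its inputs, returns one output, touches nothing else.
Particle integrate(const Particle& p, float dt) {
    return Particle{p.position + p.velocity * dt, p.velocity};
}

// Applies the job to every element of `in`, writing into a separate
// output buffer. The input is never modified, so no locking is needed.
std::vector<Particle> integrate_all(const std::vector<Particle>& in, float dt,
                                    unsigned workers) {
    assert(workers > 0);
    std::vector<Particle> out(in.size());
    auto run_range = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) out[i] = integrate(in[i], dt);
    };

    if (workers == 1) {  // single-threaded path, handy for debugging
        run_range(0, in.size());
        return out;
    }

    const std::size_t chunk = (in.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(in.size(), w * chunk);
        const std::size_t end = std::min(in.size(), begin + chunk);
        pool.emplace_back(run_range, begin, end);  // disjoint output slices
    }
    for (auto& t : pool) t.join();
    return out;
}
```

Since every element is computed by the same pure function regardless of which thread runs it, the output is identical for any worker count.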

 

(As we have a 'chain' of jobs we do give up some determinism with this system, mostly by allowing different processing segments of the graph to run at different speeds; although these tend to be short chains of work with sync points introduced to ensure logical blocks of work are completed before moving on.)



#18 Xai   Crossbones+   -  Reputation: 1452


Posted 03 February 2013 - 01:49 AM

But to the OP's original post:

 

1.  Multithreading is HARD to use correctly and get any benefit out of, except where a program has multiple, largely independent problems to solve.  Some examples where multithreading can be used easily on multicore machines to get extra performance:

 

You CAN run your networking or resource loading or almost anything IO bound on a different core than your main logic, so your main logic keeps "running" until the IO is finished, and then your background thread notifies your main thread of the completed IO.  You CAN'T get much benefit trying to split networking itself up across 4 different cores ... they are all bound by the same resource.

 

You CAN run 3 different AI algorithms on 3 different cores, IF they read from a read-only set of data, that is small enough that sharing it out to the separate cores is significantly faster than the algorithm itself.  You CAN'T get much benefit trying to split 3 AIs to 3 cores if they are trying to WRITE to shared memory.

 

You CAN have 4 different cores each generate 1/4th of a procedurally generated map and then stitch them together ONLY IF the map generation algorithm doesn't have to know about each decision made in order to make the next one.  In general, to do something like this, you must design for it.

 

You can receive all user input on 1 thread, AI input can be generated from another thread, and these things can be sent to a 3rd thread that does the "work" of processing your game.  However, the AI thread isn't really independent from the "game logic" thread, because it must be blocked during the full phase in which the game logic thread is modifying game state.  Unless you use a "double game data" technique similar to graphics double buffering.  Which is almost unheard of.  But there are still benefits to the 2 threads even though the AI is a blocked slave half the time.  The benefit is ... it can run in parallel with other game logic slaves.  So 50% (or any other amount) of the time, only 1 core is doing the heavy lifting ... then the rest of the time, each core is busy doing a separate part of the game that is driven by the game logic (for instance 1 thread drawing, 1 running AI, 1 sending network info, etc).
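For the curious, the "double game data" idea mentioned in the last example could look something like this minimal C++ sketch. Everything here is hypothetical (`GameState`, `count_weak_units`, `apply_damage`, `run_frame`), and a thread is spawned per frame only to keep the example short. The AI reads the previous frame's state while the game logic writes the next frame's state into a second buffer, so neither has to block the other until the swap.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

struct GameState {
    std::vector<int> unit_health;
};

// Hypothetical AI pass: only reads the previous frame's state.
int count_weak_units(const GameState& previous) {
    int weak = 0;
    for (int hp : previous.unit_health)
        if (hp < 50) ++weak;
    return weak;
}

// Hypothetical logic pass: reads the previous state, writes the next one.
void apply_damage(const GameState& previous, GameState& next, int damage) {
    next.unit_health.resize(previous.unit_health.size());
    for (std::size_t i = 0; i < previous.unit_health.size(); ++i)
        next.unit_health[i] = previous.unit_health[i] - damage;
}

// Runs one frame and returns what the AI observed.
// `front` is the published state, `back` is scratch space for the next one.
int run_frame(GameState& front, GameState& back, int damage) {
    assert(&front != &back);  // the two buffers must be distinct objects
    int weak = 0;
    std::thread ai([&] { weak = count_weak_units(front); });  // reader only
    apply_damage(front, back, damage);  // writes go to the other buffer
    ai.join();                          // sync point: both passes finished
    std::swap(front, back);             // publish the new state
    return weak;
}
```

The cost is that the AI always acts on data that is one frame old, and the game state is held in memory twice.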





