Cooperative multitasking(coroutines) vs true multithreading

Networking and Multiplayer Programming

Started by pachanga November 06, 2009 11:55 AM

12 comments, last by Kylotan 14 years, 5 months ago

144

Author

November 06, 2009 11:55 AM

Guys, I'm developing an MMO game and I'm using coroutines for implentation of the game logic instead of true multithreading. Could you please review and critique this approach? I wonder if anyone around using anything similar. Ok, here it goes... The World thread is where the game simulation logic happens, players interact with each other, NPC AI makes decisions, etc. The World is single threaded. It's single threaded on purpose: putting synchronization logic into World classes would make them IMHO too difficult to develop, debug and maintain. Furthermore, I believe it could even slow down the perfomance due to crititical sections overhead. All World tasks(which are basically boost::fiber coroutines) are executed within a pretty simple scheduler which controls their execution and provides a fixed time budget for all tasks. Currently it's 10ms for all input handling tasks(see below) and 5ms for the rest ones. There is an important rule which must be strictly followed by all World tasks: any expensive operations(e.g IO) MUST be executed in async mode. In order to achieve this all expensive operations are executed in separate threads while the currently executed World task waits for its completion using boost::fibers "yield" call so that other World tasks can be executed in cooperative multitasking fashion. Of course, as I said above, there are other threds besides World. For example, all database related operations are executed in a separate thread which provides all neccessary tools for asynchronous execution of SQL queries(I'm using MySQL as a backend). Using boost::future it's possible to submit an async query from the World thread and wait for its completion without any blocking. The similar schema is used for all other expensive operations(logging, A* searches, etc). Networking is also implemented in a separate thread using async facilities of boost::asio. Incoming packets are transformed into World input handling tasks and are added to into the World scheduler. World communicates with the networking thread by adding outgoing packets into a special queue. As for possible scalability issues, currently I'm experimenting with the idea of running separate World location instances in separate processes. This way it should be possible to spread the load among multiple server boxes in the future. So, guys what do you think about all the said above? P.S. boost::fiber and boost::future are experimental libraries which can be found in boost vault - http://www.boostpro.com/vault/ [Edited by - pachanga on November 7, 2009 8:33:33 AM]

hplus0603

11,916

November 06, 2009 12:33 PM

That sounds like a fairly common approach. If you debug the system, it will probably run well. The biggest draw-back is that it won't scale across multi-cores, as you yourself note.

The main danger with fibers is if some fiber allocates some resources, or schedules a callback, and some other fiber either deletes those resources, or destroys the fiber while the callback is outstanding. In both cases, you will get hard crashes, that can sometimes be hard to debug, because the crash happens when the fiber stack is switched in and the real "culprit" context is lost.

Also remember that fibers are fairly expensive, in that they each need a stack of some size. If you have code that likes allocating large buffers on the stack ("wchar_t path1[1024], path2[1024], path3[1024];") then you will have to budget for the worst case for each fiber. A separate "state machine" approach where an object has to come all the way back to the caller to "yield" will avoid that problem, but the code will be more cumbersome to write.

The Python world has published a lot on this approach; if you google for "twisted" and "stackless python" you'll probably find some worthwhile reading. However, I know of several C++ Linux and Windows systems that do more or less the same thing (although what, specifically, goes into separate fibers depends on the job).

enum Bool { True, False, FileNotFound };

Antheus

2,410

November 06, 2009 12:36 PM

Why not just use Stackless Python, Erlang, Scala, Clojure, Lua, .... C++ is a pain for this type of thing.

Fibers are ok if you want to minimize impact of thread context switches. But as soon as you go multi-core, or scale across multiple machines, it doesn't really offer much anymore, compared to the issues it introduces (exception propagation, stack size).

Quote:It's single threaded on purpose: putting synchronization logic into World classes would make them too difficult to develop, debug and maintain

Fibers alone do not solve data consistency problem. Unless you ensure that each fiber only manipulates the data it owns, then fibers are just as bad as other forms of concurrency.

Future balance("player/balance");Future debt("player/debt");player.update("player/balance", balance->get() - debt->get());

Nothing guarantees that this will be safe, even if no multi-threading goes on. You either need some form of transactional semantics, or locks. STM comes closest to this.

pachanga

144

Author

November 06, 2009 12:56 PM

Quote:Original post by Antheus
Why not just use Stackless Python, Erlang, Scala, Clojure, Lua, .... C++ is a pain for this type of thing.

It's not that painful provided one has good libraries available :) And it's not an option at the moment. Lots of code already written in C++. BTW, AI is actually scripted in Lua and it's the core World logic which is implented in C++.

Quote:
Fibers are ok if you want to minimize impact of thread context switches. But as soon as you go multi-core, or scale across multiple machines, it doesn't really offer much anymore, compared to the issues it introduces (exception propagation, stack size).

So another option(in case of C++) would be to add synchronization logic into every World class? I just wonder if anyone around did this and if he was happy with this decision at the end.

pachanga

144

Author

November 06, 2009 01:10 PM

Quote:Original post by hplus0603
Also remember that fibers are fairly expensive, in that they each need a stack of some size. If you have code that likes allocating large buffers on the stack ("wchar_t path1[1024], path2[1024], path3[1024];") then you will have to budget for the worst case for each fiber. A separate "state machine" approach where an object has to come all the way back to the caller to "yield" will avoid that problem, but the code will be more cumbersome to write.

You are absolutely right about possible stack issues. Currently I'm using 16kb stack for each coroutine and it seems to be pretty enough. I'm using stl containers instead of built-in arrays and any large data is stored on the heap. It's also possible to customize the task in constructor and require more memory.

Quote:
The Python world has published a lot on this approach; if you google for "twisted" and "stackless python" you'll probably find some worthwhile reading.

Thanks, I'll have a closer look

Quote:
However, I know of several C++ Linux and Windows systems that do more or less the same thing (although what, specifically, goes into separate fibers depends on the job).

BTW, in my implementation of tasks it's also possible to specify if the task should be a coroutine or a simple functor. For example, a task which is responsible for responding to client pings is a functor, while a task responsible for Player login is a coroutine since it interacts with the database, login server, etc.

pachanga

144

Author

November 07, 2009 06:57 AM

Quote:Original post by hplus0603
That sounds like a fairly common approach.

I know that some folks around(including you :) ) develop and maintain pretty large MMO games, I'm just curious if this approach is actually used in production. Or maybe in the long term it's better to use full-fledged multithreading and thus put explicit synchronization logic into each World class?

[Edited by - pachanga on November 7, 2009 8:57:57 AM]

hplus0603

11,916

November 07, 2009 10:36 AM

We have used co-routines for an application server for MMOs/virtual worlds for a very long time, and it's worked pretty well. We did not use it for simulation, because simulation is generally better expressed as functors/stepping anyway. However, we have started feeling the pain of not being able to effectively make use of multi-core. The best we can do is put multiple server processes on the same multi-core machine, which is less efficient because the working set of the physical machine becomes bigger.

enum Bool { True, False, FileNotFound };

pachanga

144

Author

November 07, 2009 11:12 AM

Quote:Original post by hplus0603
We have used co-routines for an application server for MMOs/virtual worlds for a very long time, and it's worked pretty well. We did not use it for simulation, because simulation is generally better expressed as functors/stepping anyway. However, we have started feeling the pain of not being able to effectively make use of multi-core. The best we can do is put multiple server processes on the same multi-core machine, which is less efficient because the working set of the physical machine becomes bigger.

Thank you very much for sharing. Do you have any ideas what the alternative could be? I would be very interesting to hear. I personally have a gut feeling that explicit locking in domain logic is wrong and some different model should be used instead. For example, something similar to erlang model where objects don't share any data and they interact with each other by messages only.

hplus0603

11,916

November 08, 2009 04:25 PM

Something like this: Generally, an update will only affect a single object, or perhaps a pair of objects. Doing per-object locking might work well. Put together a queue of work, and a worker thread pool; do object locking; have all operations declare objects they need up front and always lock objects in ascending numerical object ID order to avoid deadlock.

enum Bool { True, False, FileNotFound };

pachanga

144

Author

November 09, 2009 02:48 AM

Quote:Original post by hplus0603
Something like this: Generally, an update will only affect a single object, or perhaps a pair of objects. Doing per-object locking might work well. Put together a queue of work, and a worker thread pool; do object locking; have all operations declare objects they need up front and always lock objects in ascending numerical object ID order to avoid deadlock.

Thanks for the idea. BTW, if anyone interested I started a new more general topic here

Cooperative multitasking(coroutines) vs true multithreading

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Cooperative multitasking(coroutines) vs true multithreading

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines