Boost.ASIO, IOCP and other considerations

I've been working on a MUD project for a while now. I started it from the ground up, learning things as they were needed. Things are starting to take shape; there is still a lot of work to do, but I'm getting somewhere. The base code uses Windows IOCP, so it's bound to Windows, and I want to change that. I'm planning to use Boost.ASIO, since it uses the most efficient mechanism (IOCP, epoll, ...) provided by the OS. The problem is that it's very difficult to find a good tutorial or book about it. Sure, there are the official tutorials on Boost's site, but they really suck. I understand the basics, but I'm unable to do something similar to the approach used with IOCP (worker threads). So, if anyone is aware of a good and complete tutorial, book, or article on Boost.ASIO, it would be very helpful, and would spare me weeks of wallowing through pages and pages of class references.

That was the Boost.ASIO/IOCP part; now to the other considerations. As said, I have this MUD project, and I would really appreciate it if someone could take a look at the source code. It's far from being a MUD, but the basics are there. I'm especially worried about the one-CRITICAL_SECTION-per-client issue; it seems like a lot of critical sections to me. So if you have the time and patience to give it a look, I would be glad. And since IOCP is a very common topic in this forum and the project is open source, it could be useful to others. Here is the link to the server's SourceForge repository: Ethernia MUD

Thank you
I don't have an asio tutorial handy, but here is what your typical loop will look like (it's just some random code from a UDP server I have; compare with the asio tutorials for how to use TCP instead):
class Server {
    typedef boost::asio::ip::udp::socket socket_type;
    typedef boost::asio::ip::udp::endpoint address_type;
    typedef boost::asio::io_service io_service;

public:
    Server(io_service & service, const Settings & settings)
        : socket(service, address_type(boost::asio::ip::udp::v4(), settings.port))
        , timer(service)
        , buffer(settings.MTU)
        , current_tick(0)
    {
        start_receive();
        start_service();
    }

private:
    void start_receive()
    {
        socket.async_receive_from(
            boost::asio::buffer(buffer),
            remote_address,
            boost::bind(
                &Server::handle_receive,
                this,
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred
            )
        );
    }

    void start_service()
    {
        timer.expires_from_now(boost::posix_time::milliseconds(5));
        timer.async_wait(
            boost::bind(
                &Server::handle_timer,
                this,
                boost::asio::placeholders::error
            )
        );
    }

    void handle_receive(const boost::system::error_code & error, std::size_t n_read)
    {
        // process received data here
        start_receive();
    }

    void handle_timer(const boost::system::error_code & error)
    {
        // main simulation loop here
        // send data here
        // start timer again
        start_service();
    }

    socket_type socket;
    address_type remote_address;

    struct Connection {
        ConnectionCallback * cb;
        size_t secret;
        size_t last_active;
    };
    std::map<address_type, Connection> connections;

    size_t current_tick;
    boost::asio::deadline_timer timer;
    std::vector<char> buffer;
};

...

boost::asio::io_service io_service;
udp::Settings settings;
udp::Server udp_server(io_service, settings);
io_service.run();
I don't think this even compiles; it's just meant to show the main flow of the code.

There are two asynchronous handlers. The first one is receive: it posts a receive request on the socket. When it completes, parse the incoming data and do the rest. If more than one start_receive per server is called, and the server runs in multiple threads, then the receive handlers must synchronize their access to shared resources. For TCP, as long as you issue only a single receive per socket, there is no need for synchronization between connections. Data must still be synchronized when passing it to the service handler.

The second one is called service. It is not a socket operation, but is driven by a timer. Every 5 milliseconds it wakes up, each time in a single thread only, and this is where you would simulate the world.

The timer in this case is a lightweight operation. The above is equivalent to the classic socket loop:
while (true) {
    select(.., .., .., 5);
    recv();      // receive
    simulate();
    send();
}



Like I said, the above example isn't ideal, but I don't have anything better handy; once you grasp the basic idea, things are pretty straightforward.
So, in order to obtain an IOCP/worker-thread-like server, I just have to call io_service.run() in each worker? Is that it?
Regarding critical sections: There really is no reason to have more critical sections than you have threads in the program, because each thread can block on at most one critical section at a time. However, it is often hard (or, given the APIs involved, impossible) to actually push the number that low. It is, however, an insight you should take to heart when you design a locking system.
I'm also using boost::asio to build a game server, and I have been thinking about these threading issues too.

I can see two ways to achieve a multi-threaded server. The first one would be multiple calls to io_service.run(), each in a different thread, with the strand object (also from boost::asio) synchronizing the callback handlers (something like the sketch below). The other way is using threads, locks, and critical sections.
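Something like this minimal sketch is what I have in mind for the first approach (assuming boost::thread; the worker count of 4 is arbitrary):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

// Plain wrapper, to avoid ambiguity from io_service::run's overloads.
void run_service(boost::asio::io_service * io) { io->run(); }

int main() {
    boost::asio::io_service io_service;

    // Keeps run() from returning while no async work is pending yet.
    boost::asio::io_service::work work(io_service);

    // Several workers all call run(); completion handlers get
    // dispatched to whichever thread is free. Handlers touching
    // shared state would be wrapped in a strand (not shown).
    boost::thread_group workers;
    for (int i = 0; i < 4; ++i)
        workers.create_thread(boost::bind(&run_service, &io_service));

    // ... create sockets/timers and start async operations here ...

    workers.join_all();
}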

I think the first approach is better, because the code is simpler and it lets boost::asio take care of thread synchronization. But I'd like some other opinions, so what do you think, guys?
Strands are a pro-active version of mutexes. Instead of calling a function and having the function take a lock, you ask for the function to be invoked in a specific threading context. Regardless of how many such invocations are requested, only one will ever be active at any single time.

The primary benefit of strands is the ability to transparently use non-reentrant code, such as legacy C libraries.
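A minimal sketch of the idea (legacy_update stands in for some non-reentrant routine):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <iostream>

// Stand-in for a non-reentrant routine, e.g. from a legacy C library.
void legacy_update(int id) { std::cout << "update " << id << "\n"; }

int main() {
    boost::asio::io_service io;
    boost::asio::io_service::strand strand(io);

    // Both invocations may be posted from any thread; the strand
    // guarantees they never execute concurrently with each other.
    io.post(strand.wrap(boost::bind(&legacy_update, 1)));
    io.post(strand.wrap(boost::bind(&legacy_update, 2)));

    io.run();
}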

For networking, multi-threaded workers can be synchronized implicitly. For TCP, having one outstanding request per socket is usually enough, and at the same time it guarantees that there is no need for synchronization between socket handlers.

For IO-heavy servers it may be beneficial to have more outstanding requests, where some synchronization may be needed.

Quote: "I think the first approach is better, because the code is simpler and it lets boost::asio take care of thread synchronization."


It isn't that simple. Asio's scheduler knows how to handle workers, but doesn't guarantee thread safety.

Strands are usually useful for synchronous systems. For example, each service could have one strand, and all invocations would use the same one. This allows multiple different services to run concurrently, but each individual one runs serially.

In the example I gave above, the service handler could use one strand, and the networking another. Typically, one would split this functionality into two classes, but run them on a single io_service. Then, if needed, one could start multiple network handlers (quite trivial to run concurrently), but stick with a single service handler (the simulation loop, which isn't trivial to distribute).
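A rough sketch of the simulation half of that split (the class name and the 5 ms tick are just placeholders; the network handlers would wrap theirs in a second strand on the same io_service):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>

class Simulation {
public:
    explicit Simulation(boost::asio::io_service & io)
        : strand(io), timer(io) {}

    void start() {
        timer.expires_from_now(boost::posix_time::milliseconds(5));
        // Every tick goes through the simulation strand, so the
        // world is only ever updated by one thread at a time.
        timer.async_wait(strand.wrap(boost::bind(&Simulation::tick,
            this, boost::asio::placeholders::error)));
    }

    boost::asio::io_service::strand strand;

private:
    void tick(const boost::system::error_code & error) {
        if (error) return;
        // update the world here, then schedule the next tick
        start();
    }

    boost::asio::deadline_timer timer;
};

int main() {
    boost::asio::io_service io;
    Simulation sim(io);
    sim.start();
    io.run();  // or call run() from several worker threads
}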
Thanx Antheus.

If I'm getting this right, what you're suggesting is to have multiple network handlers (readers and writers) using the same strand object, and then have one processing (or simulation-loop) service using another strand object. Is that right? If so, would I have one class to communicate with the network and another to simulate the world?

I'm thinking of the network handlers producing messages to, and consuming messages from, buffers owned by the world simulation class.
There is no real need for the networking handlers to be synchronized. With TCP, you can schedule recv and send calls in such a way that it's not needed. With UDP, you can perform atomic local operations concurrently (packet validation, parsing).
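For example, a TCP read chain that keeps exactly one receive outstanding per connection might look like this rough sketch (the Connection class and the buffer size are hypothetical):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <vector>

class Connection {
public:
    explicit Connection(boost::asio::io_service & io)
        : socket(io), buffer(1024) {}

    void start_read() {
        // Only one read is ever outstanding on this socket, so
        // handle_read can never race with itself.
        socket.async_read_some(
            boost::asio::buffer(buffer),
            boost::bind(&Connection::handle_read, this,
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred));
    }

    boost::asio::ip::tcp::socket socket;

private:
    void handle_read(const boost::system::error_code & error,
                     std::size_t n_read) {
        if (error) return;  // drop the connection on error
        // parse buffer[0..n_read) locally, without shared state
        start_read();       // reissue: never more than one in flight
    }

    std::vector<char> buffer;
};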

A strand basically says: run this function in a specific context/thread, whenever you find time.

An example could be database queries, or spawns. For database queries, you would fire off the query, but use the main loop's strand for the callback. The queries then complete asynchronously, but their results are only submitted when the simulation loop is not running. In addition, if the main loop uses a timer as in the example above, asio takes care of scheduling it properly, so that too many responses do not delay the main loop (it will be at most one response callback late).

For spawns, you could do the following:
void on_receive() {
    // parse the packet
    // determine that something needs to be spawned
    simulation.spawn(....);
}

Simulation::spawn(...) {
    io.post(strand.wrap(boost::bind(&Simulation::real_spawn, this)));
}


Even though the network receive handlers may run in a different thread, and even though the simulation loop is independent, the spawn request is simply enqueued. When the main simulation loop is not running, it will invoke real_spawn.

This means that no explicit synchronization is needed, and it avoids the blocking you would get with explicit locking.

If explicit locking were used, then whenever simulation.spawn() was called while the simulation was updating (inside a lock), the on_receive handler would block, stalling that network handler.
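A minimal sketch of that lock-based alternative, assuming boost::thread's mutex (the names are hypothetical):

#include <boost/thread/mutex.hpp>

class Simulation {
public:
    void spawn() {
        // Called from on_receive(); blocks whenever update() below
        // holds the mutex, stalling that network thread.
        boost::mutex::scoped_lock lock(mutex);
        real_spawn();
    }

    void update() {
        boost::mutex::scoped_lock lock(mutex);
        // simulate the world while holding the lock
    }

private:
    void real_spawn() { /* add the entity to the world */ }
    boost::mutex mutex;
};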

This is a good, generic, and fairly robust way to approach asynchronous design. While asio does the job fairly well, the above might not produce satisfactory results for delicate real-time simulations. The scheduling used is robust, but not entirely overhead-free: as the number of timers, strands, and outstanding non-IO requests increases, the CPU overhead becomes larger than it would be with manual scheduling. It still remains a good choice for rapid development, since it scales reasonably and, due to its design, is fairly trivial to refactor if need be.

Perhaps the biggest complaint I have is the absurd namespace bloat. Once you include strands, request allocation handlers, and non-trivial method calls, it is not uncommon for a single post call to be 5-10 lines long. But that can be wrapped away; it is just annoying to work with.
I think I now understand better what you were saying. But I still cannot see how the UDP handlers could be performed atomically and concurrently, without explicit sync...
