Multithreading message queue ... performance concerns?

Started by
19 comments, last by Kylotan 7 years, 4 months ago

I'm currently working on a multiplayer server in Java which is using seperate threads to listen for incoming data via TCP.

Now, i've worked in java and did some minor multithreaded applications before, but i've never done anything of such a scale.

My problem is that even though the server "works" the code itself is quite ugly and thus i try to refractor it without sacrificing a huge amount of performance.

So i create 2 threads on the server for each connected client:

- The first thread is waiting for incoming TCP messages

- The second one is processing data, sending data back to the client (without being blocked by the TCP thread), doing movement validation, etc...

Now here is the thing:

The first thread which handles the incoming TCP data also sends data back or does some other funky stuff (like setting specifig flags for the player or accessing other systems to update player coordinates, etc...)

This makes the Code quite ugly as i have stuff like this lying around (on the TCP thread):


case 201: //incoming message ID
	//reset run
	player.getCheckpointList().getLock().writeLock().lock();
	player.getCheckpointList().clear();
	player.getCheckpointList().getLock().writeLock().unlock();
	
	break;
	
case 202:
	//checkpoint activated
	int time = StreamConverter.buffer_read_u32(input);
	
	player.getCheckpointList().getLock().writeLock().lock();
	player.getCheckpointList().add(time);
	player.getCheckpointList().getLock().writeLock().unlock();
	
	break;
	

case 203:
	//goal activated
	long goalTime = StreamConverter.buffer_read_u32(input);
	//TODO: check if checkpoint count is valid
	PlayerClientData data = player.getPlayerDataCopy();//playerDataCopy to avoid multithreading issues.
	int hash = data.getName().hashCode();//has to be more specific (like using the playerskin in addition to the name, etc...)
	//TODO: Better hashcode for player identification
	server.getGameState().getHighscoreState().addEntry(new LeaderboardEntry(goalTime,data.getName(),(byte) 1,hash));//add to highscoreState (replaces entry if one already exists)
	
	//clear checkpoint list
	player.getCheckpointList().getLock().writeLock().lock();
	player.getCheckpointList().clear();
	player.getCheckpointList().getLock().writeLock().unlock();

	
	break;

The thread which handles the incoming TCP traffic does access other parts of the systems and has the ability to send messages back. This means that it has to use locks (in one form or another) to prevent race conditions.

Now i figured that it would be a good idea to transfer all the response code to the other Thread (which is doing the exact same stuff anyway) and use a messaging Queue to send messages from one thread to the other.

This would:

A. Make the TCP thread have only one task: Listening to TCP messages and notifying the other thread without worrying about responding to the client or updating data on the serverside.

B. Make the other thread the only place where any kind of data is processed or sent to the client. (Which would make it cleaner and more unified.)

Now the issue which i'm worrying about is the performance of messages:

The only way i can think of implementing that is some kind of message objects which are put into the queue by one thead and then read by the other thread.

Now, given that Java is using a GC and every single message would require allocating a new object on the heap, this could be quite problematic in the long run especially as i want to support up to 64 clients in realtime. (64 Threads would be generating a lot of garbage in a very short time which would require more frequent GC pauses. Although i have no idea how big of an performance impact this would have.)

Does anyone have experience with message queues and their performance impact?

Is this a common solution for servers (or any kind of multithreaded application?)

Advertisement
Generally it is well understood that one-thread-per-connection is a bad idea; scaling two is going to be even worse, as you've discovered.

A better approach is to have a small pool of IO threads that take turns pulling data off any ready sockets. They then pass this to a single simulation thread which may in turn spawn jobs on a standard multithreaded job system for parallelization.


Moving to Network forum for additional feedback.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

You can use ExecutorServices for this I guess: http://winterbe.com/posts/2015/04/07/java8-concurrency-tutorial-thread-executor-examples/

The TCP thread would make a pooled executor with a fixed amount of threads (ie, Executors.newFixedThreadPoop(threads)), and when receiving a package it'd push a task to it then keep listening. The ExecutorService handles automatically which thread gets which task for you. This way you only have a couple of threads (say, number of cores - 1, so to leave the VM one thread to work).

Of course, if each task is mutating shared data, you'd have to apply necessary synchronization.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

Generally it is well understood that one-thread-per-connection is a bad idea; scaling two is going to be even worse, as you've discovered.

A better approach is to have a small pool of IO threads that take turns pulling data off any ready sockets. They then pass this to a single simulation thread which may in turn spawn jobs on a standard multithreaded job system for parallelization.


Moving to Network forum for additional feedback.

I've read a bit on that topic. As far as i know it is generally considered "bad practice" but from what i was able to gather this will only start to become an issue if the server has to manage threads upwards of one hundred (like 200 or 300.) (Disclaimer: I'm definitely not an expert on that field, so my assumptions can of course be entirely wrong.)

Given that i've never worked with this style of server programming (i think for java there is the Java NIO api) and the current server was already somewhat finished/usable i sticked with the current architecture. (But i'll consider rewriting that if it turns out to be a big issue.)

The main question is if creating message Objects on the TCP socket threads which will then be read by the second thread will be bogging down the performance due to the huge Garbage generation.

The TCP thread would make a pooled executor with a fixed amount of threads (ie, Executors.newFixedThreadPoop(threads)), and when receiving a package it'd push a task to it then keep listening. The ExecutorService handles automatically which thread gets which task for you. This way you only have a couple of threads (say, number of cores - 1, so to leave the VM one thread to work).

That's actually a neat idea. But this would require to use the "single thread for all connections" architecture which i've very little to no experience with. This would probably be the best solution in terms of code structure/performance. But even then this would require pretty much 2 threads per player. 1 thread has to be active in order to do realtime validation of player data (veryfying movement data, and sometimes sending information from time to time which are triggered when certain conditions are met, etc...) while the other threads (which would be generated by the thread executor) would also send responses back to the client or fiddle around with other systems (if they need to.)

What i would like to do is to transfer all those operations into the one thread which also does the validation of playerdata. (in order to reduce the number of places where locks on systems get executed and to have a central place for sending data back to the player.)

Of course i could have one single thread which is waiting for incoming TCP data from all clients and then create (as an example) Message objects which get fed into a message queue so that other threads can read them and then take the coresponding actions.

The big question is: How does a message-queue system fare on a managed language in terms of performance (due to the huge number of message objects which get allocated on the heap)?

If your world contains shared state (everybody wants the same pickups, collides against the same walls, go through the same doors, shoot at each other, etc) then using multiple threads for accessing the same world is a bad idea.

A typical simple server app in Java may use a single thread per TCP socket to send/receive data, perhaps, but it's better to use nio.
And, when receiving data, you want to post it in a queue to your simulation code, which you want to run once per tick; harvesting messages from all the players, running the simulation, and then sending updates as necessary to the players.
enum Bool { True, False, FileNotFound };

If your world contains shared state (everybody wants the same pickups, collides against the same walls, go through the same doors, shoot at each other, etc) then using multiple threads for accessing the same world is a bad idea.

A typical simple server app in Java may use a single thread per TCP socket to send/receive data, perhaps, but it's better to use nio.

I think i will look further into that NIO stuff in the next few days. (Now i'm really curios how this exactly works.)

And, when receiving data, you want to post it in a queue to your simulation code, which you want to run once per tick; harvesting messages from all the players, running the simulation, and then sending updates as necessary to the players.

One question though: Creating messages will have an overhead. (I'll want to store specific data depending on the message. Like movement data will have vectors, chatmessages will have the recieved string, etc...) This means that i'll have to use simple Objects which represent such data. Those will then be passed to a queue by the TCP thread.

Isn't this going to create a lot of garbage (which then the Garbage Collector will have to clean up?) due to constant messageobject allocations? I know that java is fast at creating new Objects due to preallocated memory, but recieving messages (and allocating new message objects) from 64 player simultaneously seems to be a lot. I don't really like the idea of having the garbage constantly cleaned up by the GC and thus decreasing the overall performance/responsiveness of the server application.

This means that i'll have to use simple Objects which represent such data. Those will then be passed to a queue by the TCP thread.


Yes, one of the problems with Java is that it only has reference types (objects) not value types (other than simple primitives.)
This adds more pressure on the garbage collector.
Generally, I'd recommend defining an object that contains all of the network-visible state for an object, and keeping that around as "the view of the object."
You can then link that object to all queues of outgoing data for clients -- "send this view to that client."
The queue-message-objects need to be more dynamically allocated, which you can generally solve by keeping a free list of messages rather than calling new and then losing track for each one.
Finally, the message to the client goes into a byte buffer. Typically, you allocate one byte buffer up front with a bigger size (say, 65k) and then use some slice of it for your actual work.

That being said, if GC performance is the worst problem your game will have, you'll be doing very well!
There are good tools to profile those kinds of problems and there are known work-arounds (such as what I indicate above.)
enum Bool { True, False, FileNotFound };

That's actually a neat idea. But this would require to use the "single thread for all connections" architecture which i've very little to no experience with. This would probably be the best solution in terms of code structure/performance. But even then this would require pretty much 2 threads per player.
Not really. The TCP thread is just a polling thread. All it does is receive a packet, then it immediately sets up a task and hands it to the thread pool. TCP thread does nothing but listen at the socket, no matter the player. The point of the TCP thread is that it is thin an does minimal work.

You'd have one thread, plus a bunch of worker threads, for any number of players. Number of players here doesn't affects the number of threads, since the thread pool maintains them alive (most certainly parked if they're doing nothing) to process further tasks, that might or might not belong to the same player, its irrelevant.

Have also in mind that creating a thread is a costly operation, so you actually want that pool regardless, even if you only use one thread for TCP and one thread for the rest.

Isn't this going to create a lot of garbage (which then the Garbage Collector will have to clean up?) due to constant messageobject allocations?
Yes. For objects that "escape" threads and methods (after JIT inlining), heap alloc is inevitable. So every once in a while a few packets might take more time to get processed (ie, when GC runs). Have in mind that not all GCs are born the same. Short lived objects get collected much faster than long lived objects. HotSpot has a few object generations each with different characteristics regarding GC.

The thing is that you're missing the forest for the trees. Your objective shouldnt be "every packet must have 10ms latency" but rather "[high_percentaje] of packets must have 10ms latency tops". The point is not to never do GC, but to keep GC below your required limits. Some packets might need more time than others, and thats fine, as long as you keep them below your limits.

Also have in mind that new objects don't really affect GC time. Garbage collection actually doesn't collects garbage at all. It collects the stuff that isn't garbage, the objects that you're using.

For short lived objects (like your messages I assume, you'd need to profile it), they will probably live in eden generation, or young gen, they'll never make it to old gen. Eden/young gen collections get triggered when you run out of space in that gen, but what HotSpot does is to copy back the live objects, not the dead ones. No longer reachable memory doesn't gets touched. So the GC will only take longer than usual if you have tons of recently allocated live objects, whereas if you used them and discarded them right away, they don't impact the GC time. Allocation of short lived objects impact the frequency of GC, not the duration. Whereas for long lived objects its the opposite, long lived objects get compacted so if you have many of those, it will take longer. This is why object pools are discouraged, because they "artifically" increment the liveness of a bunch of tiny objects, which later the GC will have to move around and compact each time an old gen GC is made.

So your usual techniques for avoiding GC might trade a bunch of shorter GCs for one big stop the world GC that will increase your latency a lot for a few specific packets. Which is not what you really want, specially since interpolation techniques on the client side can mask short delays very well. You want to maintain the overall throughput constant, even if its not the highest it can be.

As always, first and foremost, try it out. If it ever becomes a problem you can refactor.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

Your objective shouldnt be "every packet must have 10ms latency" but rather "[high_percentaje] of packets must have 10ms latency tops"


Well, maybe! It is totally possible to build systems that guarantee latencies. However, doing so IN JAVA is a lot harder, because of the way Java works.
Or, to put it another way: If you are OK with setting some kind of percentile performance, and then optimizing until you get there, knowing that you will never be perfect, then Java is a fine language.
Any language with a GC is similar, although some languages are better at reducing the amount of heap GC (value types in C# help, optimizing away the stack in Haskell helps, reducing the size of the heap to whatever a single message loop uses in Erlang helps, etc.)
There even exists GCs with predictable latency -- incremental generational systems that mark N links for each new allocation are guaranteed to always make progress and never take more than N amount of time per allocation, but these collectors have other draw-backs (no guaranteed upper bound on memory used for example) and aren't generally available in typical desktop/server tool chains.
enum Bool { True, False, FileNotFound };

Well, maybe! It is totally possible to build systems that guarantee latencies.
Of course. Its been done for high-frequency trading, also in Java. There is the G1 collector in HotSpot that can be set up to try to reach latency targets. There is the Azul System's GC for HotSpot that is designed for "zero latency GC" (at lower overall throughput I hear), but if OP needed such thing, he/she wouldn't be asking us anything I believe.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

This topic is closed to new replies.

Advertisement