Multithreaded server - horrible performance

27 comments, last by LycaonX 14 years ago
Probably not the answer you're looking for, but you could use XF.Network instead of System.Net.

I've scaled it up to 1000 or so concurrent connections without trouble, and the API is delicious.

EDIT: Just saw your last reply. You aren't calling Thread.Sleep inside your callbacks, are you? (The methods that take IAsyncResult and call Socket.EndXXX). Those are threadpool threads, you really shouldn't be doing anything heavy there. Just deserialize your messages and pass them to your main thread. If the sleeps are in your main threads, and they're taking up 85% of your time, then obviously your socket code isn't the problem :P
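For illustration, a receive callback that does the bare minimum might look roughly like this (just a sketch; OnReceive, m_Socket, m_Buffer and EnqueueForMainThread are made-up names, not anything from your project):

Private Sub OnReceive(ByVal ar As IAsyncResult)
    ' Assumes Imports System.Net.Sockets; m_Socket and m_Buffer are fields
    ' of the connection class. All names here are placeholders.
    Dim bytesRead As Integer
    Try
        bytesRead = m_Socket.EndReceive(ar)
    Catch ex As SocketException
        Return ' connection dropped; let the owner clean up
    End Try

    If bytesRead > 0 Then
        ' Copy the bytes out of the shared buffer...
        Dim data(bytesRead - 1) As Byte
        Array.Copy(m_Buffer, data, bytesRead)

        ' ...hand them off to the main thread, then immediately re-arm the read.
        EnqueueForMainThread(data)
        m_Socket.BeginReceive(m_Buffer, 0, m_Buffer.Length, SocketFlags.None, _
                              AddressOf OnReceive, Nothing)
    End If
End Sub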
Anthony Umfer
Quote:Original post by CadetUmfer
EDIT: Just saw your last reply. You aren't calling Thread.Sleep inside your callbacks, are you?


Noooo I know better than to do that :p The async callback is set up as a "get-in get-out as fast as possible" method. The data received is immediately passed off for handling.

Side note: I dabbled with the ThreadPool.SetMaxThreads() method and, from the output, it looks like threads from the pool are NOT 'held'; rather, they're only used to actually call the async method. So, for those of you reading this with similar problems: no, setting MaxThreads to 250, 250 does NOT make a difference.
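If anyone wants to check this for themselves, a quick sketch like the following (not from the server, just plain System.Threading.ThreadPool calls) will show how many pool threads are actually in use at a given moment:

' Quick sketch using System.Threading.ThreadPool; not server code.
Dim maxWorker, maxIo, freeWorker, freeIo As Integer
ThreadPool.GetMaxThreads(maxWorker, maxIo)
ThreadPool.GetAvailableThreads(freeWorker, freeIo)
Console.WriteLine("Pool threads in use: {0} worker, {1} I/O", _
                  maxWorker - freeWorker, maxIo - freeIo)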

Quote:Original post by LycaonX
Noooo I know better than to do that :p The async callback is setup as a "get-in get-out as fast as possible" method. The data received is immediately passed off for handling.


200 connections is nothing, unless they each try to send megabytes of data per second.

So there is a problem with how the data is processed.

Quote:Side note: I dabbled with the ThreadPool.SetMaxThreads() method and from the output,


If completion handling is not the point of contention, then the maximum number of threads used will be equal to the number of cores. Looking at thread times on a four-core machine, a maximum of four threads should be seeing 95% of the time, with another thread getting the rest.

But as long as processing is fast enough, one or two threads should be more than enough - but that is all handled automatically.

Simply compute the network bandwidth divided by the packet size to get packets per second; the inverse of that is the time available to process one packet. All cores together have that much time per packet before the handlers start holding up the networking part.

For example, if each packet is 500 bytes and the network can handle 50 megabytes/second (symmetric upload/download, echo server), each request needs to be processed in 10 microseconds, or 40 microseconds per thread on a 4-core machine.

But this is just the networking part: it marks the point at which more than 4 threads would be needed. Most realistic systems will not be able to do useful work at such rates, so if handlers complete faster than that, there should never be more than 4 active completion threads. Most systems only see thousands of messages per second - about 1% of this theoretical limit - which means a single thread on a single core will handle it while being idle 90% of the time. That leaves a lot of time for processing.
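As a back-of-envelope sketch of that arithmetic (same hypothetical numbers as above):

' Rough budget calculation with the hypothetical numbers from above.
Dim bandwidth As Double = 50000000.0   ' bytes/second (~50 MB/s)
Dim packetSize As Double = 500.0       ' bytes per packet
Dim cores As Integer = 4

Dim packetsPerSecond As Double = bandwidth / packetSize        ' 100,000 packets/s
Dim microsPerPacket As Double = 1000000.0 / packetsPerSecond   ' 10 us per packet
Dim microsPerThread As Double = microsPerPacket * cores        ' 40 us per thread

Console.WriteLine("{0:F0} us per packet, {1:F0} us per completion thread", _
                  microsPerPacket, microsPerThread)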


Short version: it is not the networking API that is causing the delays, it's how the data is handled by the application.
I think you may need to post some code for us to be more helpful. I would definitely post your callback function and any procedure your callback calls. Do you have any locks or anything in your callback? Also, post the code that actually makes the call to BeginRead, along with information on how and when it is called.


What is the CPU load on your server? Are you actually using all the CPU? If not, then you have an algorithmic bug in how you handle your networking, or perhaps a locking bug where you serialize on some blocking code.

Btw: BeginReceive/EndReceive will end up using a thread pool and I/O completion ports in the implementation, so it's a reasonably efficient way to do I/O.
enum Bool { True, False, FileNotFound };
I'm running on my desktop at the moment, quad core 3.0, 8GB of ram. I'm lucky if I hit 10% cpu usage with Firefox with 20 tabs open, WoW running (sometimes three copies simultaneously), miscellaneous folders open, IM programs, mIRC, all the usual junk.

As far as how the data is processed: the server is modular. At run-time, the server loads up all the classes that handle the various opcodes the client sends (for example, chat, changelevel, etc). These are compiled in memory, one at a time (each handler is its own class), then loaded and stored in a Dictionary(Of Opcode, Handler).

I know you'd like a perfect copy/paste of code but I'm not allowed to distribute it. I suppose I could copy out all the pertinent code and pseudo-code the stuff that's specific to the server. I didn't come up with the restriction, I just signed the paper.

The program has a simple class using an async socket (instead of a tcplistener) to accept connections. I don't see any issues with it; I've had 500 incoming connections all handled gracefully and in less than three seconds for all 500.

The new socket is passed to the ClientManager via an event, which assigns the socket to a Client class (which holds the socket and info on the client like username, etc). The ClientManager then takes over the async operations, initiating handshaking, then passing data received to the main thread via an event.

The main thread then acts on the data. It creates a Packet class from the byte array, which separates out the byte data into an opcode, payload length, and opcode specific data. The main thread then checks the Handler dictionary for the Opcode. If there is a handler for the specific Opcode, the client and data are passed byval to the handler and processed from there. If not, an exception is logged to a file.
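Since I can't paste the real thing, here's roughly what that looks like with the identifiers changed (the wire layout and the Client/Handler types here are stand-ins, not the actual ones):

' Sketch with identifiers changed; the wire layout here is made up.
Public Class Packet
    Public Opcode As UShort
    Public PayloadLength As Integer
    Public Payload As Byte()

    Public Sub New(ByVal raw As Byte())
        Opcode = BitConverter.ToUInt16(raw, 0)
        PayloadLength = BitConverter.ToInt32(raw, 2)
        Payload = New Byte(PayloadLength - 1) {}
        Array.Copy(raw, 6, Payload, 0, PayloadLength)
    End Sub
End Class

' Main-thread dispatch against the handler dictionary.
Private Sub DispatchPacket(ByVal client As Client, ByVal raw As Byte())
    Dim p As New Packet(raw)
    Dim handler As Handler = Nothing
    If m_Handlers.TryGetValue(p.Opcode, handler) Then
        handler.Handle(client, p)   ' client and data passed ByVal to the handler
    Else
        LogToFile("No handler for opcode " & p.Opcode.ToString())
    End If
End Sub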

I am going to profile the execution time of each handler; it's possible that one or more of them are taking longer than usual. It's a pain though, since VS doesn't appear to be able to debug assemblies that are loaded via Assembly.Load. Yes, I do have .GenerateDebugInformation = True in the compile parameters.
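The timing itself is simple enough; something along these lines (a sketch, not the real dispatch code) wrapped around each handler call:

' Sketch of per-handler timing using System.Diagnostics.Stopwatch.
Private m_HandlerSeconds As New Dictionary(Of UShort, Double)

Private Sub TimedDispatch(ByVal client As Client, ByVal p As Packet, ByVal handler As Handler)
    Dim sw As Stopwatch = Stopwatch.StartNew()
    handler.Handle(client, p)
    sw.Stop()

    If Not m_HandlerSeconds.ContainsKey(p.Opcode) Then
        m_HandlerSeconds(p.Opcode) = 0.0
    End If
    m_HandlerSeconds(p.Opcode) += sw.Elapsed.TotalSeconds
End Sub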

Instead of handling the data immediately, as soon as it hits the method in the main thread, I've also tried using a Queue in a separate thread. When the data arrives via the event in the main thread, I SyncLock m_Packets.GetType, add the client/data, then End SyncLock.

In the queue-processing thread, I check if m_Packets.Count > 0. If it is, m_Packets is locked, one packet is .Dequeued, End SyncLock (so other packets can be queued while the current one is being processed), and the loop repeats until all queued packets are processed. If there are no packets in the queue, I do a Sleep(10) and the loop starts over again.
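Roughly, the queue version looks like this (sketch with names changed; QueuedPacket just pairs the client with its raw bytes):

' Sketch of the queue arrangement, names changed.
Private m_Packets As New Queue(Of QueuedPacket)
Private m_Running As Boolean = True

' Producer side, called from the receive event on the main thread.
Private Sub EnqueuePacket(ByVal item As QueuedPacket)
    SyncLock m_Packets.GetType   ' current code locks on the type; see the reply below
        m_Packets.Enqueue(item)
    End SyncLock
End Sub

' Consumer side, the queue-processing thread.
Private Sub ProcessQueue()
    While m_Running
        While m_Packets.Count > 0
            Dim item As QueuedPacket
            SyncLock m_Packets.GetType
                item = m_Packets.Dequeue()
            End SyncLock
            DispatchPacket(item.Client, item.Data)   ' processed outside the lock
        End While
        Thread.Sleep(10)   ' idle when the queue is empty
    End While
End Sub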

Whether I use the queue or process each packet as they arrive, I see no visual difference in the latency. I haven't profiled to actually check though.

Also, Ozak, I'm not sure how you'd implement socket operations for 200 connections without threading. Whether you use a separate manually-created thread and do blocking operations in that thread, or use the built-in async methods, you end up using threads from the process ThreadPool either way. Would you mind giving a basic phrase describing said method so I can help myself and google it?
Although this is unlikely to be the solution to your problem: you should never lock on a publicly accessible object, especially an entire type. Just make a new private object and lock on that instead.
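In other words, something like this (just a sketch):

' Sketch: lock on a dedicated private object rather than on a Type.
Private ReadOnly m_PacketsLock As New Object()
Private m_Packets As New Queue(Of QueuedPacket)

Private Sub EnqueuePacket(ByVal item As QueuedPacket)
    SyncLock m_PacketsLock
        m_Packets.Enqueue(item)
    End SyncLock
End Sub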
m_Packets is private.
Topic seems to have died down, but I profiled all the packets used; here's the info. Numbers are in seconds. This is over 24.5 hours or so.

PlayerMsg:        0.015         ' Handles player skill use
PlayerUpdate:     0.031         ' Handles movement updates
ServerLogin:      5.141         ' Handles player logins
ServerLogout:     0.579         ' Handles ALL logouts
LevelLogin:       0.076         ' Handles level changing
GetLevelPlayers:  1.35          ' Used by NPCs to scan for players so they can spawn in the correct areas
ChangeAvatar:     11.388        ' Used by all clients, player and NPC, to update their physical look (this seems pretty high)
AgentLogin:       163.216       ' Used by NPC logins
TotalTime:        86437682.723

There are other opcodes, but none logged enough time to appear on the list. ChangeAvatar and AgentLogin are fairly high when compared to the others.
It's hard to tell how you arrived at those numbers, but it looks as if there is basically no CPU usage in your server. If you have lag problems, it has to be something else.
enum Bool { True, False, FileNotFound };

