strahan

DirectPlay - sudden disconnect

I am seeing a problem where clients suddenly disconnect from my server without reason. Sometimes all of them, sometimes just one or more. The server is still running fine and the clients can reconnect without problem. Any thoughts on what might cause this to happen over DirectPlay? Thanks. Strahan

You may have two different problems: network instability, and too few server threads.

You can increase the default delay timeouts, retry counts and connect timeouts with SetCaps(). That would take care of the networking problem.

But first, try increasing the number of threads. By default you get 2N+2 server threads, where N is the number of CPUs in your system. If the code that processes messages takes too long, you incur artificial delays in the message queue and the server assumes clients have disconnected. Increase the number of threads with IDirectPlay8ThreadPool::SetThreadCount().
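
To make the arithmetic concrete, here is a minimal sketch of the 2N+2 default and of how slow handlers turn into queue delay. Both function names are hypothetical and nothing here calls DirectPlay. The delay estimate assumes every handler takes the same time and that the threads run fully in parallel, which a real server will not quite match.

```cpp
#include <cassert>

// Hypothetical helper: the 2N+2 default quoted above, where N is the CPU count.
// Plain arithmetic only, not a DirectPlay call.
inline unsigned default_server_threads(unsigned cpu_count) {
    assert(cpu_count > 0);
    return 2 * cpu_count + 2;
}

// Rough estimate (an assumption, not DirectPlay's scheduler): time until `pending`
// queued messages have all been handled when `threads` workers each need
// `handler_ms` per message. Messages are processed in rounds of `threads` at a time.
inline unsigned queue_drain_ms(unsigned pending, unsigned threads, unsigned handler_ms) {
    assert(threads > 0);
    unsigned rounds = (pending + threads - 1) / threads;  // ceiling division
    return rounds * handler_ms;
}
```

Under these assumptions, 200 queued messages on 10 threads with a 50 ms handler take about a second to clear, so shortening the handler helps as much as adding threads.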

Hope this helps.

-cb

Thanks for your reply.

Couple more questions:
1. MSDN says SetThreadCount() is deprecated. What should I use instead?
2. Do I put this code on the client, server or both?


Here is the code I put on the server (C#):

private void StartServer()
{
    _ServerConnection = new Server();
    Caps caps = _ServerConnection.Caps;
    caps.ConnectRetries = 50;
    caps.ConnectTimeout = 5000;
    _ThreadPool = new DirectPlay.ThreadPool();
    _ThreadPool.SetThreadCount(-1, 100); // 100 threads too many?

    ...
}

Too few threads is unlikely to be the problem. Creating 100 threads will almost certainly make it worse. The likely cause (other than network problems) is too much load on the CPU. DirectPlay is very sensitive to CPU load, and if you're running near 100% for any length of time (as short as one average round-trip time) you'll see this sort of thing happen.

You have to architect your server so that it runs at no higher than ~75% CPU if you want to be reasonably sure of avoiding this. Call Sleep often to make sure that the DPlay threads get the CPU time they need. Actually, I had the most luck using 0 threads and explicitly calling DoWork on the thread pool object in the server loop.
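
A minimal sketch of that loop shape, assuming hypothetical hooks rather than real DirectPlay calls: `pump_network` stands in for whatever wraps DoWork on the thread pool object, `run_game_tick` for the application's own work, and `sleep_ms` for a wrapper around Sleep. The helper `idle_ms_for_target` is also made up for illustration. It computes how long to idle after a burst of work so the busy fraction stays at or below a target such as 75%.

```cpp
#include <cassert>
#include <functional>

// Idle time needed after `busy_ms` of work so that busy / (busy + idle) stays at or
// below `target_pct` percent. Rounded up to whole milliseconds.
inline unsigned idle_ms_for_target(unsigned busy_ms, unsigned target_pct) {
    assert(target_pct > 0 && target_pct <= 100);
    return (busy_ms * (100 - target_pct) + target_pct - 1) / target_pct;
}

// Hypothetical hooks; none of these are DirectPlay APIs.
struct LoopHooks {
    std::function<void()> pump_network;       // would wrap the thread pool's DoWork
    std::function<unsigned()> run_game_tick;  // returns ms of CPU work it just did
    std::function<void(unsigned)> sleep_ms;   // would wrap Sleep()
};

// Runs `ticks` iterations of a single-threaded server loop that services the
// network on both sides of the game work, then yields the CPU.
inline void run_server_loop(const LoopHooks& hooks, int ticks, unsigned target_pct) {
    assert(ticks >= 0);
    for (int i = 0; i < ticks; ++i) {
        hooks.pump_network();                      // service the network first
        unsigned busy_ms = hooks.run_game_tick();  // application work
        hooks.pump_network();                      // and again after a long tick
        hooks.sleep_ms(idle_ms_for_target(busy_ms, target_pct));
    }
}
```

With a 30 ms tick and a 75% target, the loop sleeps 10 ms per iteration. Pumping the network twice per iteration is a design choice of this sketch, so that one slow tick does not delay network servicing by a full cycle.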

This sudden-disconnect problem is one of the biggest reasons an MMORPG I was working on tanked. DPlay was designed for LAN connections and limited numbers of clients. I really don't recommend it for apps with large numbers of clients on the general Internet.

> Too few threads is unlikely to be the problem.

DirectPlay's default settings are bad; they work well only for a handful of users and it has to be tuned for performance once you reach 10+ users. This document shows how to do such performance tuning:

http://www.microsoft.com/mscorp/corpevents/meltdown2001/ppt/DPScaleNPerf.ppt

As you said, it depends on how much work each thread is doing, and I'd add how well your multithreaded code is written to the list. There is little DirectPlay can help you with if the server app is CPU-bound to begin with.

-cb

> There is little DirectPlay can help you with if the server app is CPU-bound to begin with.

True, but that doesn't really do justice to exactly how sensitive DPlay is to momentary spikes in CPU load. Actually, spikes aren't the whole problem, as any interruption in processing causes the same thing (e.g. a sudden burst of page faults on rarely accessed data).

If you connect a bunch of clients and watch the traffic with a network sniffer you can see the problem immediately. DPlay's protocol depends on regular (and frequent) exchanges of data to keep the connection alive. If your clients are idle then DPlay is still exchanging keepalives for just this reason.

Interrupt the CPU for a short time and you get a backlog of data that the server can't work its way through before all the clients send another round of keepalives. At this point you're in a vicious circle that can only be broken by dropping clients, and the added work of doing that bookkeeping increases the load even more. The result is the sudden massive disconnect the original poster saw.
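
A back-of-the-envelope model of that backlog, offered as a sketch rather than a description of DPlay's internals. It assumes work arrives at a steady rate that normally uses a fixed fraction of the CPU, and that after a stall only the spare fraction is available to catch up. The function name and the model itself are assumptions made for illustration.

```cpp
#include <cassert>

// Fluid-model estimate. While the server is stalled for `stall_ms`, work keeps
// arriving at a rate that normally uses `utilization` (0..1) of the CPU. After the
// stall, only the spare capacity (1 - utilization) can be spent on the backlog.
// Returns the catch-up time in ms, or -1.0 if the server can never catch up.
inline double catch_up_ms(double stall_ms, double utilization) {
    assert(stall_ms >= 0.0 && utilization >= 0.0);
    if (utilization >= 1.0) {
        return -1.0;  // no spare capacity: the backlog only grows
    }
    double backlog_ms = stall_ms * utilization;  // work that piled up during the stall
    return backlog_ms / (1.0 - utilization);
}
```

In this model a 100 ms stall takes 100 ms to clear at 50% load, 300 ms at 75%, and about 1.9 seconds at 95%. Once the catch-up time exceeds the keepalive window, clients start getting dropped, which matches the behaviour described above.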


You can adjust the time interval between keepalives and also the keepalive number threshold at which point you declare a client dead. The keepalive messages are dynamically spaced from an adjustable interval up to twice this interval, and you can set the throttle rate as well. Did you try to change those values and what were your findings?
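
For a feel of how those two knobs interact, here is a small sketch. It assumes a client is declared dead after a fixed count of consecutive unanswered keepalives, and that each keepalive is spaced somewhere between the base interval and twice that interval. The type and function names are hypothetical, and the numbers in the usage note are not DirectPlay defaults.

```cpp
#include <cassert>

// Range of time that can pass before an unresponsive client is declared dead.
struct DeadClientWindow {
    unsigned min_ms;  // every keepalive spaced at the base interval
    unsigned max_ms;  // every keepalive spaced at twice the base interval
};

// Assumes `threshold` consecutive unanswered keepalives mark a client as dead.
inline DeadClientWindow dead_client_window(unsigned interval_ms, unsigned threshold) {
    assert(threshold > 0);
    return DeadClientWindow{ interval_ms * threshold, 2 * interval_ms * threshold };
}
```

With an illustrative 1000 ms interval and a threshold of 5, the window is 5 to 10 seconds. Any server stall plus catch-up time that approaches the lower bound puts every connected client at risk at once.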

-cb

Quote:
Original post by Anon Mike

You have to architect your server so that it runs at no higher than ~75% CPU if you want to be reasonably sure of avoiding this. Call Sleep often to make sure that the DPlay threads get the CPU time they need. Actually, I had the most luck using 0 threads and explicitly calling DoWork on the thread pool object in the server loop.

This sudden-disconnect problem is one of the biggest reasons an MMORPG I was working on tanked. DPlay was designed for LAN connections and limited numbers of clients. I really don't recommend it for apps with large numbers of clients on the general Internet.


On the contrary, the "MAZE" sample in the DirectPlay samples is supposed to work well even with 1000+ users.

As far as the original problem of sudden disconnection goes, here's what I think:
I am assuming that your CPU is free at least once in a while to do network-related stuff.

1. If this disconnection happens on your LAN, then there's something wrong with your app, or your firewall is disabling the ports after a while.
2. If it happens over the internet, then further investigation is required.

First, make sure you are not calling pClient->Close() or pServer->DestroyClient(), even by mistake.


Regards

Playing with the various knobs changed the maximum number of clients and the mean time to sudden disconnect (MTSD). The best we achieved was 10,000 simultaneous connections, 99% of which were idle, with an MTSD of a couple of hours. A more realistic 1,000 connections with 90% idle gave similar results (not surprising).

Chucking DPlay and switching to a custom TCP-based protocol took about a week for the first implementation, and robustness was easily an order of magnitude better. A later custom reliable-UDP-based protocol performed just as well. It also dramatically lowered our memory footprint and the number of threads (for the non-0 threads DPlay case). But, alas, it was too late for our game.

Yes, the maze sample supports 1000 clients. It's also a trivial app that spends most of its time asleep and generally runs for a few minutes while people nod their heads and go "yup, it works". You can't compare it to a full-blown MMORPG server with 24x7 availability requirements.
