DirectPlay - sudden disconnect

7 comments, last by Anon Mike 19 years, 5 months ago
I am seeing a problem where clients suddenly disconnect from my server for no apparent reason. Sometimes all of them drop, sometimes just one or a few. The server keeps running fine and the clients can reconnect without any problem. Any thoughts on what might cause this to happen over DirectPlay?

Thanks.
Strahan
You may have two different problems: network instability, or too few server threads.

You can increase the default delay timeouts, retry counts and connect timeouts with SetCaps(). That would take care of the networking problem.

But first you should try increasing the number of threads. By default you get 2N+2 server threads, where N is the number of CPUs in your system. If the code that processes messages takes too much time, you incur artificial delays in the message queue and the server assumes clients have disconnected. Increase the number of threads with IDirectPlay8ThreadPool::SetThreadCount().
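To put a number on the 2N+2 default mentioned above, here is a minimal C++ sketch of the arithmetic. It does not call DirectPlay at all; the function names are made up for illustration, and the CPU count comes from the standard library rather than from DirectPlay itself.

```cpp
#include <thread>

// Default pool size as described in the post: 2N + 2 threads for N CPUs.
// (Illustrative helper, not part of the DirectPlay API.)
unsigned defaultServerThreads(unsigned cpuCount) {
    return 2 * cpuCount + 2;
}

// Same calculation for the machine this code runs on.
// hardware_concurrency() may return 0 when the count is unknown,
// so that case is treated as a single CPU.
unsigned defaultServerThreadsForThisMachine() {
    unsigned n = std::thread::hardware_concurrency();
    return defaultServerThreads(n == 0 ? 1 : n);
}
```

On a 4-CPU machine this gives 10 threads, which is a useful baseline when deciding how far to raise the count.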

Hope this helps.

-cb
Thanks for your reply.

Couple more questions:
1. MSDN says SetThreadCount() is deprecated. What should I use instead?
2. Do I put this code on the client, server or both?


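Here is the code I put on the server (C#):

private void StartServer()
{
    _ServerConnection = new Server();

    // Raise the connect retry count and timeout above the defaults.
    // Note: if Caps is a value type, this local copy has to be applied back
    // to the server for the changes to take effect.
    Caps caps = _ServerConnection.Caps;
    caps.ConnectRetries = 50;
    caps.ConnectTimeout = 5000;

    _ThreadPool = new DirectPlay.ThreadPool();
    _ThreadPool.SetThreadCount(-1, 100); // 100 threads too many?
    ...
}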
Too few threads is unlikely to be the problem. Making 100 threads will almost certainly make the problem worse. The likely cause (other than network problems) is too much load on the CPU. DirectPlay is very sensitive to CPU load, and if you're running near 100% for any length of time (as short as one average round-trip time) you'll see this sort of thing happen.

You have to architect your server so that it runs at no higher than ~75% CPU if you want to be reasonably sure to avoid this. Call Sleep often to make sure that the DPlay threads get the CPU time they need. Actually, I had the most luck using 0 threads and explicitly calling DoWork on the thread pool object in the server loop.
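The "0 threads plus explicit DoWork" pattern can be sketched roughly as follows. This is only an outline of the loop shape, not DirectPlay code: NetworkPump, LoopStats and runServerLoop are hypothetical names, and the doWork callback stands in for whatever call pumps the thread pool object in a real server.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for a thread pool configured with 0 threads:
// network work only happens when the server loop calls doWork.
struct NetworkPump {
    // Processes queued network work for at most the given time budget.
    std::function<void(std::chrono::milliseconds)> doWork;
};

struct LoopStats {
    int ticks = 0;  // game ticks executed
    int pumps = 0;  // times the network was serviced
};

// Runs tickCount iterations: pump the network, run game logic,
// pump again, then sleep so the process never pins the CPU.
LoopStats runServerLoop(NetworkPump& net,
                        const std::function<void()>& gameTick,
                        int tickCount,
                        std::chrono::milliseconds netBudget,
                        std::chrono::milliseconds idleSleep) {
    LoopStats stats;
    for (int i = 0; i < tickCount; ++i) {
        net.doWork(netBudget);  // service the network before game logic
        ++stats.pumps;
        gameTick();
        ++stats.ticks;
        net.doWork(netBudget);  // and again after, so keepalives are not starved
        ++stats.pumps;
        std::this_thread::sleep_for(idleSleep);  // leave CPU headroom
    }
    return stats;
}
```

The point of the design is that the server decides when network processing happens, so a long game tick shows up as an obvious gap between pumps instead of as contention between threads.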

This sudden disconnect problem is one of the biggest reasons an MMORPG I was working on tanked. DPlay was designed for LAN connections and limited numbers of clients. I really don't recommend it for apps with large numbers of clients on the general Internet.
-Mike
> Too few threads is unlikely to be the problem.

DirectPlay's default settings are bad; they work well only for a handful of users and it has to be tuned for performance once you reach 10+ users. This document shows how to do such performance tuning:

http://www.microsoft.com/mscorp/corpevents/meltdown2001/ppt/DPScaleNPerf.ppt

As you said, it depends on how much work each thread is doing, and I'd add how well your multithreaded code is written to the list. There is little DirectPlay can help you with if the server app is CPU-bound to begin with.

-cb
> There is little DirectPlay can help you with if the server app is CPU-bound to begin with.

True, but that doesn't really do justice to exactly how sensitive DPlay is to momentary spikes in CPU load. Actually, spikes aren't the whole problem, as any interruption in processing causes the same thing (e.g. a sudden bunch of page faults on rarely accessed data).

If you connect a bunch of clients and watch the traffic with a network sniffer, you can see the problem immediately. DPlay's protocol depends on regular (and frequent) exchanges of data to keep the connection alive. Even if your clients are idle, DPlay is still exchanging keepalives for just this reason.

Interrupt the CPU for a short time and you get a backlog of data that the server can't work its way through before all the clients send another round of keepalives. At this point you're in a vicious circle that can only be broken by dropping clients, and the added work of doing that bookkeeping increases the load even more. The result is the sudden massive disconnect the original poster saw.
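A toy model makes the backlog argument concrete. The sketch below is not DPlay's protocol: it assumes a fixed number of messages arriving per tick, a fixed number the server can process per tick, and a stall during which nothing is processed. ticksToRecover is a made-up name, and the extra bookkeeping cost of dropping clients is deliberately left out, so real behaviour would be worse than this.

```cpp
// Returns how many ticks after the stall ends the backlog first reaches
// zero, or -1 if it has not drained within maxTicks.
long ticksToRecover(long arrivalsPerTick, long servicePerTick,
                    long stallTicks, long maxTicks) {
    long backlog = 0;
    // During the stall nothing is processed, so arrivals simply pile up.
    for (long t = 0; t < stallTicks; ++t) {
        backlog += arrivalsPerTick;
    }
    // After the stall, each tick adds new arrivals and removes at most
    // servicePerTick messages from the queue.
    for (long t = 1; t <= maxTicks; ++t) {
        backlog += arrivalsPerTick;
        backlog -= (backlog < servicePerTick) ? backlog : servicePerTick;
        if (backlog == 0) {
            return t;
        }
    }
    return -1;
}
```

With a capacity of 100 messages per tick and a 10-tick stall, a server loaded to 75% recovers in 30 ticks, one at 90% needs 90 ticks, and one at 99% needs 990. At 100% load it never catches up. If the recovery time exceeds the keepalive timeout, clients start getting dropped, which fits the advice to keep the server well below full CPU.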


-Mike
You can adjust the time interval between keepalives and also the threshold number of keepalives at which you declare a client dead. The keepalive messages are dynamically spaced from an adjustable interval up to twice this interval, and you can set the throttle rate as well. Did you try changing those values, and if so, what were your findings?
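To show how those two knobs interact, here is a small C++ sketch. It is an illustration under assumptions, not DirectPlay's actual algorithm: KeepaliveConfig and the function names are hypothetical, and the uniform random spacing between the interval and twice the interval is just one way to read "dynamically spaced".

```cpp
#include <random>

// Hypothetical tuning knobs, named for illustration only.
struct KeepaliveConfig {
    int baseIntervalMs;   // minimum spacing between keepalives
    int missedThreshold;  // consecutive missed keepalives before a client is declared dead
};

// Picks the next keepalive delay somewhere in [base, 2 * base].
// Uniform spacing is an assumption made for this sketch.
int nextKeepaliveDelayMs(const KeepaliveConfig& cfg, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(cfg.baseIntervalMs,
                                            2 * cfg.baseIntervalMs);
    return dist(rng);
}

// Longest silence before a client is declared dead, if every keepalive
// happened to be spaced at the maximum of twice the base interval.
int worstCaseDetectionMs(const KeepaliveConfig& cfg) {
    return cfg.missedThreshold * 2 * cfg.baseIntervalMs;
}

// A client counts as dead once it has missed the threshold number of keepalives.
bool isDead(const KeepaliveConfig& cfg, int consecutiveMissed) {
    return consecutiveMissed >= cfg.missedThreshold;
}
```

For example, a 1000 ms base interval with a threshold of 5 means a stalled server has at most 10 seconds to catch up before clients start being dropped. Raising either value buys more slack at the cost of slower detection of clients that really are gone.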

-cb
Quote: Original post by Anon Mike

You have to architect your server so that it runs at no higher than ~75% CPU if you want to be reasonably sure to avoid this. Call Sleep often to make sure that the DPlay threads get the CPU time they need. Actually, I had the most luck using 0 threads and explicitly calling DoWork on the thread pool object in the server loop.

This sudden disconnect problem is one of the biggest reasons an MMORPG I was working on tanked. DPlay was designed for LAN connections and limited numbers of clients. I really don't recommend it for apps with large numbers of clients on the general Internet.


On the contrary, the "MAZE" sample in the DirectPlay samples is supposed to work well even with 1000+ users.

As far as the original problem of sudden disconnection goes, here's what I think:
I am assuming that your CPU is free at least once in a while to do network-related stuff.

1. If this disconnection happens on your LAN, then there's something wrong with your app, or your firewall is blocking the ports after a while.
2. If it happens over the internet, then further investigation is required.

First, make sure you are not calling pClient->Close() or pServer->DestroyClient() even by mistake.


Regards
Playing with the various knobs changed the maximum number of clients and the mean time to sudden disconnect (MTSD). The best we achieved was 10,000 simultaneous connections, 99% of which were idle, with an MTSD of a couple of hours. A more realistic 1,000 connections with 90% idle gave similar results (not surprising).

Chucking DPlay and switching to a custom TCP-based protocol took about a week for the first implementation, and robustness was easily an order of magnitude better. A later custom reliable-UDP-based protocol performed just as well. It also dramatically lowered our memory footprint and the number of threads (for the non-0-threads DPlay case). But, alas, it was too late for our game.

Yes, the maze sample supports 1000 clients. It's also a trivial app that spends most of its time asleep and generally runs for a few minutes while people nod their heads and go "yup, it works". You can't compare it to a full-blown MMORPG server with 24x7 availability requirements.
-Mike

This topic is closed to new replies.
