Too many threads in this server application?

Started by
14 comments, last by Kylotan 15 years, 7 months ago
Quote:Original post by Antheus
Which is somewhat interesting, given that the JVM doesn't have its own network layer, and all calls are passed down to the OS.

How well does it work with 5000 concurrent connections?
Any chance of a link to the articles?


http://paultyma.blogspot.com/2008/03/writing-java-multithreaded-servers.html
http://www.classhat.com/tymaPaulMultithread.pdf

I use a thread per connection in my RPC servers and it scales amazingly well.
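For reference, thread-per-connection in Java looks roughly like the sketch below: a minimal echo server that dedicates one blocking thread to each accepted socket, plus a demo client. Class and handler names are made up for illustration; error handling is minimal.

```java
import java.io.*;
import java.net.*;

// Minimal thread-per-connection echo server: one blocking thread per client.
public class ThreadPerConnectionServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral port
            int port = server.getLocalPort();
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        Socket client = server.accept();
                        new Thread(() -> handle(client)).start(); // one thread per connection
                    }
                } catch (IOException e) { /* server socket closed */ }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            // Demo client: send one line, read the echo back.
            try (Socket s = new Socket("localhost", port);
                 BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                out.println("hello");
                System.out.println(in.readLine()); // prints "hello"
            }
        }
    }

    // Each connection blocks in readLine() without affecting other clients.
    static void handle(Socket client) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) out.println(line);
        } catch (IOException ignored) {}
    }
}
```

The simplicity is the appeal: each handler is straight-line blocking code, and the OS scheduler does the multiplexing.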
Quote:Original post by abdulla
Quote:Original post by Antheus
Which is somewhat interesting, given that the JVM doesn't have its own network layer, and all calls are passed down to the OS.

How well does it work with 5000 concurrent connections?
Any chance of a link to the articles?


http://paultyma.blogspot.com/2008/03/writing-java-multithreaded-servers.html
http://www.classhat.com/tymaPaulMultithread.pdf

I use a thread per connection in my RPC servers and it scales amazingly well.


A few things should be noted.
- For a 100 Mb network, the required processing takes under 5% of a semi-modern single-core CPU, so whether the CPU can handle it one way or another is not in question. I've done scalability tests, and varying message sizes from a few bytes to sizes exceeding the MTU showed that in either case the CPU is not a problem at any point, even at 100% network utilization
- Responsiveness is not a factor in the provided example
- The work required is embarrassingly parallel - in the SMTP and web server examples, there is no need for synchronization between clients
- "Most threads are idle" - in real-time networking as it applies to games, notifications are sent to all peers at 5-50 Hz in both directions. As such, all threads are active at all times. This is a completely different scenario than the one described in the paper. In addition, all clients need to be synchronized against a single global state. The amount of contention is incredible - each message that arrives needs to lock a large portion of the state, and that becomes the bottleneck
- NIO is an incomplete, dated and under-maintained project - it is known not to utilize everything the OS makes available. Java's networking, like many of its aspects, has been neglected for too long
- It requires a specific Linux kernel to get satisfactory performance
- The author's definition of "asynchronous" appears to be a thread pool (the common solution to everything in Java). There are scalable solutions which do not require a thread per connection at all, or that use a small, fixed number of them without a work queue.
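As a point of comparison, the thread-free alternative in Java is an NIO Selector loop, where a single thread multiplexes all connections. The sketch below (names illustrative, error handling omitted) echoes data back without spawning a thread per client:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Single-threaded non-blocking echo server using an NIO Selector:
// one thread multiplexes all connections instead of one thread each.
public class SelectorEchoServer {
    public static void main(String[] args) throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0)); // ephemeral port
        server.configureBlocking(false);
        Selector selector = Selector.open();
        server.register(selector, SelectionKey.OP_ACCEPT);

        Thread loop = new Thread(() -> eventLoop(selector));
        loop.setDaemon(true);
        loop.start();

        // Demo client: plain blocking socket, one round trip.
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();
        try (java.net.Socket s = new java.net.Socket("localhost", port)) {
            s.getOutputStream().write("ping".getBytes());
            s.getOutputStream().flush();
            byte[] reply = new byte[4];
            int n = s.getInputStream().read(reply);
            System.out.println(new String(reply, 0, n)); // prints "ping"
        }
    }

    static void eventLoop(Selector selector) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            while (true) {
                selector.select(); // block until some channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel c = ((ServerSocketChannel) key.channel()).accept();
                        c.configureBlocking(false);
                        c.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel c = (SocketChannel) key.channel();
                        buf.clear();
                        int n = c.read(buf);
                        if (n < 0) { c.close(); continue; }
                        buf.flip();
                        while (buf.hasRemaining()) c.write(buf); // echo back
                    }
                }
            }
        } catch (IOException e) { /* selector or channel closed */ }
    }
}
```

The trade-off is the inverted control flow: per-connection state must be tracked explicitly instead of living on a thread's stack.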

In the same way that the article warns against generalizations, generalizing from a single blog post that applies only to Java is somewhat bold.

Quote:I use a thread per connection in my RPC servers and it scales amazingly well.


Are the RPC calls blocking, or do they use asynchronous method invocation? If the former, then you're limited by network latency either way and the application is under-utilized, which is why many workload-sensitive applications prefer message passing over pure RPC.
I should point out that the most I've done in Java was write a prank program for a friend (onload = window.open()). I program in C++, and I've forced myself to stick to multi-platform APIs, so I don't target a specific platform.

I've moved from the many-thread model to a 3-thread model (simulation, host input, and communication). However, an idea occurred to me: should the connections which actually have activity get a short-lived but dedicated thread? Once the activity is handled, the thread would terminate. This means that if one user requires a particularly long time to process their request, the users behind them won't suffer for it, and at the same time, unless many users are making large requests all at once, there aren't a large number of threads running simultaneously.
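This idea maps fairly directly onto java.util.concurrent: a cached thread pool hands each active request its own worker and reuses idle threads afterwards, avoiding the raw cost of creating and destroying a thread per request. A minimal sketch (requests simulated with sleeps, names hypothetical):

```java
import java.util.concurrent.*;

// Each *active* request gets its own short-lived worker, so one slow request
// doesn't stall the others. A cached thread pool reuses idle threads rather
// than creating and destroying one per request.
public class PerRequestWorkers {
    public static void main(String[] args) throws Exception {
        ExecutorService workers = Executors.newCachedThreadPool();

        Future<String> slow = workers.submit(() -> {
            Thread.sleep(200); // simulate one expensive request
            return "slow done";
        });
        Future<String> fast = workers.submit(() -> "fast done");

        // The fast request completes without waiting behind the slow one.
        System.out.println(fast.get());
        System.out.println(slow.get());
        workers.shutdown();
    }
}
```

A bounded pool (Executors.newFixedThreadPool) is the safer variant if many large requests really can arrive at once, since a cached pool will otherwise grow without limit.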
Quote:Original post by Antheus
In the same way that the article warns against generalizations, generalizing from a single blog post that applies only to Java is somewhat bold.


Sure, it was the first link that turned up in my history. I've been reading a lot of papers lately, so forgive me for generalising, but you can dig deeper and find more articles on the subject. I do admit my background is Linux/Mac OS X, so I can't comment on the behaviour of Windows.

Quote:
Quote:I use a thread per connection in my RPC servers and it scales amazingly well.


Are the RPC calls blocking, or do they use asynchronous method invocation? If the former, then you're limited by network latency either way and the application is under-utilized, which is why many workload-sensitive applications prefer message passing over pure RPC.


Actually, I wrote two different RPC libraries: one that blocks but uses tricks to reduce latency, and another that attempts to be completely asynchronous. Message passing is great, but it doesn't give you the type-safety or convenience of RPC.
Quote:Original post by abdulla

Actually, I wrote two different RPC libraries: one that blocks but uses tricks to reduce latency, and another that attempts to be completely asynchronous. Message passing is great, but it doesn't give you the type-safety or convenience of RPC.


Type-safety? I don't believe that is even remotely an issue with message passing unless you're using C. If anything, with support from annotations (or whatever mechanism a managed language provides), it can be not only type-safe but also implemented safely with regard to system security.

RPC works and is widely used, and is even provided by default on every managed platform.

My experience with purely RPC-based applications, however, has shown that they don't scale elegantly. As long as the system is under-utilized, assuming network transparency will work - even more so if co-location is used with the half-sync/half-async model.

When dealing with hundreds of remote real-time entities, however, the blocking mandated by the RPC model becomes a bottleneck (even with futures). The problem of waiting for responses over the network becomes so big that actions begin to time out, simply due to the accumulation of delays caused by round-trip latency.

The biggest problem, however, is (un)expected increases in load, such as those caused by network hiccups or application startup/shutdown, which cause cascading failures. (Think several thousand remote entities, all shared over the network using synchronous RPC with reactor dispatching.)

But again, it depends on the type of application. For web protocols, thread-per-client will often be adequate. In almost all cases, the bottleneck will lie in the infrastructure behind the system (for a web server it will be the file system; for web applications, the database). The cost of request handling in this case is negligible.

Part of the problem here (Java-specific) is the lack of alternatives for those libraries. Database access is painfully thread-per-request oriented, and file system access doesn't support non-blocking or even async access (except as emulated by threads).

Using non-blocking database access, for example, would allow a single thread to handle load up to the actual database capacity without pegging the CPU (the bottleneck becomes the DB queries themselves), even with thousands of outstanding queries (again, as far as the database itself allows).
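Since JDBC itself is blocking, this can only be sketched. The shape of asynchronous query dispatch - thousands of outstanding queries with no thread blocked per query - might look like the following, with the "database" simulated by a scheduler and all names hypothetical:

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical shape of non-blocking query dispatch: many outstanding
// "queries" in flight at once, completed without a thread blocked per query.
// JDBC is blocking, so the query here is simulated with a delayed completion.
public class AsyncQuerySketch {
    static final ScheduledExecutorService db = Executors.newScheduledThreadPool(1);

    // Pretend query: completes after a delay, as a real async driver would.
    static CompletableFuture<String> query(int id) {
        CompletableFuture<String> f = new CompletableFuture<>();
        db.schedule(() -> f.complete("row-" + id), 50, TimeUnit.MILLISECONDS);
        return f;
    }

    public static void main(String[] args) throws Exception {
        List<CompletableFuture<String>> inFlight = new ArrayList<>();
        for (int i = 0; i < 1000; i++) inFlight.add(query(i)); // 1000 outstanding at once
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        System.out.println(inFlight.get(0).join());
        System.out.println(inFlight.size() + " queries completed");
        db.shutdown();
    }
}
```

The point is only the shape: the caller never blocks while a query is in flight, so concurrency is limited by the database, not by the thread count.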


What I'm trying to say is that RPC has been studied and its implementation strategies documented extensively, perhaps most thoroughly by the ACE/TAO project. Unfortunately, Java is simply malnourished in some aspects (sound and multimedia, much of networking, certain memory-management aspects), which may indeed make certain solutions unexpectedly seem better. But this remains a Java-specific topic.

In short, I've found that scalability needs to be studied on a case-by-case basis. There are several competing techniques, some generally better than others, but often the system isn't pushed hard enough for these distinctions to matter.
Quote:Original post by Zouflain
I've moved from the many-thread model to a 3-thread model (simulation, host input, and communication). However, an idea occurred to me: should the connections which actually have activity get a short-lived but dedicated thread? Once the activity is handled, the thread would terminate. This means that if one user requires a particularly long time to process their request, the users behind them won't suffer for it, and at the same time, unless many users are making large requests all at once, there aren't a large number of threads running simultaneously.

That depends on what exactly processing a request requires. Whichever way you look at it, you only have X processes that can be running at the same time, and ideally each of those is acting upon a totally separate set of data. So, can this request, or at least part of it, be handled in isolation from everything else? And would it run long enough to justify the overhead of thread creation and context switching?

One common approach is to simply get the data off the wire as soon as possible, i.e. read the input, put it into a queue, and let another thread (possibly just one for everything, possibly one from a pool of N reusable threads) pull it off later for processing. The idea here is that you're buffering input in memory, where you have gigabytes to play with and can therefore afford a high-latency, high-bandwidth situation, while keeping your OS and network adapter serviced, where you typically have only kilobytes of buffer space and thus can't afford high latency - at best you'll overflow the buffer, at worst you'll force the system to block. In practice, though, probably 99% of applications will never hit any of these limits anyway.
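This read-then-queue pattern can be sketched with a BlockingQueue (messages and names are made up; in reality the reader would be draining a socket):

```java
import java.util.concurrent.*;

// Read-then-queue: a reader thread drains input as fast as possible into an
// in-memory queue, and a separate worker pulls messages off for processing.
public class QueuedProcessing {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();

        // "Network reader": in reality this would read packets from a socket.
        Thread reader = new Thread(() -> {
            for (String msg : new String[]{"move", "fire", "quit"}) inbox.add(msg);
        });

        // Worker: pulls messages off later, at its own pace.
        Thread worker = new Thread(() -> {
            try {
                String msg;
                while (!(msg = inbox.take()).equals("quit"))
                    System.out.println("processed " + msg);
            } catch (InterruptedException ignored) {}
        });

        reader.start();
        worker.start();
        reader.join();
        worker.join();
    }
}
```

An unbounded queue trades the risk of dropped packets for the risk of unbounded memory growth, which is usually the right trade given gigabytes of RAM versus kilobytes of socket buffer.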

This topic is closed to new replies.
