Server multicore issue

I am seeing a weird issue while testing my server app: it seems Windows 7 is only putting me on one of the 4 cores. I connect the server to itself and cause a cascading count of messages to loop back to the server until it hits an equilibrium, to see what my max throughput is, but I can't get the CPU usage over 40% overall. I have one core at 90%, but every other core is under 30%. I have 13 threads running, so it doesn't make any sense that I can't wall the box. Has anyone any idea why this is occurring? I even tried setting the thread priority to critical.
Some kind of resource-lock problem is likely preventing your worker threads from operating at max CPU capacity. If they query into your main thread (locking a mutex or something), then it's expected that the worker threads end up competing with each other.

It's possible that you're doing intensive work while you hold a lock, or that all your worker threads hit a barrier while they wait for the main thread to get to the end of the frame, or something else entirely. Really, it could be anything.
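For illustration, here is a minimal sketch of that difference, assuming a Windows critical section and made-up names (not the poster's actual code): doing the per-message work while the lock is held serializes the workers, while swapping the shared list out under the lock and processing it afterwards lets them run in parallel.

```cpp
// Illustrative only: g_inbox/g_inboxLock are placeholder names, and the
// critical section is assumed to be initialized elsewhere.
#include <windows.h>
#include <list>

struct Message { /* payload ... */ };

std::list<Message> g_inbox;      // shared between producer and worker threads
CRITICAL_SECTION   g_inboxLock;  // protects g_inbox only

// Contended: the expensive work happens while the lock is held,
// so every other thread that touches g_inbox has to wait for it.
void ProcessInboxContended()
{
    EnterCriticalSection(&g_inboxLock);
    for (std::list<Message>::iterator it = g_inbox.begin(); it != g_inbox.end(); ++it)
    {
        // expensive per-message work, lock still held
    }
    g_inbox.clear();
    LeaveCriticalSection(&g_inboxLock);
}

// Better: grab the whole list under the lock, release it, then do the work.
void ProcessInboxCopied()
{
    std::list<Message> local;

    EnterCriticalSection(&g_inboxLock);
    local.swap(g_inbox);             // O(1) handoff; lock held only briefly
    LeaveCriticalSection(&g_inboxLock);

    for (std::list<Message>::iterator it = local.begin(); it != local.end(); ++it)
    {
        // expensive per-message work, no lock held
    }
}
```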

Describe your thread architecture. It'd also be useful to profile your threads to see where they're waiting.

-me
I have two ways of running it at the moment. Option 1 is a send thread, a recv thread, and a heartbeat thread. Option 2 is a send thread, a recv thread, a heartbeat thread, and 4 queue threads. I have one set of send, recv, and queue threads (if used) for the server side and one for the client side. As a note, with option 1 I get twice the performance I do with option 2. Also, as a rule I lock only when needed (i.e. copy the list, unlock, process the list). I am also using non-blocking TCP sockets, with ioctl to test for data to receive before calling recv.
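Roughly, the receive path looks like this (a simplified sketch of what I described, with illustrative names rather than my actual code):

```cpp
// Sketch of a non-blocking recv check, assuming WSAStartup has already run
// and the socket was put into non-blocking mode with ioctlsocket(FIONBIO).
#include <winsock2.h>

// Returns false when the connection should be closed; bytesRead may be 0
// when there is simply nothing to read yet.
bool TryRecv(SOCKET s, char* buf, int bufSize, int& bytesRead)
{
    bytesRead = 0;

    u_long pending = 0;
    if (ioctlsocket(s, FIONREAD, &pending) == SOCKET_ERROR)
        return false;                 // socket error
    if (pending == 0)
        return true;                  // nothing queued yet

    int n = recv(s, buf, bufSize, 0);
    if (n == SOCKET_ERROR)
        return WSAGetLastError() == WSAEWOULDBLOCK;  // transient, try again later
    if (n == 0)
        return false;                 // graceful close by the peer

    bytesRead = n;
    return true;
}
```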
Which language? Which threading (API/library)?
C++, Windows API.
The network and threading code is my own, not third-party. I did find the issue in the thread pooling, though, so option 1 and option 2 now perform alike. Currently I am getting 30k-40k messages per second with 100 connections. The processor is still at 59%, though, with 2 cores busy and 2 semi-busy.
Quote: Original post by david_watt78
I did find the issue in the thread pooling, though, so option 1 and option 2 now perform alike. Currently I am getting 30k-40k messages per second with 100 connections.

Is this a test over localhost or over LAN (how fast)?
Same box: the server connects to itself and sends itself messages, then echoes back to itself 3 copies of each message. It's designed to test the pure maximum throughput of the system. The messages/sec will rise until a balance of send and recv is reached.
I suppose some profiler or debugger might be able to hint in the right direction.

Failing that, why not start with boost::asio and work from there? It scales to gigabit networks with almost no effort, takes care of all the threading, and apparently even builds much faster in recent versions.
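The usual pattern is roughly this (a minimal sketch assuming Boost.Asio; the actual socket setup and handlers are elided): one io_service, several threads all calling run(), and the library spreads the completion handlers across them.

```cpp
// Sketch only: real code would post async_read/async_write work on sockets
// before (or after) starting the pool.
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

void RunService(boost::asio::io_service* io)
{
    io->run();   // each pool thread pumps completion handlers
}

int main()
{
    boost::asio::io_service io;

    // Keep run() from returning while there is no pending work yet.
    boost::asio::io_service::work keepAlive(io);

    // ... create acceptors/sockets and start async operations here ...

    boost::thread_group pool;
    for (int i = 0; i < 4; ++i)          // e.g. one thread per core
        pool.create_thread(boost::bind(&RunService, &io));

    pool.join_all();
}
```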

It's just that "my threads don't scale" is such a broad problem that it can be caused by literally anything, or a combination of everything.

Quote: I have 13 threads running so it doesn't make any sense that I can't wall the box

A thousand threads won't wall the box, but on a quad core, 4 threads with enough work will. Threads don't magically make things scale up.

The biggest flaw, from the minimal information available, is that there are 3 threads (send/receive/heartbeat) contending for each socket. Usually one would want each socket managed by one and only one thread (from a pool) at any given time, and often this is combined with per-socket processing. It drastically reduces the need for locking, or might not even require any (tricky, though).
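Something along these lines, as a rough sketch (Winsock plus a critical section; all the names are made up): a pool thread takes exclusive ownership of a socket, does the recv/process/send for it, and only then hands it back, so the socket itself never needs a lock.

```cpp
// Illustrative sketch: the queue protects ownership handoff only; the socket
// is touched by exactly one thread at a time, so no per-socket lock is needed.
#include <winsock2.h>
#include <windows.h>
#include <deque>

struct SocketTask { SOCKET s; /* per-socket buffers, parser state, ... */ };

std::deque<SocketTask*> g_ready;      // sockets waiting to be serviced
CRITICAL_SECTION        g_readyLock;  // protects only the queue

DWORD WINAPI PoolThread(LPVOID)
{
    for (;;)
    {
        SocketTask* task = NULL;

        EnterCriticalSection(&g_readyLock);
        if (!g_ready.empty()) { task = g_ready.front(); g_ready.pop_front(); }
        LeaveCriticalSection(&g_readyLock);

        if (!task) { Sleep(1); continue; }   // placeholder; an event or IOCP is better

        // Only this thread owns task->s right now.
        char buf[4096];
        int n = recv(task->s, buf, sizeof(buf), 0);
        if (n > 0)
        {
            // ... parse messages, queue/send replies on the same socket ...
        }

        EnterCriticalSection(&g_readyLock);
        g_ready.push_back(task);             // hand ownership back to the pool
        LeaveCriticalSection(&g_readyLock);
    }
}
```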
With Windows, what OS version you're using may matter. I suggest benchmarking on Windows Server 2008, which probably has more server-ish features enabled. For example, I wouldn't be surprised if Windows 7 Ultimate only serviced interrupts on a single core, whereas Windows Server distributed interrupts across all cores. Just speculating here, but that's the kind of semi-artificial difference you'll find between the Workstation and Server variants.

Also, what network card you have matters; sometimes a lot. If your network card is a "connectivity solution," with drivers written by high school drop-outs who cut and paste lines of code until it magically passes WHQL, then it's unlikely to contribute to good overall system performance.

Sadly, I have no idea which network cards have better vs worse support on Windows, but I'd imagine the higher-end Intel gear would be pretty good, and anything Realtek based would probably be more towards the "connectivity solution" end.
enum Bool { True, False, FileNotFound };
I have isolated the issue to the following by profiling: even with non-blocking sockets, the send and recv threads are taking 15 ms to run 10 connections for some reason. The weird part is that it's exactly 15 ms in both cases, which is odd.
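One thing I can do to narrow it down is time a single send/recv pass directly (rough sketch below, with placeholder names); if the measured time snaps to roughly 15.6 ms, that matches the default Windows timer granularity and would point at a Sleep() or wait call somewhere rather than at the socket calls themselves.

```cpp
// Diagnostic sketch using QueryPerformanceCounter for sub-millisecond timing.
// As an experiment, timeBeginPeriod(1) from winmm can raise the system timer
// resolution to ~1 ms to see whether the 15 ms figure moves with it.
#include <windows.h>
#include <cstdio>

double ElapsedMs(LARGE_INTEGER start, LARGE_INTEGER end)
{
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);
    return (end.QuadPart - start.QuadPart) * 1000.0 / freq.QuadPart;
}

void TimeOnePass()
{
    LARGE_INTEGER start, end;
    QueryPerformanceCounter(&start);

    // ... one full send/recv pass over the 10 connections goes here ...

    QueryPerformanceCounter(&end);
    printf("pass took %.3f ms\n", ElapsedMs(start, end));
}
```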

