Vista/Win7 loopback implementation

Started by
2 comments, last by hplus0603 12 years, 10 months ago
I know that on MacOS and some other *nixes, the loopback device implementation doesn't actually pass through the full TCP/IP stack but rather just bungs socket data straight from one side to the other, ostensibly to avoid a (wasted) copy and extra buffer space.

What I don't know is if the Windows (specifically Vista/Win7) network implementation functions the same way. I'm using both the default 127.0.0.1 loopback IP as well as a loopback hardware adapter assigned a bogus IP address from the 192.168.0.0/16 subnet.

This is important to know because it affects whether or not we can run throttle tests locally or if we need two separate machines to make sure we're actually hitting the full netcode path in the OS.


Cheers!

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Even if you don't know the answer for sure, how about you set up a test that sends megabyte-size buffers between two sockets, run it on a multi-core machine, and measure throughput? (You should use the optimal socket I/O mechanism here, which on Windows is OVERLAPPED I/O on socket handles.)
And, while you're doing that, you could also run VTune or a similar sampling profiler to see where the implementation is spending its time.
Then do the same for two different machines, and compare the numbers.
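
A minimal sketch of such a throughput test, for illustration only. It uses plain blocking sockets from Python's standard library rather than OVERLAPPED I/O, so it will not reach the numbers a tuned IOCP implementation would; the names `measure_throughput` and `_drain` and the 64 MiB default are made up for this example. The `host` parameter can be 127.0.0.1 or the address assigned to a loopback adapter. For the two-machine comparison the receiving half would have to run on the other box.

```python
import socket
import threading
import time

CHUNK = 1 << 20  # 1 MiB per send, in line with "megabyte-size buffers"


def _drain(server_sock, result):
    """Accept one connection and count bytes until the peer closes."""
    conn, _ = server_sock.accept()
    received = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read means the sender closed the socket
                break
            received += len(data)
    result["bytes"] = received


def measure_throughput(host="127.0.0.1", total_bytes=64 * CHUNK):
    """Send total_bytes over TCP to a local listener; return (bytes, seconds)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_drain, args=(server, result))
    receiver.start()

    payload = bytes(CHUNK)  # CHUNK zero bytes
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            client.sendall(payload)
            sent += len(payload)
    receiver.join()  # stop the clock only after the receiver has seen EOF
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], elapsed


if __name__ == "__main__":
    received, seconds = measure_throughput()
    print(f"{received / seconds / 1e6:.1f} MB/s over {received} bytes")
```

Running it under a sampling profiler at the same time shows where the time goes; the absolute MB/s figure matters less than how it compares between loopback and a real link.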

That being said: I imagine that, if you want to test under "real" conditions, you want to involve the network card driver and the actual network anyway. Even a 10 Gbps link is going to behave very differently from memcpy(), no matter which specific software layers are involved :-) And with the cheap network cards that most systems come with these days, the actual limitations of the hardware and driver may matter more than the software, too.
enum Bool { True, False, FileNotFound };
Yeah, the whole goal here is to exercise the driver and kernel network stack paths as much as possible; we're looking for contention issues in an IOCP implementation, and if running to localhost doesn't actually hit the full network stack, we're missing a couple of potential weak points - namely drivers etc. as you mention.

For what it's worth, running across a network adapter seems to be substantially slower than running across localhost, even when the network interface is not saturated with traffic (i.e. we're CPU/HDD bound and not bandwidth bound by a long shot). So my guess is that loopback has some hackery in Windows to prevent it from hitting the full netcode path.


Oh well. To be safe we'll just stick to throwing extra machines at it :-)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


For what it's worth, running across a network adapter seems to be substantially slower than running across localhost, even when the network interface is not saturated with traffic (i.e. we're CPU/HDD bound and not bandwidth bound by a long shot). So my guess is that loopback has some hackery in Windows to prevent it from hitting the full netcode path.


I would expect that slowdown to come from either the network adapter itself, or from the network driver. Many network drivers are terrible. Many network adapters are terrible. This is because the PC market is almost entirely "survival of the cheapest."
Try running throughput-based benchmarks against a few different programs (apache, iis, etc) on static content on the system in question, trying to saturate the network. Check out how close to full link saturation you actually get. Use VTune or a similar sampling profiler to figure out where the CPU time is spent. You can do all of this without even introducing your own software at all into the system!
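
A rough sketch of the client side of such a benchmark, assuming a web server on the machine under test is already serving a large static file. The function name `http_throughput` and the URL in the usage comment are hypothetical. It fetches sequentially over one connection at a time, so it will probably not saturate a fast link on its own; running several copies in parallel, or using a dedicated load generator, gets closer to that.

```python
import time
import urllib.request


def http_throughput(url, repeats=10, block=1 << 16):
    """Fetch url `repeats` times; return (total_bytes, seconds)."""
    # Empty ProxyHandler: ignore any configured proxy so the request
    # goes straight to the server being measured.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    total = 0
    start = time.perf_counter()
    for _ in range(repeats):
        with opener.open(url) as resp:
            while True:
                data = resp.read(block)
                if not data:  # end of this response body
                    break
                total += len(data)
    return total, time.perf_counter() - start


# Usage (hypothetical address and file name):
#   total, secs = http_throughput("http://192.168.0.10/big.bin")
#   print(f"{total * 8 / secs / 1e6:.0f} Mbit/s")
```

Comparing the Mbit/s figure against the nominal link speed gives the "how close to saturation" number, with no custom server code involved.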
enum Bool { True, False, FileNotFound };

