Would anyone like a nice IO Completion Ports example?

19 comments, last by hplus0603 14 years, 4 months ago
I am an engineer, not a clergyman. I try to stay away from such religious topics as language and libraries, but apparently I have slipped into the trap again.

I think this thread has deviated too far from its original intention.

-Karl Strings
Try searching CodeProject.com; there are a lot of tutorials about IOCP in C++ there.
Quote:Original post by Windryder
Quote:Original post by Karl Strings
This has been an area of contention for me for quite some time. In most applications abstracting the OS with a library is a good thing. It lets developers concentrate more on what they are doing and less on what the underlying OS is doing. This changes when you enter the arena of system and/or server software. Modern operating systems provide lots of support for high performance applications and in order to take full advantage of that support one must intimately tie the application to the OS. You need to be just as concerned with the underlying OS and what it is doing as well as what you are doing.


Why would that be necessary when abstraction layers such as Boost.Asio which Antheus suggested simply provide wrappers around these high performance OS facilities? One may argue that the abstraction layer might be slightly slower than calling the native API directly, but knowing the Boost community I would say the performance loss is negligible.



My advice when it comes to Boost is to "Chew the meat and spit the bones." Some of the libraries are excellent but not all of them. The performance of the Boost Serialization library is significantly worse than that of the C++ Middleware Writer across a number of tests. I'm not very familiar with the Asio library, but want to point out that you should be careful regarding what you use. If you're not, your company/work will go the way of the dodo bird.

Another weakness of traditional C++ serialization libraries is they don't automate the production of serialization functions. This is a bonus that one gets with C++ Middleware Writer.

Brian Wood
www.webEbenezer.net
It's quite nice to read that someone else has been working on a project similar to one I've been on for a long time. However, instead of focusing only on sockets, I'm writing a platform for handling streams efficiently. Just like you, I originally worked in C, but as the complexity of the code increased I decided to switch to C++ to keep the code relatively simple, clean and easy to maintain.

Completion ports do their job when it comes to avoiding unnecessary memory bandwidth usage and cache thrashing. Even so, from a performance point of view (and to keep things trivial), it would make sense to be able to map kernel read and write buffers directly to user space instead of mapping user space buffers to kernel space (as completion ports do).
Quote:Original post by skorhone
... From a performance point of view (and to keep things trivial), it would make sense to be able to map kernel read and write buffers directly to user space instead of mapping user space buffers to kernel space (as completion ports do).


Explain? I would also like to know the reason you think that mapping memory from kernel space to user space is trivial...

Just to clarify, mapping memory from one ring to another is not usually trivial. Yes, it may not be hard to "do", but it is very easy to open your code up to some wicked race conditions that are very hard to debug.

If your goal is to map memory, have fun, just be aware that you can have some weird race conditions that may not be apparent just by looking at the code.
I think the general idea is that if the device driver gets to specify the parameters for a buffer (alignment requirement, contiguity requirement, physical address requirement, etc) then it may be more efficient (due to hardware limitations) than some arbitrary buffer allocated by the application.

For example, if some DMA operation needs 64-byte aligned buffers, and a user hands in a 16-byte aligned buffer (or even a 4-byte aligned buffer from malloc()) then chances are that the driver will need to re-buffer, which triples the cost of the operation(!). Original write into internal buffer; read out of internal buffer; write into user buffer.
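To make the alignment point concrete, here is a minimal sketch of handing out 64-byte aligned buffers from user space. It assumes a C++17 standard library that provides std::aligned_alloc; the 64-byte figure comes from the example above, and the names kDmaAlignment, round_up, alloc_io_buffer and is_aligned are made up for illustration. Whether a given driver can then skip its internal copy is up to that driver; this only shows the application side.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumed alignment requirement, taken from the 64-byte DMA example above.
constexpr std::size_t kDmaAlignment = 64;

// Round size up to the next multiple of alignment.
// std::aligned_alloc expects the size to be a multiple of the alignment.
constexpr std::size_t round_up(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Allocate an I/O buffer whose address is a multiple of kDmaAlignment.
// Returns nullptr on failure; release the buffer with std::free.
void* alloc_io_buffer(std::size_t size) {
    return std::aligned_alloc(kDmaAlignment, round_up(size, kDmaAlignment));
}

// True if the pointer's address is a multiple of alignment.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Note that MSVC's runtime does not provide std::aligned_alloc; on Windows the usual substitute is _aligned_malloc paired with _aligned_free.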

And before you say that all hardware should support DMA to arbitrary scattered memory pages with arbitrary alignment, let me point at the 99% of hardware out there that doesn't :-)
enum Bool { True, False, FileNotFound };
I didn't mean that the implementation of this kind of interface would be trivial - it obviously isn't. What I meant was that such an interface could be easier to use efficiently (as you already noted, most completion port implementations are more or less wrong).

The interface could be something as simple as:
flags |= WSA_MEM_MAP;    /* proposed flag, does not exist in Winsock */
buffers[0].buf = NULL;   /* WSARecv updates this */
buffers[0].len = 4096;   /* max read length */
WSARecv(sock, buffers, 1, &bytesRead, &flags, overlapped, NULL);
.
<read completes>
.
WSAFree(buffers, 1);     /* proposed call: hand the buffer back to the kernel */

Instead of:
3* WSARecv(sock, buffers, 1, &bytesRead, &flags, overlapped, NULL);
.
<logic to process results in the threadpool in order they were initiated>
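For anyone wondering what the "process results in the order they were initiated" logic amounts to, here is a rough, portable sketch. It is not tied to the Windows API: the class name CompletionReorderer and its issue/complete methods are invented for illustration, and a std::string stands in for the received bytes. The idea is to tag each posted read with a sequence number and hold back any completion whose predecessors have not arrived yet.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: releases completed reads in the order they were
// issued, even if the completions themselves arrive out of order.
class CompletionReorderer {
public:
    // Call when posting a read; store the returned sequence number in the
    // per-operation context so it comes back with the completion.
    std::uint64_t issue() {
        std::lock_guard<std::mutex> lock(mutex_);
        return next_issue_++;
    }

    // Call when a read completes. Returns every payload that is now
    // deliverable in issue order (empty if an earlier read is outstanding).
    std::vector<std::string> complete(std::uint64_t seq, std::string data) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.emplace(seq, std::move(data));
        std::vector<std::string> ready;
        auto it = pending_.begin();
        while (it != pending_.end() && it->first == next_deliver_) {
            ready.push_back(std::move(it->second));
            it = pending_.erase(it);
            ++next_deliver_;
        }
        return ready;
    }

private:
    std::mutex mutex_;                              // worker threads share this object
    std::uint64_t next_issue_ = 0;                  // next sequence number to hand out
    std::uint64_t next_deliver_ = 0;                // next sequence number to release
    std::map<std::uint64_t, std::string> pending_;  // completed but not yet released
};
```

With three reads outstanding, a completion for the third read is parked until the first two have been delivered. A single lock is the simplest choice here, not necessarily the fastest.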

Quote:Original post by skorhone
I didn't mean that the implementation of this kind of interface would be trivial - it obviously isn't. What I meant was that such an interface could be easier to use efficiently (as you already noted, most completion port implementations are more or less wrong).

The interface could be something as simple as:
flags |= WSA_MEM_MAP;    /* proposed flag, does not exist in Winsock */
buffers[0].buf = NULL;   /* WSARecv updates this */
buffers[0].len = 4096;   /* max read length */
WSARecv(sock, buffers, 1, &bytesRead, &flags, overlapped, NULL);
.
<read completes>
.
WSAFree(buffers, 1);     /* proposed call: hand the buffer back to the kernel */

Instead of:
3* WSARecv(sock, buffers, 1, &bytesRead, &flags, overlapped, NULL);
.
<logic to process results in the threadpool in order they were initiated>


Let me make sure I understand this... You want the allocation to happen in the kernel and WSARecv will return a kernel allocated buffer to you on completion? I am assuming that you are doing this to ensure proper memory alignment and no double allocations?

I think you will find better results and fewer bugs by using look-aside lists. You can allocate your memory however you need to in user land at startup and free it when the program closes. One round of allocations and frees, everything is aligned however you want, and most important - no kernel code is needed.
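As a rough illustration of the look-aside idea, here is a minimal fixed-size buffer pool: everything is allocated once in the constructor and recycled afterwards. The class name BufferPool and its methods are invented for this sketch. It is not thread-safe and uses default alignment; a real server would add a lock (or a lock-free list) and an aligned allocator as needed.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical fixed-size buffer pool in the spirit of a look-aside list:
// all buffers are allocated up front and reused for the program's lifetime.
class BufferPool {
public:
    BufferPool(std::size_t buffer_size, std::size_t count)
        : buffer_size_(buffer_size) {
        storage_.reserve(count);
        free_.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            storage_.push_back(std::make_unique<char[]>(buffer_size));
            free_.push_back(storage_.back().get());
        }
    }

    // Hands out a buffer, or nullptr if the pool is exhausted.
    char* acquire() {
        if (free_.empty()) return nullptr;
        char* buf = free_.back();
        free_.pop_back();
        return buf;
    }

    // Takes back a buffer previously obtained from acquire().
    void release(char* buf) { free_.push_back(buf); }

    std::size_t buffer_size() const { return buffer_size_; }
    std::size_t available() const { return free_.size(); }

private:
    std::size_t buffer_size_;
    std::vector<std::unique_ptr<char[]>> storage_;  // owns the memory
    std::vector<char*> free_;                       // buffers currently unused
};
```

The memory is released in one go when the pool is destroyed, which matches the "one round of allocations and frees" described above.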
True, but the point would be to make application development trivial. Completion ports have been around for over a decade, yet there are only a handful of proper implementations, even though the documentation is above average. There must be a reason for that :)

In my opinion, an interface that is hard to use has a design flaw. Enough ranting; I'm just trying to point out that completion ports, as they stand, are probably not the ultimate solution. With a proper interface, perhaps.
Quote:In my opinion, an interface that is hard to use has a design flaw.


If an operation is, at the core, complex, then any proper interface must, by extension, be complex. If you try to put a simple interface on a complex operation, you will be doing something wrong, or something inefficient.

That being said, the wrapper idiom for I/O completion ports that's found in the .NET framework is pretty elegant in my opinion. Most well-written .NET programs use I/O completion ports, and may not even know it. Anytime you call BeginRead() or similar, you're really using I/O completion ports with the Windows thread worker pool.
enum Bool { True, False, FileNotFound };

This topic is closed to new replies.
