Most Efficient High-Performance I/O model?

I'm developing a custom SMTP server application that's expected to receive anywhere from 3000 to 5000 connections/emails per hour. My current implementation creates a new thread to handle each client connection and uses overlapped I/O. What I'm wondering is whether using I/O completion ports would be a better idea. What are the pros and cons of each approach?

Share on other sites
I/O completion ports are Microsoft's preferred method of handling Winsock I/O for large-scale servers. The model fits well with a thread pool, and performance is the best you'll get if it's written well. But as with any piece of code you need good performance from: profile, profile, profile.

Share on other sites
One disadvantage of I/O completion ports (which I'm all for) is that they are only available on WinNT/2000/XP. Still, one thread per client is not a scalable solution under Windows. While you could create 3000 threads and service clients that way, the number of context switches and cache misses would be brutal and would kill the performance of your server. I/O completion ports, under NT/2000/XP, are definitely the way to go. They are stupid-easy (technical term) to use under Windows 2000/XP because of the thread-pool management functions (which use I/O completion ports internally).

Dire Wolf
www.digitalfiends.com

Share on other sites

News to me. Could you elaborate on this a little more? I'd really appreciate it.

Share on other sites
look up this function:

BindIoCompletionCallback()

That should get you started.

Share on other sites
One thing I noticed about BindIoCompletionCallback() is that the callback function is called by a non-I/O thread. Perhaps I'm wrong, but doesn't this mean you can't post more I/O requests inside your callback? Most servers will probably get their first completion packet as a result of a call to WSARecv(), then want to post another I/O request with WSASend().

Would it be better to instead call QueueUserWorkItem() with the WT_EXECUTEINIOTHREAD flag, and then call GetOverlappedResult() in your callback to get the result of the I/O?

Matt

Share on other sites
Well, the key to squeezing out all the performance you can is keeping the I/O subsystem busy. If you queued things to different threads using APCs, you'd be waiting for the scheduler to make a context switch to the right thread before the send could be initiated. With I/O completion ports, I'd generally expect you to send from any thread you want and just let the I/O threads handle the completions for those sends.

Receiving is slightly different and has to be planned more carefully. Most of the time, when you call WSARecv you're really just supplying a buffer to be used when it's needed, but there could also be data already waiting, in which case your WSARecv might complete right away. You'd probably want to figure out some ideal number of receive buffers to keep posted, allocate them when you create the socket, call WSARecv on each of them, then just rotate those buffers through further calls to WSARecv. That way you don't have memory allocation / thrashing issues for your buffers.
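
The buffer rotation described above could look roughly like the sketch below. It only shows the bookkeeping: the names (RecvPool, recv_pool_acquire, and so on) and the sizes are made up for illustration, the actual WSARecv() call is left as a comment, and there is no locking, so a real server would have to guard the pool if several threads touch the same connection.

```c
#include <assert.h>
#include <stddef.h>

#define RECV_BUF_COUNT 4      /* hypothetical: receives kept posted per socket */
#define RECV_BUF_SIZE  4096   /* hypothetical: bytes per receive buffer */

typedef struct RecvBuffer {
    char data[RECV_BUF_SIZE];
    int  posted;              /* 1 while the buffer backs a pending receive */
} RecvBuffer;

typedef struct RecvPool {
    RecvBuffer bufs[RECV_BUF_COUNT];  /* allocated once, with the connection */
} RecvPool;

/* Mark every buffer as free. Call once when the socket is created. */
void recv_pool_init(RecvPool *pool)
{
    assert(pool != NULL);
    for (size_t i = 0; i < RECV_BUF_COUNT; ++i)
        pool->bufs[i].posted = 0;
}

/* Take a free buffer to post; returns NULL when all are already pending.
   A real server would hand buf->data to WSARecv() at this point. */
RecvBuffer *recv_pool_acquire(RecvPool *pool)
{
    assert(pool != NULL);
    for (size_t i = 0; i < RECV_BUF_COUNT; ++i) {
        if (!pool->bufs[i].posted) {
            pool->bufs[i].posted = 1;
            return &pool->bufs[i];
        }
    }
    return NULL;
}

/* Call once a completed receive has been processed, so the same
   memory can be posted again instead of allocating a new buffer. */
void recv_pool_release(RecvBuffer *buf)
{
    assert(buf != NULL && buf->posted);
    buf->posted = 0;
}
```

Because the pool lives inside the per-connection state, nothing is allocated or freed while the connection is active; a completed buffer is released and immediately acquired again for the next receive.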

Share on other sites
What I'd like to know is whether it's OK to post additional overlapped I/O requests from inside your callback. Jeffrey Richter wrote in an article on MSDN that you can't, because the non-I/O thread which calls your BindIoCompletionCallback routine could be terminated with your outstanding I/O still pending. The Platform SDK contradicts this, stating that both I/O and non-I/O threads can initiate I/O requests, and that neither would be terminated with I/O still pending. Any clarification on the matter?

Matt

Share on other sites

quote:

From Platform SDK, August 2001 Edition
A non-I/O worker thread waits on I/O completion ports. Using non-I/O worker threads is more efficient than using I/O worker threads. Therefore, you should use non-I/O worker threads whenever possible. Both I/O and non-I/O worker threads do not exit if there are pending asynchronous I/O requests. Both types of threads can be used by work items that initiate asynchronous I/O completion requests. However, avoid posting asynchronous I/O completion requests in non-I/O worker threads if they could take a long time to complete.

This description really isn't that clear, but it essentially says, "If you post an overlapped I/O request on a non-I/O worker thread and the operation doesn't complete in a timely manner, the thread might terminate and all pending overlapped operations will be cancelled." That is my interpretation of the information, with a bit of help from Jeffrey Richter.

The idea is that you receive I/O completion notifications on non-I/O threads but you queue I/O requests to I/O threads using QueueUserWorkItem(). When the work item is processed in the I/O thread, an overlapped I/O request is made against the socket/file handle. The call to the asynchronous function returns immediately and allows the I/O thread to process more work items. Since you bound the socket handle to the thread pool, when the overlapped I/O operation completes a callback is made on a non-I/O thread. Then the process repeats.

Hope this clears things up.

Regards,

Dire Wolf
www.digitalfiends.com

Share on other sites
3000-5000 connections per hour is a minuscule number of connections. For this application it doesn't matter how you handle connections (assuming they won't all be active at once). Even creating a new thread for each connection will work in this case. Just code it in Perl, for that matter. Bothering with I/O completion ports for this is a waste of time unless your traffic is going to grow a hundredfold.

Share on other sites
Well, that is the point, I guess. It depends on how those 5000 connections are distributed over the hour. If you are servicing 10 to 20 clients at a time on average then it is no big deal (even though IOCP would still show a great benefit here as well). If you are servicing 50+ clients at once then suddenly it can become a huge deal.

Dire Wolf
www.digitalfiends.com

Share on other sites
Well, it's choked, hard.

I can't find any good code examples using I/O completion ports, and whenever I try to call BindIoCompletionCallback(), I get this:

error C2065: 'BindIoCompletionCallback' : undeclared identifier

Share on other sites
Did you:

  #define _WIN32_WINNT 0x0500

before you included windows.h?

Share on other sites
Jon is right on. You need to declare the following definition BEFORE you include the windows.h header file.

#define _WIN32_WINNT 0x0500
#include <windows.h>

You can also declare:

#define _WIN32_WINNT 0x0500
#define WINVER       0x0500
#include <windows.h>

Dire Wolf
www.digitalfiends.com

Share on other sites
OK, now I'm confused...

I just read this in the MSDN docs:

quote:
A non-I/O worker thread waits on I/O completion ports. Using non-I/O worker threads is more efficient than using I/O worker threads. Therefore, you should use non-I/O worker threads whenever possible. Both I/O and non-I/O worker threads do not exit if there are pending asynchronous I/O requests. Both types of threads can be used by work items that initiate asynchronous I/O completion requests. However, avoid posting asynchronous I/O completion requests in non-I/O worker threads if they could take a long time to complete.

Share on other sites
The basic reason they suggest that is the way the non-I/O thread pool is managed, i.e. thread creation / destruction. If you have too many pending I/O operations that you know will take a long time, the manager will be creating / destroying a LOT of threads and you'll see some serious performance loss.

Share on other sites
Ok, so I have a server process that responds to input on a given socket.

That means it needs to queue a Write() while processing the Read() ( at the end of processing, most likely ).

Lemme see if I got the order of this:

1) BindIOCompletionCallback on the socket

a) set appropriate buffers and call QueueUserWorkItem to process the data and post the Write() operation to an IO thread in the thread pool
b) The Write() operation gets processed by an IO thread, and when it finishes, another Read() operation is posted.
c) The callback function bound to the Socket in step 1) above handles the Read() operation, wash-rinse-repeat.

That about right? Seems a bit roundabout. It would be a lot neater if you could just post overlapped operations in the same callback function bound in step 1)

[edited by - daerid on May 21, 2002 2:07:38 PM]

Share on other sites
quote:
a) set appropriate buffers and call QueueUserWorkItem to process the data and post the Write() operation to an IO thread in the thread pool

I'm not clear why you would need to call QueueUserWorkItem. You can initiate writes from any thread; just make sure that thread doesn't terminate.

Share on other sites
So what, set up some kind of event and do a WaitForSingleObjectEx() ?

That seems a bit more hackish, and would also effectively remove that thread from the pool for as long as it waited, reducing the number of threads available to process more incoming data.

I guess the confusion is coming from the discrepancies in the MSDN documentation. In one place, it says "Go ahead and post IO operations from any thread", and then in other places it says "Don't post IO operations from non-I/O worker threads".

Well, which is it?

Share on other sites
AH, I had a response to this thread a long while ago that never showed up. Basically, if you initiate an overlapped IO operation from a non-IO thread, that thread is flagged via an internal data structure as having I/O completion outstanding. So you don't have to do anything special in that thread to make sure it doesn't get terminated, the manager handles that.

Short answer: It's safe to initiate overlapped IO operations from the thread that executes your function that you passed to BindIoCompletionCallback(). (That wasn't a very short answer, I guess).

[edited by - JonStelly on May 21, 2002 2:27:37 PM]

Share on other sites
Awesome. That's exactly what I needed to know.

thx

Share on other sites
At a previous job of mine, we had an SMTP server rewritten from the ground up using IO Completion Ports. The performance gain from the previous SMTP version was astounding. From bogging at around 100 connections, to screaming at 1000+ connections.

As far as posting an I/O completion packet goes, instead of using the I/O callback function, create a structure that starts with the OVERLAPPED struct, followed by a pointer to some function you want called. This way, when your threads are servicing completions, they just cast the OVERLAPPED pointer to your defined struct and call the callback.

i.e.,

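typedef struct tagMyIOInfo
{
    OVERLAPPED ol;                /* first member, so an OVERLAPPED* can be cast back */
    SOME_FUNC* pCallback;         /* function to call when the operation completes */
    void*      pSomeCallbackData; /* handed through to the callback */
} MyIOInfo;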

Understand?

This is great for File IO and Socket IO.
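
To make the dispatch step concrete, here is a minimal sketch of what a servicing thread might do with the OVERLAPPED pointer it gets back. Several things are assumptions for illustration: the signature given to SOME_FUNC, the member name ol, and the helpers dispatch_completion and count_bytes. The stand-in OVERLAPPED exists only so the sketch compiles off Windows; on Windows the real type from <windows.h> is used.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-in so the sketch compiles off Windows. The real OVERLAPPED
   comes from <windows.h> and has different members. */
typedef struct { void *reserved; } OVERLAPPED;
#endif

struct tagMyIOInfo;

/* Assumed callback shape: the I/O record plus the byte count. */
typedef void SOME_FUNC(struct tagMyIOInfo *info, unsigned long bytesTransferred);

typedef struct tagMyIOInfo {
    OVERLAPPED ol;                /* this is what the overlapped call receives */
    SOME_FUNC *pCallback;         /* called when the operation completes */
    void      *pSomeCallbackData; /* per-operation context */
} MyIOInfo;

/* Recover the enclosing MyIOInfo from the OVERLAPPED pointer handed
   back with a completion. offsetof keeps this correct even if the
   OVERLAPPED member is not first in the struct. */
static MyIOInfo *io_from_overlapped(OVERLAPPED *ol)
{
    return (MyIOInfo *)((char *)ol - offsetof(MyIOInfo, ol));
}

/* What a servicing thread would do with each dequeued completion. */
void dispatch_completion(OVERLAPPED *ol, unsigned long bytesTransferred)
{
    assert(ol != NULL);
    MyIOInfo *info = io_from_overlapped(ol);
    if (info->pCallback != NULL)
        info->pCallback(info, bytesTransferred);
}

/* Example callback: adds the byte count to a running total that the
   caller stored in pSomeCallbackData. */
void count_bytes(struct tagMyIOInfo *info, unsigned long bytesTransferred)
{
    *(unsigned long *)info->pSomeCallbackData += bytesTransferred;
}
```

The thread body stays the same no matter what kind of operation completed; all per-operation behaviour sits in the callback and its data pointer.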

As far as IO Completion ports not being supported on other platforms, you can write an IO Completion routine using a semaphore and a queue. Not too tough. We did it for Win9x.
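
A rough idea of the semaphore-plus-queue approach is sketched below. It is not the code described in this post: it uses POSIX semaphores and a mutex purely so it can be tried anywhere, whereas on Win9x the same shape would be built from CreateSemaphore, ReleaseSemaphore, WaitForSingleObject and a CRITICAL_SECTION. All names (CompletionQueue, cq_post, cq_wait) and the fixed capacity are hypothetical, and it copies only the queue-and-wake behaviour, not the way a real completion port limits how many threads run at once.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define QUEUE_CAPACITY 64   /* hypothetical fixed size */

typedef struct Completion {
    unsigned long bytes;    /* bytes transferred */
    void *context;          /* per-operation data */
} Completion;

typedef struct CompletionQueue {
    Completion items[QUEUE_CAPACITY];
    size_t head, count;
    pthread_mutex_t lock;   /* protects items, head and count */
    sem_t available;        /* counts queued completions */
} CompletionQueue;

/* Returns 0 on success, -1 on failure. */
int cq_init(CompletionQueue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    if (pthread_mutex_init(&q->lock, NULL) != 0)
        return -1;
    if (sem_init(&q->available, 0, 0) != 0) {
        pthread_mutex_destroy(&q->lock);
        return -1;
    }
    return 0;
}

/* Plays the role of PostQueuedCompletionStatus: queue a packet and
   wake one waiting worker. Returns -1 if the queue is full. */
int cq_post(CompletionQueue *q, unsigned long bytes, void *context)
{
    int result = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAPACITY) {
        size_t tail = (q->head + q->count) % QUEUE_CAPACITY;
        q->items[tail].bytes = bytes;
        q->items[tail].context = context;
        q->count++;
        result = 0;
    }
    pthread_mutex_unlock(&q->lock);
    if (result == 0)
        sem_post(&q->available);
    return result;
}

/* Plays the role of GetQueuedCompletionStatus with no timeout:
   block until a packet is available, then remove and return it. */
Completion cq_wait(CompletionQueue *q)
{
    Completion c;
    while (sem_wait(&q->available) == -1 && errno == EINTR) {
        /* interrupted by a signal: wait again */
    }
    pthread_mutex_lock(&q->lock);
    c = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return c;
}

void cq_destroy(CompletionQueue *q)
{
    sem_destroy(&q->available);
    pthread_mutex_destroy(&q->lock);
}
```

Worker threads would loop on cq_wait() and whichever thread finishes an I/O operation would call cq_post(). Packets come out in the order they were posted.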

Share on other sites
So why does Jeffrey Richter suggest calling QueueUserWorkItem instead of initiating overlapped IO directly from within the BindIoCompletionCallback routine?

Does the hassle of dealing with these questionably documented APIs net you anything? It seems as though creating your own IOCP and a couple of worker threads to service it wouldn't be that much harder.

Matt

Share on other sites
It wouldn't. In fact, I'm not sure that the thread pool management API is any easier if ALL you're using the thread pool for is IOCP. It's easy enough to just start an I/O thread whose only job is to handle completed I/O requests, but some things, like inheriting from the OVERLAPPED structure to pass a function pointer or a pointer to an instance of your class, look a bit strange/ugly. The single flaw in IOCP is that there's not a 'context' parameter already part of the overlapped structure.

Share on other sites
The thing that is 'nice' about IOCPs is that if all of your worker threads are busy or blocked on something, the OS creates another thread to handle the port. If this doesn't fix the blocking issue then you're screwed.

JonStelly -
How else do you pass data on a per instance basis to the completion port? You can associate 1 DWORD value to the port but that really doesn't help you define the context of the call when handling multiple requests.

Creating a single thread to handle IO requests is not efficient use of the processor. It will block when secondary calls come in.

Inheriting from the OVERLAPPED structure is easy to understand and an effective methodology for contextual callbacks. If you like to write huge switch statements in your threads, have fun doing it. Personally I like my thread to be short and clean, and the handling routines associated with a class.

You said it yourself, "The single flaw in IOCP is that there's not a 'context' parameter already part of the overlapped structure." Then ADD some. How can you say something is strange/ugly when you want it in there yourself?

[edited by - LordShade on May 21, 2002 7:26:13 PM]
