Character delimited packets?

37 comments, last by ANSI2000 21 years, 5 months ago
>Very, very wrong. You can recv whenever, and you'll get whatever has shown up. Recv isn't supposed to block waiting for additional data to arrive. If there's anything to return, it gets returned.

You misunderstood: it will not block and wait for the data before allowing you to recv it. Most implementations (again, 'most') will just give you nothing unless the data is large enough to overflow whatever sort of threshold they have before it has all arrived.

>True, but the point is that TCP does NOT present a message-based protocol to higher network layers. What lower networking layers are doing isn't our concern.

Then stop using the term packet.


>TCP does not send 'UDP chunks.' UDP is a separate protocol. TCP in fact manages IP packets.

TCP is UDP with error checking. It transmits data the same way UDP does. A more accurate term would have been frames, but 'UDP chunks' sounds more interesting.
"You misunderstood: it will not block and wait for the data before allowing you to recv it. Most implementations (again, 'most') will just give you nothing unless the data is large enough to overflow whatever sort of threshold they have before it has all arrived."
Any implementation of TCP that does this should be shot. TCP is a "stream based" protocol. This means that it will return anything in its buffers (up to your buffer size) when you receive. It will NOT wait until it has received all of "your data", because it knows nothing about it: when you send a buffer to the TCP connection, TCP does not tag that data in any way that would let it reassemble your original buffer at the other end. To TCP, all data is just more data in the buffer.

"Then stop using the term packet."
I think that was you!

"TCP is UDP with error checking. It transmits data the same way UDP does. A more accurate term would have been frames, but 'UDP chunks' sounds more interesting."
As the previous poster said, TCP does not use UDP at all; it is a layer on top of IP. Yes, it transmits data the same way UDP does... the same way ICMP does, the same way IGMP does, the same way RSVP does, etc.: by handing a packet to the IP layer (well... kinda). However, TCP and UDP have very different headers. The UDP header is much smaller (8 bytes), while the TCP header carries a lot more information to help "guarantee" delivery (20 bytes minimum, more when options are present). Both are also wrapped in an IP header.
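To make the header-size comparison concrete, here is a minimal sketch of the two fixed header layouts as C++ structs. The field names are my own labels, loosely following RFC 768 (UDP) and RFC 793 (TCP); the TCP struct covers only the 20-byte minimum header, since options can make the real one longer. On the wire, the multi-byte fields are in network byte order.

```cpp
#include <cstdint>

// UDP header (RFC 768): four 16-bit fields, 8 bytes total.
struct UdpHeader {
    uint16_t sourcePort;
    uint16_t destinationPort;
    uint16_t length;      // header + payload, in bytes
    uint16_t checksum;
};

// Minimum TCP header (RFC 793), without options: 20 bytes.
// Sequence/acknowledgment numbers and the window are part of what
// lets TCP provide ordered, reliable delivery on top of IP.
struct TcpHeader {
    uint16_t sourcePort;
    uint16_t destinationPort;
    uint32_t sequenceNumber;
    uint32_t acknowledgmentNumber;
    uint16_t offsetAndFlags;  // data offset (4 bits), reserved bits, control flags
    uint16_t window;
    uint16_t checksum;
    uint16_t urgentPointer;
};

// Every field is naturally aligned, so common ABIs add no padding
// and the struct sizes match the on-the-wire sizes.
static_assert(sizeof(UdpHeader) == 8, "UDP header is 8 bytes");
static_assert(sizeof(TcpHeader) == 20, "minimum TCP header is 20 bytes");
```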

Perhaps you should research the subject a bit more thoroughly before you shoot your mouth off and call facts we know to be true misconceptions.
quote:
Ok I see how that could work...

In simpler terms...

If you're expecting to receive 4 bytes,

you're telling recv() to move 4 bytes from the underlying stream and copy them to your buffer.

Now if recv() had actually received 12 bytes, 8 bytes would still remain in the underlying stream. So the next time a call is made to recv(), recv() will return the 8 bytes plus whatever is new. If an extra 20 bytes came in before the next call to recv(), recv() would report that 28 bytes were received? Am I correct to say that?

I read this after my original post. Directly from MSDN...

For connection-oriented sockets (type SOCK_STREAM for example), calling recv will return as much information as is currently available—up to the size of the buffer supplied.


That's pretty much how it works.

This might clear it up.
Say that your socket has received 100 bytes.

If you call recv(s, buf, 20, 0) (buffer of size 20), recv will copy the first 20 bytes into buf and return 20, leaving the remaining 80 bytes sitting in the socket.

If you call recv(s, buf, 200, 0) (buffer of size 200), recv will copy the 100 bytes into buf and return 100. If you are expecting 200 bytes, you will have to call recv again and again until you get all the data.
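The "call recv again and again" step is usually wrapped in a small helper. Below is a minimal sketch assuming POSIX sockets (Winsock is similar but uses SOCKET, SOCKET_ERROR and WSAGetLastError() instead of int, -1 and errno); the name recvExact is hypothetical.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Reads exactly `length` bytes from a connected stream socket into `buffer`.
// Returns true once all bytes have arrived, false if the peer closed the
// connection first or a non-recoverable error occurred.
bool recvExact(int sock, char* buffer, std::size_t length)
{
    std::size_t received = 0;
    while (received < length) {
        // recv may return fewer bytes than requested, so only ask for
        // what is still missing and write it after what we already have.
        ssize_t n = recv(sock, buffer + received, length - received, 0);
        if (n > 0) {
            received += static_cast<std::size_t>(n);
        } else if (n == 0) {
            return false;             // orderly shutdown by the peer
        } else if (errno != EINTR) {
            return false;             // real error (EINTR just means retry)
        }
    }
    return true;
}
```

With the 100-byte example, a single recv(s, buf, 200, 0) returns 100, while recvExact(s, buf, 200) keeps calling recv until the other 100 bytes arrive or the connection closes.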

As a point of reference, if you call recv on a blocking socket (the default) and no data is waiting in the socket, it will wait until some data arrives.

quote:
Now the above-mentioned code works fine for "messages" that are prefixed with a length header. But it gets a bit more complicated for "messages" that are variable-length and delimited by a character. That requires reading byte by byte, which totally sucks performance-wise... That's why I made the post originally. Unless anyone knows a better way?


Yeah, the code works for headers, not footers, which are a bit trickier. Byte-by-byte would work and is simple, but you're correct that it probably isn't very efficient. To improve on that you would have to receive in larger chunks and extract the messages from the buffer. Of course, you would have to be prepared for the cases when you don't get a whole message, and for when you get more than one. So it is a little trickier.
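A minimal sketch of that "receive in larger chunks, then extract" approach, assuming a single delimiter character such as '\r'. The class and method names are hypothetical; feed() would be called with whatever recv() returned, and it copes both with a partial message and with several messages arriving in one chunk.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Accumulates raw bytes from recv() and splits them into delimiter-terminated messages.
class DelimitedReader {
public:
    explicit DelimitedReader(char delimiter) : delimiter_(delimiter) {}

    // Appends `length` received bytes and returns every message that is now
    // complete (without its delimiter). A trailing partial message stays buffered.
    std::vector<std::string> feed(const char* data, std::size_t length)
    {
        pending_.append(data, length);
        std::vector<std::string> messages;
        std::size_t start = 0;
        std::size_t end;
        while ((end = pending_.find(delimiter_, start)) != std::string::npos) {
            messages.push_back(pending_.substr(start, end - start));
            start = end + 1;                  // skip past the delimiter
        }
        pending_.erase(0, start);             // keep only the unfinished tail
        return messages;
    }

    // Number of buffered bytes belonging to a message that is not complete yet.
    std::size_t pendingBytes() const { return pending_.size(); }

private:
    char delimiter_;
    std::string pending_;
};
```

In the receive loop you would pass each chunk straight in, e.g. reader.feed(chunk, n) after n = recv(sock, chunk, sizeof(chunk), 0) returns a positive count. That costs one recv() call per chunk rather than one per byte.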

Is there any reason that you need to receive like this? Headers are a little easier to deal with, especially if you are communicating using messages.
quote:
Is there any reason that you need to receive like this? Headers are a little easier to deal with, especially if you are communicating using messages.


Have you ever had to deal with "message" protocols from the 1700s?

Here is the situation...

The 3rd-party financial institution (bank) we are connecting to delimits its messages with a carriage return and has no length header. Our application had no problem receiving the "message" because it was a constant length, so we always checked for a specific length.

The problem with the bank is that if there was a network drop, their server socket would not detect it. Our application would detect the drop, because after failing to send a "message" it would try to reconnect. But the server socket was still bound, and because of their dinosaur architecture they only allow one connection per port. Since the port was still bound, our application could not connect, so we would have to call them and tell them to reset the port before our app could finally reconnect.

I figure they had this problem with a lot of their customers, so they implemented a ping system. Now the bank also sends us a "message", and we must reply to that "message". If the bank does not receive a reply, it assumes the network dropped and resets itself.

The problem with that "message" is that it is a different length than the standard "message" the application is programmed to expect. The good thing is that the "messages" use the first few bytes for "message" identification. The only stupid problem is that one message uses 2 bytes for this and the other uses 1 byte, but it's no biggie!


The simplest way to work with TCP and your own protocol is to divide the tasks. Basically, you have a memory buffer which you continually fill with data received from the network. You also have starting position and offset pointers into the buffer. This allows you to receive multiple messages in the same buffer.

When receiving data, all you should care about is how much data you receive, where to put it, and whether an error occurs. Therefore, treat received data as a stream of bytes with no structure.

The next task is to define a protocol "handler" method. This handler method is responsible for parsing the data in the buffer and managing the starting position/offset pointers. This handler gives meaning/structure to the stream of bytes in the buffer.


    #include <cstring>  // memcpy

    // disable memory padding so the struct matches the wire layout
    #pragma pack(push, 1)
    struct ProtocolHeader
    {
        int messageLength;
    };
    // re-enable memory padding
    #pragma pack(pop)

    const int HeaderLengthConst = sizeof(ProtocolHeader);

    // Protocol member variables
    ParseStatusEnum parseStatus;
    int             currentBytes;
    int             bufferOffset;
    char            dataBuffer[20000];
    int             dataBufferLength;  // = sizeof(dataBuffer)

    void Protocol::Process(int bytesReceived)
    {
        int bytesRemaining = (currentBytes += bytesReceived) - bufferOffset;
        while(bytesRemaining > 0)
        {
            switch(parseStatus)
            {
                case ProcessingHeader:
                {
                    if(bytesRemaining < HeaderLengthConst)
                    {
                        // we haven't received enough information
                        // to process the header!
                        bytesRemaining = 0;
                        break;
                    }

                    // process your header here!
                    // for example: copy the bytes at bufferOffset into
                    // a header structure (memcpy avoids unaligned access)
                    ProtocolHeader header;
                    memcpy(&header, dataBuffer + bufferOffset, HeaderLengthConst);

                    // we don't really need to do anything with the
                    // header here, but just for fun we'll grab the
                    // message length from it
                    int messageLength = header.messageLength;

                    // now that we have the header we can move on
                    // to processing the body
                    parseStatus = ProcessingMessage;
                    break;
                }

                case ProcessingMessage:
                {
                    // we know we have a header in the buffer at this point,
                    // so copy it into a header structure
                    ProtocolHeader header;
                    memcpy(&header, dataBuffer + bufferOffset, HeaderLengthConst);

                    // grab the message length from the header
                    int messageLength = header.messageLength;

                    // decrement the bytes remaining
                    bytesRemaining -= HeaderLengthConst;
                    if(bytesRemaining < messageLength)
                    {
                        // we don't have a complete message yet!
                        bytesRemaining = 0;
                        break;
                    }

                    // we have a full message
                    // do whatever you want with it
                    ProcessMessage((unsigned char*)dataBuffer + bufferOffset + HeaderLengthConst,
                                   messageLength);

                    // the message has been processed
                    // decrement our bytesRemaining
                    bytesRemaining -= messageLength;
                    if(bytesRemaining)
                    {
                        bufferOffset = currentBytes - bytesRemaining;
                    }
                    else
                    {
                        // all the data in the buffer has been
                        // processed. Reset the buffer pointer to
                        // the beginning of the buffer.
                        bufferOffset = 0;
                        currentBytes = 0;
                    }
                    parseStatus = ProcessingHeader;
                    break;
                }
            }
        }
    }

    void Protocol::ProcessMessage(unsigned char* data, int dataLength)
    {
        // handle specific messages here
    }

    // driver method for this example
    void Protocol::Recv(SOCKET socket)
    {
        // receive data into the data buffer at the current offset.
        // note: if a partial message has filled the buffer, move the
        // unprocessed bytes to the front first; otherwise the length
        // passed to recv() below is 0.
        int result = recv(socket,
                          dataBuffer + currentBytes,
                          dataBufferLength - currentBytes,
                          0);
        if(result == SOCKET_ERROR)
        {
            // handle error - throw exception
            return;
        }

        if(result == 0)
        {
            // socket disconnected - throw exception
            return;
        }

        // process the received data
        Process(result);
    }


The code above was taken from a server that I wrote for a client a while back. I had to change some of the code to make it more clear. The real production code makes much heavier use of helper classes (the protocol handling is divided into many smaller classes - each with a specific task/role.) The header is also much more complex than the simple one I provided here. Hopefully I didn't make any mistakes when I changed the code for displaying it here.

This code illustrates one efficient technique for handling custom protocol messages received using TCP/IP.

Let me know if you have any questions.

[edited by - Digitalfiend on October 25, 2002 1:08:16 PM]

[edited by - Digitalfiend on October 25, 2002 1:11:07 PM]
Dire Wolf (direwolf@digitalfiends.com)
www.digitalfiends.com
Well ANSI2000, the solution to your problem might be:


    // Protocol member variables
    ParseStatusEnum parseStatus;
    int             currentBytes;
    int             bufferOffset;
    char            dataBuffer[20000];
    int             dataBufferLength;  // = sizeof(dataBuffer)

    void Protocol::Process(int bytesReceived)
    {
        int bytesRemaining = (currentBytes += bytesReceived) - bufferOffset;
        while(bytesRemaining > 0)
        {
            // search only the bytes that have not been processed yet
            int searchLength  = bytesRemaining;
            int messageLength = FindCarriageReturn(dataBuffer + bufferOffset,
                                                   searchLength);

            if(messageLength == -1)
            {
                // didn't find a carriage return:
                // we don't have a complete message yet
                bytesRemaining = 0;
                break;
            }

            // we have a full message (including its '\r')
            // do whatever you want with it
            ProcessMessage(dataBuffer + bufferOffset, messageLength);

            // the message has been processed
            // decrement our bytesRemaining
            bytesRemaining -= messageLength;
            if(bytesRemaining)
            {
                bufferOffset = currentBytes - bytesRemaining;
            }
            else
            {
                // all the data in the buffer has been
                // processed. Reset the buffer pointer to
                // the beginning of the buffer.
                bufferOffset = 0;
                currentBytes = 0;
            }
        }
    }

    void Protocol::ProcessMessage(char* data, int dataLength)
    {
        // handle specific messages here
    }

    // finds the carriage return and returns the message length
    // (including the '\r'), or -1 if no '\r' has arrived yet
    int Protocol::FindCarriageReturn(char* data, int dataLength)
    {
        for(int idx = 0; idx < dataLength; ++idx)
        {
            if(data[idx] == '\r')
                return idx + 1;
        }
        return -1;
    }


Not sure if this helps or not.

Regards.

[edited by - Digitalfiend on October 25, 2002 5:23:56 PM]
Thanks, I will take a look into it...

But it will be a lot simpler xiol's way. All I have to do is read the first 1-2 bytes to check the message type... and then use a constant that was set for that message length and read the rest of the stream...

The bank we are connecting to uses a weird system. The fields within the message are all fixed length, so the different types of messages will always have the same length, but the messages are character delimited.
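The "read the type, then look up its fixed length" approach could look roughly like the sketch below. The message IDs and lengths in the table are invented for illustration (the real ones would come from the bank's specification); the only parts taken from the thread are that one type is identified by 2 bytes, another by 1 byte, and each type has a fixed total length, assumed here to include the trailing '\r'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical message table: ID prefix -> fixed total length (including '\r').
struct MessageType {
    const char* id;
    std::size_t totalLength;
};

const MessageType kMessageTypes[] = {
    {"01", 40},   // hypothetical 2-byte ID for a standard message
    {"P",  6},    // hypothetical 1-byte ID for the bank's ping message
};

// Given the bytes buffered so far, returns the total length of the message at
// the front of the buffer, 0 if more bytes are needed to identify it, or
// std::string::npos if the prefix matches no known type.
// Assumes no ID is a prefix of another ID, otherwise the lookup is ambiguous.
std::size_t expectedLength(const std::string& buffered)
{
    for (const MessageType& type : kMessageTypes) {
        std::string id(type.id);
        if (buffered.size() < id.size()) {
            // Too few bytes to compare this ID fully; wait for more only
            // if the bytes we do have are consistent with it.
            if (id.compare(0, buffered.size(), buffered) == 0) return 0;
            continue;
        }
        if (buffered.compare(0, id.size(), id) == 0) return type.totalLength;
    }
    return std::string::npos;
}
```

Once expectedLength() bytes are buffered, checking that the last one is '\r' is a cheap way to confirm the stream is still in sync, which keeps some of the safety of scanning for the delimiter without giving up the fixed-length lookup.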

The only drawback of looking up a constant for the length instead of actually checking for the '\r' is that if the application needs to read in a new message, it has to be reprogrammed to accept that message. That is not a big problem. First, banks take almost forever to make a change: we have had this application running for about 2 years now without making changes. Second, even though checking for the '\r' is more flexible, the application would still need more programming to be able to parse the new message. Then again, I could always rearchitect the whole thing to use custom DLLs that parse the different messages. But hey, if it ain't broke, don't fix it!

Right now I can only hack it, since that will cost a lot less than rearchitecting the application to use the protocol handler and load a parser DLL depending on the received message type...
Yeah sorry about that. I misunderstood what you were trying to accomplish. Basically you are saying that your messages might look like:

    0010  | 123456789\r0000 | 123\r000
    msgid | scanline        | endorsement

The scanline and routing fields would be fixed length but could contain character strings that occupy less than the total field length and are delimited by carriage returns.
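Pulling a value out of one of those fixed-length, carriage-return-terminated fields might look like the following sketch. The field widths (4, 14 and 7) are simply read off the example record above, assuming the "|" separators are only visual and not part of the data; they are not taken from any real specification, and fieldValue is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Returns the value stored in a fixed-width field that starts at `offset` and
// is `width` bytes wide. The value ends at the first '\r' inside the field;
// anything after it is filler. A field with no '\r' is returned whole.
std::string fieldValue(const std::string& record, std::size_t offset, std::size_t width)
{
    std::string field = record.substr(offset, width);
    std::size_t cr = field.find('\r');
    return cr == std::string::npos ? field : field.substr(0, cr);
}
```

With the record "0010123456789\r0000123\r000", fieldValue(record, 0, 4) gives "0010", fieldValue(record, 4, 14) gives "123456789", and fieldValue(record, 18, 7) gives "123".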

I've done some programming for banks before, mainly on the remittance processing side, and I tell you they use some pretty old technology/systems.

Good luck.

BTW, that server application sounds pretty awful. What kind of machine is it running on? CTOS? A-Series?

[edited by - Digitalfiend on October 25, 2002 5:18:46 PM]
I have no idea.

So far I have worked with ISO 8583, APACS and a few other standards. The current system I am talking about was, I guess, implemented in-house by that 3rd party... but it's at least 10 years old for sure.

Yeah, talk about old technology: X.25 lines, and sending files over TCP/IP but requiring dial-up access over a 9600 baud connection. Ridiculous, I tell ya!

[edited by - ANSI2000 on October 25, 2002 5:55:10 PM]

