Parsing Chars

Started by
5 comments, last by iMalc 16 years, 6 months ago
Hi guys, I want to parse char arrays (in C/C++) that don't include NULL terminators (it's incoming socket data.) I was thinking I could make a new array one char bigger, copy the original array, then add the NULL terminator myself; that way I can use the standard string manipulation routines. Either that, or run through the arrays with a for loop and do the searching myself. I don't like the first way because of the memcpy(). If I do it myself with the loop, will it be slower than using the string routines? Maybe I can do something else? Also, I don't want to use any existing libraries or templates; I'd like to code it myself. Thanks.
The obvious solution would be to make your initial array larger and then stick the null terminator there.
Yeah, it is obvious.

I'm still wondering what the difference (in speed) would be if I searched the arrays manually, with for loops.

I'd set up a performance counter and run some tests, but I plan on porting my code to FreeBSD, so the Win32 tests would do me little good.

What do you think ?
Quote:Original post by Endemoniada
Yeah, it is obvious.

I'm still wondering what the difference (in speed) would be if I searched the arrays manually, with for loops.

I'd set up a performance counter and run some tests, but I plan on porting my code to FreeBSD, so the Win32 tests would do me little good.

What do you think ?


Searched for what? Are you talking about parsing the data?

If so, just use the standard search routines. The C++ Standard Library's algorithms work on raw arrays easily (they take a begin/end pointer pair, so no terminator is needed), and the compiler can inline them completely should it make that decision.
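As a rough sketch of what that could look like for the '<' and '>' search discussed in this thread: the helper name find_tag and its return convention are made up for illustration, and it assumes you already know the number of bytes received.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not from the thread): finds the first '<' and the
// first '>' at or after it in a buffer of `len` bytes that is NOT
// NUL-terminated. Returns {begin, end} indices, or {len, len} if the buffer
// does not contain a complete "<...>" pair yet.
std::pair<std::size_t, std::size_t> find_tag(const char* buffer, std::size_t len)
{
    assert(buffer != nullptr || len == 0);

    const char* last  = buffer + len;                 // one past the final byte
    const char* open  = std::find(buffer, last, '<'); // == last if not found
    const char* close = std::find(open, last, '>');   // search only after '<'

    if (close == last)
        return {len, len};                            // incomplete: wait for more data

    return {static_cast<std::size_t>(open - buffer),
            static_cast<std::size_t>(close - buffer)};
}
```

Nothing is copied and no terminator is written; the length you got back from recv() is all the algorithm needs.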

Or you could use std::string and its search member functions. It would involve making a copy of the data, but could be handier.

It depends on what kind of application you are writing.

Remember you cannot expect TCP/IP to obey the send() boundaries when recv()ing. In other words, if you send "Hello, World", you could end up receiving it in a few chunks, or with extra data appended from subsequent send calls. You may end up having to make copies anyway.

On a final note: "I don't want to use any existing libraries or templates" is inconsistent with speed concerns. If you want it to be fast, use the things experts have written for you. If you want to learn stuff yourself, that is fine as long as you are aware you are exceedingly unlikely to beat the people who wrote not only your compiler but also your standard library. std::string has the potential to be a lot faster than using C style NUL terminated strings, because the length of the string is known.
Thanks rip-off, that helps a lot.

Yes, I mean parsing the data myself, something like this simple example:

for(i=0;i<len;i++){
if(buffer[i] == '<')
begin=i;
if(buffer[i] == '>')
end=i;
}

I'd rather do it that way, but if it's slower than the C++ Standard functions I should just use those.

On another note, I know I can receive chunks of data but I didn't know they can be appended. Take a look at this example:

// client
send("<Hello Everyone>");
send("<Let's play>");

...does that mean on the server end this can happen ? :

// server
recv(buffer); (on, say, the second call)
buffer could be: "Everyone><Let" ?

I didn't know that could happen. Thanks a lot.
Stop.

What are you really trying to do?

And are you using C, or are you using C++? They are fundamentally different languages that simply share a common history.

Also, it is not possible to do this without "us[ing] any existing libraries or templates". The "standard string manipulation routines" are a library. So is the code that provides you with "incoming socket data". Please, eliminate your bias against other people's code. (After all, you didn't make the compiler, either.)
Quote:Original post by Endemoniada
I want to parse char arrays (in C/C++) that don't include NULL terminators (it's incoming socket data.)
It had better be a fixed-length or length-prefixed string then, or you won't be able to determine the length at the other end.
Quote:...does that mean on the server end this can happen ? :

// server
recv(buffer); (on, say, the second call)
buffer could be: "Everyone><Let" ?
Essentially, yes that can and often will occur.
Think of TCP as some kind of two-way stream. You just shove bytes in one end and they appear out the other end. Both ends will quite happily buffer the data as they see fit, and TCP places no kind of markers between your send calls for you.
You either wait for a response from the other end between sending packets (assuming you send a response for each packet) - slow! Or, you encapsulate the data in higher-level packets, such as length, followed by data.
On the other end, you then receive the length, and then receive more bytes until you have at least the required number of bytes. If you get more bytes than needed, then you have to store the extra somewhere and take from there first next time. You've got to deal with the possibility of initially receiving only 1 byte of say a two-byte packet length as well. It's certainly no walk in the park...
However, if every packet is simply a string, then just send the null-terminator and look for that on the other end to separate the strings.
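To make the receive side a bit more concrete, here is a minimal sketch of such an accumulator. The class name FrameBuffer, the two-byte big-endian length prefix, and the append/next interface are all assumptions chosen for illustration; the actual recv() call and its error handling are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical accumulator for a byte stream made of
// [2-byte big-endian length][payload] messages.
class FrameBuffer {
public:
    // Append whatever recv() returned, however the stream was chunked.
    void append(const char* data, std::size_t n) {
        assert(data != nullptr || n == 0);
        pending_.insert(pending_.end(), data, data + n);
    }

    // If a complete message is buffered, copy its payload into `out`,
    // drop it from the buffer, and return true. Otherwise return false
    // and leave the buffered bytes untouched.
    bool next(std::string& out) {
        if (pending_.size() < 2)
            return false;                        // length prefix itself is incomplete

        const std::size_t len =
            (static_cast<std::size_t>(static_cast<unsigned char>(pending_[0])) << 8) |
             static_cast<std::size_t>(static_cast<unsigned char>(pending_[1]));

        if (pending_.size() < 2 + len)
            return false;                        // payload is incomplete

        out.assign(pending_.begin() + 2, pending_.begin() + 2 + len);
        // Leftover bytes belong to the next message and stay buffered.
        pending_.erase(pending_.begin(), pending_.begin() + 2 + len);
        return true;
    }

private:
    std::vector<char> pending_;
};
```

The idea is to call append() after every successful recv() and then call next() in a loop until it returns false, since one recv() may deliver zero, one, or several complete messages.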
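A sketch of the terminator-based variant, under the assumption that the sender transmits the '\0' after every string. The class name DelimitedBuffer and its interface are invented for this example, and socket calls are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical accumulator for a byte stream of NUL-terminated strings.
class DelimitedBuffer {
public:
    // Append whatever recv() returned; it may hold part of a string,
    // exactly one string, or several strings.
    void append(const char* data, std::size_t n) {
        assert(data != nullptr || n == 0);
        if (n != 0)
            pending_.append(data, n);   // std::string can hold embedded '\0' bytes
    }

    // If a complete string (up to a '\0') is buffered, move it into `out`
    // without the terminator and return true; otherwise return false.
    bool next(std::string& out) {
        const std::size_t pos = pending_.find('\0');
        if (pos == std::string::npos)
            return false;               // terminator has not arrived yet
        out.assign(pending_, 0, pos);
        pending_.erase(0, pos + 1);     // also discard the terminator
        return true;
    }

private:
    std::string pending_;
};
```

This only works if the payload itself can never contain a '\0' byte; otherwise a length prefix is the safer choice.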

