atoi & co

Started by
18 comments, last by joe_bubamara 14 years, 9 months ago
Quote:Original post by joe_bubamara
Put your snippets into '[' code ']' and '[' /code ']' tags .... I have to add ' because of forum software, of course you use [ without '. Code tag will preserve tabs, spaces and other things. If you wish to have colorized syntax read the forum faq to see how it is used - I dislike that text area with colorized widget is so small so one sees only few lines of code at a time, so I almost never use it.


I tried to put the code into code/source formatters, but it had those issues. On second thought it also could have been the reply-preview that is buggy in this regard ...
I'll just try it again, regardless of how messed up the preview will look like
(it's not surprising that it's not very unlike the code the guy after my last post added to this thread).

inline bool StrToI32(const char * str, int32_t * number){    *number = 0; // just in case it wasn't initialized    while (*str == ' ') ++str;    int32_t sign = 1;    char character;    switch (*str)    {        case '-':            sign = -1;            // fall through        case '+':            ++str;            // fall through        default:            character = *str;            if ((uint32_t) (character - '0') <= 9U) // is character a number ?            {                int32_t lastvalue = 0, value = 0;                do                {                    value *= 10;                    value += character - '0';                    if (lastvalue > value)                    {                        return false; // overflow                    }                    lastvalue = value;                    character = *(++str);                }                while ((uint32_t) (character - '0') <= 9U);                *number = value * sign;                return true;            }            break;    }    return false; // not found}


Quote:Original post by joe_bubamara
Yeah it is good question, where we should stop rolling own things instead of using libraries? :-). I believe that those conversion functions are done in optimized assembly, so that was reason why I am trying to use those rather then own thing.

cheers


It's probably optimized, but also you have to account for the function call itself, which normally will add about 60-100 cycles by itself, while a hand-rolled code could have the chance to be inlined (or forced to be) and integrate well into the code (with an additional advantage that registers can be reused and don't need to be push/popped from the stack each function call) :-)
Best to profile and see for yourself if it all makes sense in your particular piece of code.

Advertisement
Quote:Original post by plastiqueI tried to put the code into code/source formatters, but it had those issues. On second thought it also could have been the reply-preview that is buggy in this regard ...


Yes, the previewer is buggy. Your pluses are retained when you actually post it.

I see your point with inlined code, I do agree with you, but I don't think it is of much use for me in this project. Imagine how a .obj file looks like - you would have to inline all this into many places - several places for almost any command parsed, since most commands take several parameters. Also note that 99% of numbers in .obj files are actually floats; and parsing floats is more jobb then parsing integers, so I believe it would add signifikant to binary. But it might come to use in some other project.

ps. your code needs few more checks for valid pointer; but I guess you wrote it just to illustrate the principle.
Quote:Original post by joe_bubamara
I see your point with inlined code, I do agree with you, but I don't think it is of much use for me in this project. Imagine how a .obj file looks like - you would have to inline all this into many places - several places for almost any command parsed, since most commands take several parameters. Also note that 99% of numbers in .obj files are actually floats; and parsing floats is more jobb then parsing integers, so I believe it would add signifikant to binary. But it might come to use in some other project.


Makes sense. A static function might be better in this regard.

Quote:Original post by joe_bubamara
ps. your code needs few more checks for valid pointer; but I guess you wrote it just to illustrate the principle.


What do you mean ? Nope, this is working code. The only check that might be missing is to check if 'str' isn't null, but that's a bit like questioning the intelligence of the user of this function :P

For careful parsing in C, don't use atoi or atof. As you've noted, if they don't like what they see, they just return 0, and if they do like what they see, you don't know how many characters they consumed.

But sscanf can be made to tell you exactly how many characters were consumed (%n), and can tell you how many fields were converted (return value), so with some care you can do a reasonably careful parser with it.

Or, as noted, you can write your own atoi(); that's easy. Writing your own atof() is not recommended, though, unless accuracy is not really an issue.
Quote:What do you mean ? Nope, this is working code. The only check that might be missing is to check if 'str' isn't null, but that's a bit like questioning the intelligence of the user of this function :P
ehh ... yepp .. and that is why you should always check for null :-), sometimes users are very stupid :-).

well seriously, source you through at it might come out from other parts of code where other errors could occured.

Every time you increment pointer you should check if it found '\0', unless you are perfectly sure how many valid characters are left in buffer. This is just my experience, I might be too carefull here.

As another point I went through great pain to eliminate any call to strtok function, exactly because of static memory it uses :-). Static functions are definitely forbidden; they should be thrown in deepest hole in the Earth and burried unless C/C++ standard specifies variables declared as static to live on thread stack space rather then global process space :-(.

I didn't know for %n option to scanf; however I go twice through every line and check number of tokens avialable; then I get second time and try to convert those tokens with scanf.

While we are talking about parsing and scanning, would you guys mind to take a look at little help library I wrote while writing this parser? I am done with .obj parser which parses all commands as specified in version 3.0 of parser (found on P. bourke's site) and will wait to release it until I finish .mtl. Meanwhile I got a bit bored and did this little abstraction layer for readin and tokenizing ascii files, which you can find on http://www.nextpoint.se. Check libToken project.

I don't have so much time for testing so maybe you can take a peek and give one or another thaught and opinion? It would be great ... sorry if I am using you too much here :-)
Quote:Original post by joe_bubamara
Every time you increment pointer you should check if it found '\0', unless you are perfectly sure how many valid characters are left in buffer. This is just my experience, I might be too carefull here.


If you go through the code I posted above you'll notice that only valid characters are checked (+, -, digits, whitespace), on all other characters the function will terminate (including the null terminator), that's why such additional checks are not necessary here.

Quote:Original post by joe_bubamara
As another point I went through great pain to eliminate any call to strtok function, exactly because of static memory it uses :-). Static functions are definitely forbidden; they should be thrown in deepest hole in the Earth and burried unless C/C++ standard specifies variables declared as static to live on thread stack space rather then global process space :-(.


Don't confuse static functions with functions that make use of static data (like strtok, which is indeed not recommended as it also modifies the buffer you are tokenizing). The 'static' as how I meant it merely means the visibility of the function (= only visible in the current source file).
Quote:Don't confuse static functions with functions that make use of static data (like strtok, which is indeed not recommended as it also modifies the buffer you are tokenizing). The 'static' as how I meant it merely means the visibility of the function (= only visible in the current source file).


I can buy that; I was a bit fast there on word "static" :-).

I haven't compiled and run your code, I will do it later; but on glimpse it seems like you will do all "default" part on null character; also even if it + or - are followed by null character. Since '\0' == 0, you will get negative number which is <= 9u and you will get into the if statemnet.

As said didn't run through your code in debugger, will do it later.
I just got an article on my little tokenizing library on the code proj :-) ... and I found several typos already hehe.

It would be nice if you can cast a look and give some opinion ...

http://www.codeproject.com/KB/cpp/libToken.aspx?fid=1544313
Quote:Original post by joe_bubamara
I haven't compiled and run your code, I will do it later; but on glimpse it seems like you will do all "default" part on null character; also even if it + or - are followed by null character. Since '\0' == 0, you will get negative number which is <= 9u and you will get into the if statemnet.


Admittedly, this comparison is a bit tricky since it uses the overflow behaviour of an integer (negative integer converted to unsigned integer will overflow to a rather high value), but it does the job in its efficient way (which is perfectly valid according to the C language specs as well of course) .

Just as an example:

int32_t si = -1;
uint32_t ui = (uint32_t) si;

As a result of the overflow the value of 'ui' is 0xFFFFFFFF now, the biggest possible uint32_t value (-2 would overflow to 0xFFFFFFFE, and so on ...).

Taken my example code,

(uint32_t) (character - '0') <= 9U

if the character is '\0', the result inside the brackets will be negative (-48 to be exact), but since it's casted to an unsigned integer it will overflow and that's why the comparison won't match here.

Hope this makes it more clear.

This topic is closed to new replies.

Advertisement