
Detecting in string numbers higher than max possible Sint32


Uhhhm, hi guys, stupid problem. I have a std::string which contains a number that may have any possible value (e.g. 0, 128 or 9999999999999). Now, I'd like to know whether I can safely convert that string to a 32-bit signed integer (Sint32 as in SDL) and store it there. I'd like to use atoi() for the conversion. If the number lies within [-2147483648, 2147483647], then the operation is safe and nothing will go wrong. However, if the number falls outside that range, then doing a normal atoi conversion like this...

Sint32 number = atoi( my_string.c_str() );

 if ( (number > 2147483647) || ( number < -2147483648) )
  return; // couldn't convert


... won't work, for obvious reasons :-) (a Sint32 can never hold a value outside that range, so the test is always false). I think I could loop over all the chars of that string in reverse, build the number manually, and bail out if adding the next part would risk an overflow, i.e.

Sint32 my_number = 0;
Sint32 place = 1; // place value of the current digit: 1, 10, 100, ...
for (int i = my_string.size(); i > 0; --i)
 {
  Sint32 temp = place * (my_string[i - 1] - '0');
  if (my_number + temp > 2147483647 || my_number + temp < -2147483648)
   return ERROR;
  my_number += temp;
  place *= 10;

 }


Ekhm, at this point I realized that even a simple "my_number + temp" may overflow [crying], which renders this algorithm useless. And no, I don't want to use any external big-number library, nor code my own. Any thoughts?
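
One way around the overflow in "my_number + temp" is to test against the limit before doing the arithmetic, so the out-of-range value is never computed. The following is only a minimal sketch and not code from this thread: parse_int32 is a hypothetical name, and it assumes the string is an optional '-' followed by nothing but decimal digits. It accumulates the value as a negative number because the negative half of the range is one larger than the positive half.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper: returns true and stores the value in `out` only if
// `text` is an optional '-' followed by decimal digits that fit in 32 bits.
bool parse_int32(const std::string& text, std::int32_t& out)
{
    std::size_t pos = 0;
    bool negative = false;
    if (!text.empty() && text[0] == '-') { negative = true; pos = 1; }
    if (pos == text.size()) return false;            // no digits at all

    const std::int32_t min = std::numeric_limits<std::int32_t>::min();
    std::int32_t value = 0;                          // always <= 0 while parsing
    for (; pos < text.size(); ++pos) {
        if (text[pos] < '0' || text[pos] > '9') return false;
        const std::int32_t digit = text[pos] - '0';
        // Check BEFORE computing value * 10 - digit, so nothing can overflow.
        if (value < (min + digit) / 10) return false;
        value = value * 10 - digit;
    }
    if (!negative) {
        if (value == min) return false;              // +2147483648 does not fit
        value = -value;
    }
    out = value;
    return true;
}
```

Called with "2147483647" or "-2147483648" this should report success, and with "2147483648" or "9999999999999" it should report failure without ever forming an out-of-range intermediate.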

#include <iostream>
#include <sstream>
#include <string>

void parseInt(std::string num)
{
    std::istringstream sstr(num);
    int integer;
    if (sstr >> integer)
    {
        std::cout << "parsed integer " << integer << " successfully\n";
    }
    else
    {
        std::cout << "invalid integer: " << num << '\n';
    }
}

int main()
{
    parseInt("123456789123456789");
    parseInt("2147483648");
    parseInt("2147483647");
    parseInt("-2147483648");
    parseInt("-2147483649");
    parseInt("-123456789123456789");
}

Or you could use boost::lexical_cast or the C99 function strtol.

Interestingly enough this code demonstrated a flaw in the SC++L implementation with my Borland compiler, which incorrectly parsed "2147483648" as -2147483648. gcc and MSVC++ both worked fine though.

Enigma
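
If all that is needed is a yes/no answer, the stream approach can be wrapped in a function that returns bool. This is a sketch rather than anything posted in the thread: fits_in_int32_stream is a made-up name, std::int32_t stands in for Sint32, and the peek() test that rejects trailing characters (such as "12abc") is an extra assumption about what should count as valid input.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: true if `text` parses completely as a 32-bit signed integer.
bool fits_in_int32_stream(const std::string& text)
{
    std::istringstream in(text);
    std::int32_t value = 0;
    if (!(in >> value))   // fails on non-numeric input and on out-of-range values
        return false;
    // Reject leftovers such as "12abc": nothing may remain after the number.
    return in.peek() == std::char_traits<char>::eof();
}
```

Note that the extraction still skips leading whitespace, so stricter input validation would need an additional check.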

Hmmmmm, once again I forgot about the STL [rolleyes].

Though it's possibly not the best solution (I'm going to use it in the SDL_Config library, which should include as few headers as possible), it's very close, and I'm going to use it. Enigma, IIRC I rated you up to the max a few months ago, so unfortunately I can't do it now... anyway, thanks for the help :-)

Errrm, I was thinking about more sophisticated solutions, and as I see now, this completely undermined my ability to come up with simple ones. You're probably right; tomorrow (it's too late now) I'm going to check whether atof() will help me. Thanks :-)

Rip-off - you were right, the atof trick works:


#include <cstdlib>   // atof
#include <string>
#include <iostream>

using namespace std;


void parseInt(string num)
{
    double integer = atof(num.c_str());

    if ( (integer <= (double) 2147483647.0) && (integer >= (double) -2147483648.0f))
    {
        std::cout << "parsed integer " << num << " successfully\n";
    }
    else
    {
        std::cout << "invalid integer: " << num << '\n';
    }
}

int main()
{
    parseInt("123456789123456789");
    parseInt("-123456789123456789");
    parseInt("2147483648");
    parseInt("2147483647");
    parseInt("-2147483648");
    parseInt("-2147483649");
    parseInt("123456789123456789123456789123456789123456789123456");

}




Output:



invalid integer: 123456789123456789
invalid integer: -123456789123456789
invalid integer: 2147483648
parsed integer 2147483647 successfully
parsed integer -2147483648 successfully
invalid integer: -2147483649
invalid integer: 123456789123456789123456789123456789123456789123456




Thanks :-)

Quote:
Original post by Enigma
or the C99 function strtol.
Both strtol and strtoul are actually C89 functions. So you can use them freely in both C and C++.

The standard library has no int-sized counterpart, however, so you'll have to clip the result yourself, or rely on the fact that sizeof(long) == sizeof(int) on many platforms.
#include <limits.h>  /* INT_MIN, INT_MAX */
#include <stdlib.h>  /* strtol */

int strtoi(const char *p) {
    long result = strtol(p, NULL, 10);
    if(result < INT_MIN) { return INT_MIN; }
    if(result > INT_MAX) { return INT_MAX; }
    return (int) result;
}
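
If the goal is to detect an out-of-range number rather than clamp it, strtol's error reporting can be checked as well. The sketch below is an illustration under assumptions, not the code above: fits_in_int is a hypothetical name, it relies on strtol setting errno to ERANGE when the value does not fit in a long, and it treats empty input or trailing characters as failures.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: true if `p` is a complete decimal number that fits in an int.
bool fits_in_int(const char* p)
{
    errno = 0;
    char* end = nullptr;
    const long result = std::strtol(p, &end, 10);

    if (end == p || *end != '\0') return false;  // no digits, or trailing junk
    if (errno == ERANGE) return false;           // did not even fit in a long
    // Only matters where long is wider than int; otherwise always true.
    return result >= INT_MIN && result <= INT_MAX;
}
```

This way the check behaves the same whether long is 32 or 64 bits wide: the ERANGE test catches the overflow when long and int have the same size, and the INT_MIN/INT_MAX comparison catches it when long is wider.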

Hey, thanks, but this problem is already fixed :-)


Btw, I don't need to know the exact value of that integer, only whether it fits in Sint32.

Be aware that using atof only "works" in this case because you are testing against a number which is a power of two or one less than a power of two. In general, double-precision floating-point numbers do not have enough precision to guarantee correct results. Try testing for a range of +-2000000000 to see what I mean. Furthermore, that "works" is in inverted commas for a reason: if the input number exceeds the valid range of a double-precision floating-point number, the result is technically undefined, even though all my implementations actually return +/-infinity.

In general there is no reason to prefer atoi and atof over C++ streams, except in the very limited case where you can guarantee the range of the input, need to perform string-to-integer/floating-point conversions frequently in a tight loop, and a profiler has shown the conversion to be a bottleneck.

Enigma

Quote:

Be aware that using atof only "works" in this case because you are testing against a number which is a power of two or one less than a power of two. In general, double-precision floating-point numbers do not have enough precision to guarantee correct results. Try testing for a range of +-2000000000 to see what I mean. Furthermore, that "works" is in inverted commas for a reason: if the input number exceeds the valid range of a double-precision floating-point number, the result is technically undefined, even though all my implementations actually return +/-infinity.


Hmmm, I executed this:



#include <cstdlib>   // atof
#include <string>
#include <iostream>
#include <sstream>

using namespace std;

typedef signed int Sint32;

string temp;
stringstream sstr;

void checkint(Sint32 num)
{
    // Round-trip the integer through a string, as it would arrive in real use
    sstr.clear();
    sstr << num;
    temp.clear();
    sstr >> temp;

    double integer = atof(temp.c_str());

    if ( !( (integer <= (double) 2147483647.0) && (integer >= (double) -2147483648.0f)) )
    {
        cout << "! Error ! Parsed integer " << num << " and got in result " << temp <<
            " which appears to be outside range\n";
    }

    if (num % 1000)
        cout << num << endl;


}

int main()
{
    for (Sint32 i = -7483648; i < 7483647; i += 10)
        checkint(i);
}




No errors occurred, even when I changed
for (Sint32 i = -7483648; i < 7483647; i += 10)

to

for (Sint32 i = -2000000000; i < 2000000000; i += 1000)
// btw, it produced 40 MB txt file :-)

I also tried going over the whole range from -2147483648 to 2147483647 in steps of 1... but after a few minutes, when the cache usage graph skyrocketed to a level not seen even when playing Doom 3 on my old computer, I had to kill it. Even then, there were no errors in the txt file.

------

Uhm, this works too:


void checkint(Sint32 num)
{
    sstr.clear();
    sstr << num;
    temp.clear();
    sstr >> temp;

    double integer = atof(temp.c_str());

    if ( !( (integer <= (double) 2147483647.0) && (integer >= (double) -2147483648.0f)) )
        cout << "! Error ! Parsed integer " << num << " and got in result " << temp <<
            " which appears to be outside range\n";


    if (num != static_cast<Sint32> (integer))
        cout << "! Error ! Parsed integer " << num << " and got in result " << static_cast<Sint32> (integer) <<
            " which appears to be different!\n";
}

int main()
{
    for (Sint32 i = -2147483648; i < 2147483647; ++i)
        checkint(i);
}



However, I couldn't wait for it to finish, so after a few minutes I once again had to kill it. But no errors were reported...


Btw, when my MinGW sees -2147483648, it complains about "[Warning] this decimal constant is unsigned only in ISO C90 " :-?
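
That warning comes from how the literal is read: -2147483648 is a unary minus applied to the constant 2147483648, and that constant is already too big for a 32-bit int (or a 32-bit long) before the minus is applied. A common way around it, sketched here with made-up constant names and std::int32_t standing in for Sint32, is to let the library supply the limit or to spell the minimum as -2147483647 - 1:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical names; both constants hold the most negative 32-bit value.
const std::int32_t kInt32Min        = std::numeric_limits<std::int32_t>::min();
const std::int32_t kInt32MinSpelled = -2147483647 - 1;  // every constant here fits in an int
```

Either form avoids writing a decimal constant that does not fit in the signed type.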

Quote:

In general there is no reason to prefer atoi and atof over C++ streams, except in the very limited case where you can guarantee the range of the input, need to perform string-to-integer/floating-point conversions frequently in a tight loop, and a profiler has shown the conversion to be a bottleneck.


For me, there is one small, not very important, but real reason: the need to include <sstream>, which may slightly increase the size of the produced lib. OK, OK, I know that's going to add a few kB at most. Anyway, I've already implemented the atof version, and I'm going to change it only if it causes any errors.

Btw, I've now also remembered that atof will only be called for strings no longer than 10 characters (or 11 if a minus sign comes first). So in the extreme case I'll get 9999999999; AFAIK atof should work fine with such values.

What about converting the max and min numbers to strings and comparing them against your numbers?

The atof() trick will actually work provided you're on a 32-bit system because the mantissa of a double is more than 32 bits. On 64-bit systems though it will break down with numbers close to the limits because you'll lose 12 bits of precision (i.e. +/- 4096).

And doynax, this will never work:

int strtoi(const char *p) {
    long result = strtol(p, NULL, 10);
    if(result < INT_MIN) { return INT_MIN; }
    if(result > INT_MAX) { return INT_MAX; }
    return (int) result;
}

Because both of those tests are always false if sizeof(int)==sizeof(long).

Quote:
Original post by ZQJ
And doynax, this will never work:

int strtoi(const char *p) {
    long result = strtol(p, NULL, 10);
    if(result < INT_MIN) { return INT_MIN; }
    if(result > INT_MAX) { return INT_MAX; }
    return (int) result;
}

Because both of those tests are always false if sizeof(int)==sizeof(long).
Ahh.. But you're forgetting about the built-in safeguards in strtol, which clip the result to the range of a signed long; that code was only supposed to make sure it fits within an int too.
If strtol didn't do that, it wouldn't even work properly with "large" longs, since values beyond the range of a long could be truncated down to something that looks like a valid integer.
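
The clipping behaviour described here can be observed directly. As a small sketch (strtol_saturated is a hypothetical helper name), the standard specifies that strtol returns LONG_MAX or LONG_MIN and sets errno to ERANGE when the converted value is out of range for a long:

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: true if strtol had to clip `text` to the range of a long.
bool strtol_saturated(const char* text)
{
    errno = 0;
    const long value = std::strtol(text, nullptr, 10);
    return errno == ERANGE && (value == LONG_MAX || value == LONG_MIN);
}
```

For an input like "99999999999999999999999" this should return true regardless of whether long is 32 or 64 bits, while "42" should return false.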

Quote:

The atof() trick will actually work provided you're on a 32-bit system because the mantissa of a double is more than 32 bits. On 64-bit systems though it will break down with numbers close to the limits because you'll lose 12 bits of precision (i.e. +/- 4096).


I've done some tests and it looks like this trick is indeed working on my 32-bit system. However, I don't want to lose portability just to save a few kB in the library, so I'm going to replace atof with stringstream (for the third time already... [rolleyes]).

Once again thx everyone.


Quote:

What about converting the max and min numbers to strings and comparing them against your numbers?


Ok, but you can compare two strings and tell only whether they are different/equal. So such thing: if (my_string < "123456") will compile (because of std::string operator<) but it won't work the way you want :-)
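
To make the point concrete: std::string's operator< compares character by character, so "9" is not less than "123456" even though 9 < 123456. A length-aware comparison fixes that for plain digit strings. The sketch below uses a hypothetical helper name, digits_less, and assumes both inputs are non-negative decimal strings without leading zeros or signs.

```cpp
#include <string>

// Hypothetical helper: numeric "less than" for two non-negative decimal
// strings. Assumes digits only and no leading zeros.
bool digits_less(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return a.size() < b.size();   // fewer digits means a smaller number
    return a < b;                     // same length: lexicographic order is numeric order
}
```

With that, digits_less("9", "123456") is true and digits_less("1000000", "123456") is false, whereas the plain operator< gives the opposite answer in both cases.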

Quote:
Original post by Koshmaar
Hmmm, I executed this:

*** Source Snippet Removed ***

Sorry, I wasn't clear enough. I meant if you do (also meant 1999999999 rather than 2000000000):
#include <cstdlib>   // atof
#include <string>
#include <iostream>
#include <limits>

using namespace std;


void parseInt(string num)
{
    double integer = atof(num.c_str());

    if ( (integer <= (double)1999999999.0) && (integer >= (double)-1999999999.0f))
    {
        std::cout << "parsed integer " << num << " successfully\n";
    }
    else
    {
        std::cout << "invalid integer: " << num << '\n';
    }
}

int main()
{
    parseInt("1999999999");
    parseInt("2000000000");
    parseInt("2000000001");
    parseInt("-1999999999");
    parseInt("-2000000000");
    parseInt("-2000000001");
}

However, what I hadn't noticed was the subtle error in the code you had posted. The second test is actually against a float cast to a double, rather than a pure double and it is this loss of precision which caused the error. doubles do have sufficient precision to work for the range of 32bit integers (although of course the problem would reappear with 64bit integers).

Quote:
For me, there is one small, not very important, but real reason: the need to include <sstream>, which may slightly increase the size of the produced lib. OK, OK, I know that's going to add a few kB at most. Anyway, I've already implemented the atof version, and I'm going to change it only if it causes any errors.

Fair enough, I just wanted to point out the potential dangers in case you decided to use this technique more generally, or for anyone else reading this thread.

EDIT: Too many replies while I was writing this.

Enigma

Quote:

Ok, but you can compare two strings and tell only whether they are different/equal. So such thing: if (my_string < "123456") will compile (because of std::string operator<) but it won't work the way you want :-)

I was thinking of something less trivial, like a char-by-char comparison. Since they are ints, just comparing the length should be OK.
Another approach would be to split the number into bytes and check them separately. I suppose this might not be portable due to endianness, but it should work.

EDIT: corrected the endianness typo. Thanks ZQJ

[Edited by - cignox1 on October 12, 2005 4:56:06 AM]

I don't know if that was a typo, but it's endianness not Indianness.

Anyway a string-to-string compare would work something like so (assuming no leading zeros):


bool is_in_int_range (const string& my_string)
{
    string test_string, limit_string;

    if (my_string[0] == '-') {
        test_string.assign(my_string, 1, my_string.size() - 1);
        limit_string = "2147483648"; /* magnitude of the most negative value */
    } else {
        test_string = my_string;
        limit_string = "2147483647";
    }

    if (test_string.size() > limit_string.size())
        return false; /* Doesn't fit */
    else if (test_string.size() < limit_string.size())
        return true; /* Does fit */

    /* Same length: compare digit by digit, most significant first */
    for (string::size_type i = 0; i < test_string.size(); ++i) {
        if (test_string[i] > limit_string[i])
            return false;
        else if (test_string[i] < limit_string[i])
            return true;
    }

    return true; /* Is exactly on one of the limits */
}


Quote:
Original post by Enigma
The second test is actually against a float cast to a double, rather than a pure double and it is this loss of precision which caused the error.

-2147483648 fits perfectly into a float because it only has 1 significant bit of precision. 2147483647, however, has 31 significant bits and therefore cannot be represented exactly as a float, whose significand holds only 24 bits.


float f =-2147483648.0f;
float g = 2147483647.0f;
writefln("%f", f);
writefln("%f", g);



This piece of code will output:

-2147483648.000000
2147483648.000000


Quote:
Original post by Enigma
doubles do have sufficient precision to work for the range of 32bit integers (although of course the problem would reappear with 64bit integers).

Problems would already kick in above 53 bits of precision.
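
The same limits can be checked in C++. This is a sketch with made-up function names, and it assumes IEEE 754 float and double with round-to-nearest, which is what mainstream desktop compilers provide:

```cpp
#include <cstdint>

// Assumes IEEE 754: float has a 24-bit significand, double a 53-bit one.

// 2147483647 needs 31 significant bits, so as a float it rounds up to 2^31.
bool float_rounds_int32_max()
{
    const float f = 2147483647.0f;
    return static_cast<double>(f) == 2147483648.0;
}

// A double holds every 32-bit integer exactly.
bool double_keeps_int32_max()
{
    const double d = 2147483647.0;
    return static_cast<std::int64_t>(d) == 2147483647;
}

// 2^53 + 1 is the first positive integer a double cannot represent.
bool double_loses_2_53_plus_1()
{
    const std::int64_t n = 9007199254740993LL;            // 2^53 + 1
    return static_cast<double>(n) == 9007199254740992.0;  // rounds to 2^53
}
```

Under those assumptions all three functions return true, which matches the observations above: the float literal was the culprit, doubles are fine for the 32-bit range, and exactness is lost once more than 53 bits are needed.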
