
using signed int for values that should never be negative


Recommended Posts

The game engine my company has licensed does this all the time - it will use signed math for a simple for loop. It seems to me that this is bad practice and can, in some circumstances, result in severe security vulnerabilities. Should I always be using unsigned types for values that are never negative, or am I overreacting and adding complexity where it is completely unnecessary?

I wouldn't say that using signed integers for values that will never be negative is bad practice as such; it's certainly quite common practice, and you can use -1 to represent an uninitialised state, which can be handy.

I personally, however, do use unsigned ints quite extensively, like: for (unsigned int i = 0; i < something; ++i)

If I see a signed integer I don't automatically assume it'll be used for negative values, but the possibility exists; if I see an unsigned integer then I know it will never be negative. Being explicit about your intentions never hurts.

Of course, signed integers usually have a lower maximum (positive) range than unsigned integers, so sometimes the decision is effectively made for you based on what you need.
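
For reference, a quick way to see that range difference (a sketch of mine, not from the post above; the printed values assume a platform where int is 32 bits):

#include <iostream>
#include <limits>

int main()
{
    std::cout << std::numeric_limits<int>::max() << std::endl;           // 2147483647
    std::cout << std::numeric_limits<unsigned int>::max() << std::endl;  // 4294967295
}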

Quote:
Original post by Promit
I have one question for you to consider. When is 4294967295 preferable to -1 when it comes to invalid values?


I think he might be asking this:

for (int i = 0; i < 10; i++) {
    a[i] = i;   // <-- array index cannot be negative in a sane application
}




Personally, I try to match signedness with context.

Using unsigned types isn't always just gravy, and I had to deal with a very nasty side-effect of it.

The loop was simple:

size_t MAX = (a - b);
size_t i = 0;

while ( i < MAX ) // do something



But... due to various circumstances, MAX ended up being initialized from a value of -1, and hilarity ensued. If signed types had been used, the negative value would have properly terminated the loop.

So it's not a sure-fire way to solve or prevent bugs.
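
For illustration, a minimal sketch of that wrap-around with made-up values for a and b (the original post doesn't show where they came from):

#include <cstddef>
#include <iostream>

int main()
{
    std::size_t a = 10, b = 20;
    std::size_t MAX = a - b;        // -10 can't be represented, so the value wraps
    std::cout << MAX << std::endl;  // 4294967286 with a 32-bit size_t
    // A loop guarded by "i < MAX" now runs about four billion times instead of zero.
}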


There was also another conflict I encountered.

While designing a Buffer class, I chose STL's types. Size was size_t, since a buffer size is either 0 or a positive number.

But then I used a pointer difference to obtain the used size (curr_ptr - begin_ptr), which is typed as ptrdiff_t, and ptrdiff_t is, of course, signed.

Here's where the compiler was hinting at something: even if I design the class correctly and make sure it works, the language no longer guarantees that the pointers are truly properly initialized.

So if the compiler is warning you about sign conversions, it's usually a sign that you're glossing over something, or about to make a mess. In my case the warning was perfectly warranted, since I was deliberately mixing incompatible types, even though I was trying to be smart about it.

Use whatever you see fit. But make sure you take into consideration the warnings the compiler might start issuing over your choices. They'll usually expose very subtle situations.

This is a personal view and obviously not widespread in use (judging from the posts here), but I always try to use the sign according to context. So if a loop counter should never be negative then unsigned will be used; if a generated random value should be positive then I'll use unsigned; etc. (Actually, this issue of signed and unsigned becomes absolutely critical within some pseudo-random number generator algorithms.) I feel it is just good practice to be aware of the allowable or expected limits of all parts of the code, no matter how small, and to code accordingly.
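
A minimal sketch of why that matters in a PRNG (my own example, not taken from any specific generator mentioned above): a linear congruential generator relies on well-defined wrap-around modulo 2^32, which only unsigned arithmetic guarantees, since signed overflow is undefined behaviour in C++. The constants are the widely published "Numerical Recipes" LCG parameters.

// Advance a 32-bit LCG state (assuming unsigned int is 32 bits); unsigned
// arithmetic wraps mod 2^32 by definition, which is exactly the behaviour
// the algorithm depends on.
unsigned int lcg_next(unsigned int state)
{
    return state * 1664525u + 1013904223u;
}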

--random

Quote:
Original post by Sneftel
for (int i = 0; i < 10; i++) {
    a[i] = i;   // <-- array index cannot be 4 billion in a sane application either.
}


er, why would i be 4 billion, and why would having i be -1 make a difference?

Quote:
Original post by random_thinker
This is a personal view and obviously not widespread in use (as judged from the posts here) but I always try to use the sign according to context. So if a loop should not be negative then unsigned will be used; if a generated random value should be positive then I'll use unsigned, etc (actually this issue of signed and unsigned becomes absolutely critical within some pseudo-random number generator algorithms). I feel it is just good practice to be aware of the allowable or expected limits of all parts of the code no matter how small, and to code accordingly.

--random

I think to some extent this can be bad. For example, using a char to represent an ASCII value is fine, but chances are it just means 3 bytes are wasted because everything is int-aligned.
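
A small sketch of that padding effect (my own example; exact sizes are implementation-defined, but 8 is typical where int is 4-byte aligned):

#include <iostream>

struct Mixed
{
    char c;   // 1 byte of data...
    int  i;   // ...but c is usually followed by 3 bytes of padding so i stays aligned
};

int main()
{
    std::cout << sizeof(Mixed) << std::endl;  // typically 8, not 5
}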

Using signed int in a for loop is totally normal, not a "security risk". I once tried using unsigned everywhere, but then you end up passing the value into a function which expects a signed int, and you end up with a nasty mess of casts.

int is the simple, default thing to use whenever you don't have extremely large values (or bitmasks) to deal with.

Quote:
Original post by thedustbustr
Quote:
Original post by Sneftel
for (int i = 0; i < 10; i++) {
    a[i] = i;   // <-- array index cannot be 4 billion in a sane application either.
}


er, why would i be 4 billion, and why would having i be -1 make a difference?

The point is that unsigned values cause bugs and vulnerabilities in exactly the same places as signed values. The computer doesn't do the range checking for you either way.

One thing to be aware of when using unsigned integers is that the result of an arithmetic operation involving a signed and an unsigned integer is unsigned. So if you have a signed integer, and you subtract an unsigned integer from it, if the result would be negative, you will get wrapping.

#include <iostream>

int main()
{
    int i = 100;
    unsigned j = 200;
    float k = i - j;                // i is converted to unsigned, so i - j wraps instead of going negative
    std::cout << k << std::endl;    // prints 4.29497e+09, not -100
}

Although I use unsigned integers to enforce the constraint that a value will never be negative, when you're doing a lot of arithmetic with a mixture of signed and unsigned integers it can make sense to make them all signed and avoid the subtle bugs caused by implicit conversions.

Quote:
Original post by thedustbustr
Quote:
Original post by Sneftel
for (int i = 0; i < 10; i++) {
    a[i] = i;   // <-- array index cannot be 4 billion in a sane application either.
}


er, why would i be 4 billion, and why would having i be -1 make a difference?

Because in the unsigned world, -1 is equal to 4294967295.

So if you had some strange code that set i to -1, it would crash whether you used a signed or unsigned int.
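
A tiny sketch of that equivalence (my own example; the output assumes a 32-bit int):

#include <iostream>

int main()
{
    int s = -1;
    unsigned int u = s;             // the conversion is well-defined: -1 wraps to UINT_MAX
    std::cout << u << std::endl;    // prints 4294967295
}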

Quote:
Original post by Sc4Freak
Quote:
Original post by thedustbustr
Quote:
Original post by Sneftel
for (int i = 0; i < 10; i++) {
    a[i] = i;   // <-- array index cannot be 4 billion in a sane application either.
}


er, why would i be 4 billion, and why would having i be -1 make a difference?

Because in the unsigned world, -1 is equal to 4294967295.

So if you had some strange code that set i to -1, it would crash whether you used a signed or unsigned int.
I would prefer unsigned in such a case. An index of 4294967295 will obviously always crash with 32-bit addressing, whereas an index of -1 might simply access a different variable on the stack/heap.

The only place where using unsigned trips you up (that I can remember) is when doing a decreasing loop that has an ending condition of i>=0, which becomes an infinite loop.
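
A sketch of that decreasing-loop gotcha (process() and count are placeholders of my own, not from the post above):

void process(unsigned int) {}

void broken(unsigned int count)
{
    // "i >= 0" is always true for an unsigned i, and decrementing past zero
    // wraps around to UINT_MAX, so this loop never terminates.
    for (unsigned int i = count - 1; i >= 0; --i)
        process(i);
}

void fixed(unsigned int count)
{
    // Testing before decrementing runs exactly count times and then stops.
    for (unsigned int i = count; i-- > 0; )
        process(i);
}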

Quote:
Original post by thedustbustr
The game engine my company has licensed does this all the time - it will use signed math for a simple for loop. It seems to me that this is bad practice and can, in some circumstances, result in severe security vulnerabilities.

Should I always be using unsigned types for types that are never negative, or am I overreacting and adding complexity where it is completely unnecessary?


Sticking to the original topic of a "simple for loop", why type out anything more than 3 letters?

I *always* use a simple int for "simple for loops".


// spawn a bunch of random objects
for (int i = 0; i < 50; ++i)
    spawnRandomObject();


Not really sure how I can be more "safe".

Quote:
Original post by iMalc
Quote:
Original post by Sc4Freak
Quote:
Original post by thedustbustr
Quote:
Original post by Sneftel
for (int i = 0; i < 10; i++) {
    a[i] = i;   // <-- array index cannot be 4 billion in a sane application either.
}


er, why would i be 4 billion, and why would having i be -1 make a difference?

Because in the unsigned world, -1 is equal to 4294967295.

So if you had some strange code that set i to -1, it would crash whether you used a signed or unsigned int.
I would prefer unsigned in such a case. an index of 4294967295 will obviously always crash with 32-bit addressing, whereas an index of -1 might simply access a different variable on the stack/heap.
No, that's the thing. An index of 4294967295 will access exactly the same thing as an index of -1. From the CPU's point of view, there literally is zero difference between the two situations.

Quote:
Original post by Sneftel
No, that's the thing. An index of 4294967295 will access exactly the same thing as an index of -1. From the CPU's point of view, there literally is zero difference between the two situations.


My example wasn't the best; the point I was trying to make is that an array index, as a value, cannot be signed. a[-1] isn't allowed, so it makes no sense to use it. With a variable it's possible to do, but it's not sane. Depending on the compiler the results might even be deterministic, but still not sane. So you might as well enforce this relation in the type.

However, unsigned types may lead to slightly more elegant code (or not, it depends):

template < class T >
T get( unsigned int index )
{
    if ( index < n_elements ) {
        return ...
    } else {
        // error
    }
}

compared to:
template < class T >
T get( int index )
{
    if ( (index >= 0) && ( index < n_elements )) {
        return ...
    } else {
        // error
    }
}

since it allows you to establish a hard relation. Even if a negative index is passed, it only requires one test.
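
A common variant (my own addition, not from the sketches above): keep the signed parameter but recover the single comparison by casting, since any negative index converts to a huge unsigned value and fails the bounds test anyway. buffer and n_elements here are illustrative stand-ins.

float get( const float* buffer, unsigned int n_elements, int index )
{
    if ( static_cast<unsigned int>(index) < n_elements ) {
        return buffer[index];
    } else {
        return 0.0f;   // error path; the sketches above just say "// error"
    }
}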

The decrementing while statement is tricky as well, since subtle flaws can lead to, as mentioned, infinite loops.

Overall, unsigned types are just fine for things that are unsigned. But they come with their own set of problems and gotchas.

Quote:
Original post by Antheus
a[-1] isn't allowed, as such it makes no sense to use it.


int a[2];
int *b = a + 1;
b[-1] = 3;   // perfectly legal: writes to a[0]




I've actually seen this sort of thing on more than one occasion, usually in code converted from FORTRAN.

Anyways, I think we're both arguing the same side, sooo.

All this info scares me; it looks like these int/unsigned loops are not as inherently safe as they should be. Maybe a suitable solution is to use some new types for loops that check for these conditions, like Astrachan does in some of his classes, something like:


for( isFirst; !isDone; isNext )
{
    // Whatever...
}


--random

I definitely think it is a mistake to use an unsigned variable just because the values are expected to never be negative. Two reasons:
  • It gives you a false sense of security. As pointed out earlier, it doesn't catch or prevent problems, and the mistaken assumptions in some of the posts above illustrate the point.
  • Mixing signed and unsigned variables results in a lot of casting, which reduces readability and leads to bugs (see the sketch below).
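
A small illustration of the casting the second bullet refers to (my own example, not from the post): comparing a signed index against an unsigned container size draws a sign-compare warning unless one side is cast.

#include <vector>

int findFirstNegative( const std::vector<int>& v )
{
    for ( int i = 0; i < static_cast<int>(v.size()); ++i )   // cast needed to avoid a signed/unsigned comparison warning
    {
        if ( v[i] < 0 )
            return i;
    }
    return -1;   // the handy "no result" sentinel that a signed return type allows
}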

I use unsigned values for bit manipulation, for variables that need an extra bit of range, and for compatibility.

Does anyone use unsigned char if the value is expected to always fit in 8 bits? In other words, does anyone advocate doing this?
    for ( unsigned char i = 0; i < 10; ++i )
...
For the same reasons described above, I don't use char or short just because the values are expected to be in their ranges. I just use int. In fact I only use char for text and can't think of any reason for ever using short (if I need specific sizes, types such as __int8 and __int16 are more appropriate).

[Edited by - JohnBolton on September 16, 2007 2:06:00 AM]

C++ is just an inherently unsafe language, and unbounded loops have the inherent problem (and benefit) of iterating for some unknown number of times that could depend on values being computed within the loop itself. It is up to the programmer to ensure that loops eventually terminate and at the proper time, while not performing undefined operations.

If you want to enforce really strict safety, I've read that Eiffel is designed for that sort of thing. It has ways to insert whatever kinds of checks you like.

Using a plain signed int on a 32-bit machine probably won't cause any harm, since it is difficult to imagine a loop counting past two billion, and in the rare case that such a thing is necessary it is a simple change to use an unsigned int, a long, or a bigint class.

Maybe this is a wild idea to the extreme, but what about something like:


#include <limits>
#include <stdexcept>

template<typename baseType> class iter_t
{
public:
    iter_t(baseType lower, baseType upper) :
        lower_(lower), upper_(upper),
        min_limit_(std::numeric_limits<baseType>::min()),
        max_limit_(std::numeric_limits<baseType>::max())
    {
        checkLower(lower);
        checkUpper(upper);
    }

    void isFirst()
    {
        counter_ = lower_;
    }

    void isFirst(baseType lower)
    {
        checkLower(lower);
        counter_ = lower;
    }

    void isNext()
    {
        ++counter_;
    }

    bool isDone()
    {
        return (counter_ == upper_);
    }

    // Could include another settable isDone(baseType) method but
    // checking each loop would slow things considerably.

private:
    baseType counter_;
    const baseType lower_;
    const baseType upper_;
    const baseType min_limit_;
    const baseType max_limit_;

    // Note: as written these checks can never fire, since a baseType value is
    // by definition within numeric_limits<baseType>; a real runtime range check
    // would need bounds supplied from elsewhere.
    void checkLower(baseType lower)
    {
        if (lower < min_limit_)
        {
            // issue some error condition...
        }
    }

    void checkUpper(baseType upper)
    {
        if (upper > max_limit_)
        {
            // issue some error condition...
        }
    }
};

Used in source something like:


iter_t<int> i(0, 500);

for( i.isFirst(); !i.isDone(); i.isNext() )
{
    // do something...
}

for( i.isFirst(10); !i.isDone(); i.isNext() )
{
    // do something...
}

This would be for determinate loops; a condition could be added for indeterminate loops as well, one that accepts another form of computed baseType limit.

Edit - What is this weirdness? I've benchmarked the above code against a standard loop producing 100 million random numbers with my fastest generator, and it consistently outperforms the standard loop by about 15%. What gives?

Results are something like:

// For standard 'for' loop:
Elapsed time : 0.84s. // Sometimes as high as 0.98s.

// For 'iter_t' loop:
Elapsed time : 0.73s. // Pretty much stable at 0.73s.

Comparative source code follows:


int j = 0;
double time = 0;

timer.mark(); // Simple timer class, already initialized.
for ( j = 0; j != 100000000; ++j )
{
    pGen->next();
}
time = timer.read();
std::cout << "\nElapsed time : " << time << "s." << std::endl;

iter_t<int> i(0, 100000000);

timer.mark();
for( i.isFirst(); !i.isDone(); i.isNext() )
{
    pGen->next();
}
time = timer.read();
std::cout << "\nElapsed time : " << time << "s." << std::endl;

Is this because there is something run-time 'dynamic' going on with the use of 'int', and something compile-time 'static' going on with 'iter_t'? Thinking about doing some major refactoring here....

Edit again... Revised iter_t (static invocation is consistently 0.7s for 100 million random numbers) and tried some run-time trials for comparison purposes. The run-time code is:


unsigned lower = STUtility::promptRange("Select a lower bound within the range : ", -100, 100000000);
unsigned upper = STUtility::promptRange("Select a upper bound within the range : ", -100, 100000000);

iter_t<unsigned> test(lower, upper);

timer.mark();
for( test.isFirst(); !test.isDone(); test.isNext() )
{
    pGen->next();
}
time = timer.read();
std::cout << "\nElapsed time : " << time << "s." << std::endl;

lower = STUtility::promptRange("Select a lower bound within the range : ", -100, 100000000);
upper = STUtility::promptRange("Select a upper bound within the range : ", -100, 100000000);

timer.mark();
for ( j = lower; j != upper; ++j )
{
    pGen->next();
}
time = timer.read();
std::cout << "\nElapsed time : " << time << "s." << std::endl;

Interestingly, the iter_t is still faster. Results are:

Select a lower bound within the range : -100 to 100000000 : 0
Select a upper bound within the range : -100 to 100000000 : 100000000

Elapsed time : 1.57s.

Select a lower bound within the range : -100 to 100000000 : 0
Select a upper bound within the range : -100 to 100000000 : 100000000

Elapsed time : 1.68s.

Note: Type is unsigned because I'm in the process of testing the error conditions. I wouldn't necessarily use unsigned in this context.

--random

[Edited by - random_thinker on September 16, 2007 8:56:19 AM]

Quote:
Original post by Sneftel
No, that's the thing. An index of 4294967295 will access exactly the same thing as an index of -1. From the CPU's point of view, there literally is zero difference between the two situations.


Assuming a 32-bit two's complement signed integer representation. A sign-magnitude integer implementation would probably do something different.

I'm fascinated by this problem. To solve some of the issues pointed out in this thread, and also some buffer-overflow and possible security problems, what about using an int_t type, something like:


#include <iostream>
#include <limits>
#include "../Headers/Utilities/Error.h"

template<typename intType>
class int_t
{
public:
int_t() :
value_(0)
{
}

int_t(const intType arg) :
value_(arg)
{
}

int_t(const int_t<intType>& arg) :
value_(arg.value_)
{
}

~int_t()
{
}

int_t operator=(const int_t<intType> arg)
{
value_ = arg.value_;
return *this;
}

int_t operator=(const intType arg)
{
value_ = arg;
return *this;
}

int_t operator+(const intType arg) const
{
return (int_t(value_ + arg));
}

int_t operator+(const int_t<intType> arg) const
{
return (int_t(value_ + arg.value_));
}

int_t operator-(const intType arg) const
{
return (int_t(value_ - arg));
}

int_t operator-(const int_t<intType> arg) const
{
return (int_t(value_ - arg.value_));
}

int_t operator*(const intType arg) const
{
return (int_t(value_ * arg));
}

int_t operator*(const int_t<intType> arg) const
{
return (int_t(value_ * arg.value_));
}

int_t operator/(const intType arg) const
{
if (arg == 0)
{
STUtility::error error_cond("Divide by zero");
error_cond.die();
}
return (int_t(value_ / arg));
}

int_t operator/(const int_t<intType> arg) const
{
if (arg.value_ == 0)
{
STUtility::error error_cond("Divide by zero");
error_cond.die();
}
return (int_t(value_ / arg.value_));
}

int_t operator%(const intType arg) const
{
return (int_t(value_ % arg));
}

int_t operator%(const int_t<intType> arg) const
{
return (int_t(value_ % arg.value_));
}

int_t operator&(const intType arg) const
{
return (int_t(value_ & arg));
}

int_t operator&(const int_t<intType> arg) const
{
return (int_t(value_ & arg.value_));
}

int_t operator|(const intType arg) const
{
return (int_t(value_ | arg));
}

int_t operator|(const int_t<intType> arg) const
{
return (int_t(value_ | arg.value_));
}

int_t operator^(const intType arg) const
{
return (int_t(value_ ^ arg));
}

int_t operator^(const int_t<intType> arg) const
{
return (int_t(value_ ^ arg.value_));
}

int_t operator~() const
{
return (~value_);
}

int_t operator<<(const intType arg) const
{
return (int_t(value_ << arg));
}

int_t operator<<(const int_t<intType> arg) const
{
return (int_t(value_ << arg.value_));
}

int_t operator>>(const intType arg) const
{
return (int_t(value_ >> arg));
}

int_t operator>>(const int_t<intType> arg) const
{
return (int_t(value_ >> arg.value_));
}

bool operator==(const intType arg) const
{
return (value_ == arg);
}

bool operator==(const int_t<intType> arg) const
{
return (value_ == arg.value_);
}

bool operator!=(const intType arg) const
{
return (value_ != arg);
}

bool operator!=(const int_t<intType> arg) const
{
return (value_ != arg.value_);
}

bool operator>=(const intType arg) const
{
return (value_ >= arg);
}

bool operator>=(const int_t<intType> arg) const
{
return (value_ >= arg.value_);
}

bool operator>(const intType arg) const
{
return (value_ > arg);
}

bool operator>(const int_t<intType> arg) const
{
return (value_ > arg.value_);
}

bool operator<=(const intType arg) const
{
return (value_ <= arg);
}

bool operator<=(const int_t<intType> arg) const
{
return (value_ <= arg.value_);
}

bool operator<(const intType arg) const
{
return (value_ < arg);
}

bool operator<(const int_t<intType> arg) const
{
return (value_ < arg.value_);
}

int_t operator++()
{
++value_;
return (*this);
}

int_t operator--()
{
--value_;
return (*this);
}

intType value()
{
return value_;
}

private:
intType value_;
};

template<typename intType>
std::ostream & operator<<(std::ostream & out, int_t<intType> arg)
{
return out << arg.value();
}

template<typename intType>
std::istream & operator>>(std::istream & in, int_t<intType> & arg)
{
// read into the underlying type and assign, since value() returns a copy
intType tmp;
in >> tmp;
arg = tmp;
return in;
}



--random

Quote:
Original post by JohnBolton
I definitely think it is a mistake to use an unsigned variable just because the values are expected to never be negative. Two reasons:
  • It gives you a false sense of security. As pointed out earlier, it doesn't catch or prevent problems, and the mistaken assumptions in some of the posts above illustrate the point.

I only use unsigned integers to indicate an intention rather than to catch mistakes; I certainly don't feel any safer using them.

Quote:
  • Mixing signed and unsigned variables results in a lot of casting, which reduces readability and leads to bugs.

If used consistently and in appropriate circumstances then there's little reason that casting for signedness within your own code need occur. If you're using a library then there is always the risk that you have a different convention on the use of primitive types. The library I use most often is the SC++L, which consistently defines size_type to be an unsigned integer, so I rarely need to cast for signedness at all.

Quote:
I use unsigned values for bit manipulation, for variables that need an extra bit of range, and for compatibility.

I use signed values when I need negative numbers or for compatibility.

Quote:
Does anyone use unsigned char if the value is expected to always fit in 8 bits? In other words, does anyone advocate doing this?
    for ( unsigned char i = 0; i < 10; ++i )
...

This is a very good point.
I wouldn't advocate this; I would wager most people associate char with characters, strings and bytes, so I'd only use a char when dealing with one of those three things.
Even though using a char may be more explicit about the range, it would obfuscate the intended purpose of the variable if used for natural numbers in this way.

Quote:
For the same reasons described above, I don't use char or short just because the values are expected to be in their ranges. I just use int. In fact I only use char for text and can't think of any reason for ever using short (if I need specific sizes, types such as __int8 and __int16 are more appropriate).

Personally I think that using either a signed or unsigned integer is fine provided it's consistent. Relying on unsigned values to catch bugs for you is risky and, as shown in the posts above, often mistaken. As long as types are used appropriately I have no problem reading or even writing code using either convention (but if left to my own devices I would default to using unsigned integers for non-negative values).
I wasn't trying to trash the use of signed integers btw, I just want to provide a balanced argument :-)

As for short, there really is little reason to use it by itself, but if you have a lot of integers packed together then using short could halve the memory requirement; the first example that springs to mind is an index buffer/array.
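
A minimal sketch of that index-buffer case (my own example): 16-bit indices use half the memory of 32-bit ones, provided no index ever exceeds 65535.

#include <vector>

std::vector<unsigned short> indices16;   // 2 bytes per index
std::vector<unsigned int>   indices32;   // 4 bytes per index, same values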

